Network Working Group                                        Ralph Droms
INTERNET DRAFT                                       Bucknell University

                                                             Kim Kinnear
                                                              Mark Stapp
                                                           Cisco Systems

                                                             Bernie Volz
                                                            Steve Gonczi
                                                        Process Software

                                                              Greg Rabil
                                                             Mike Dooley
                                                              Arun Kapur
                                                     Lucent Technologies

                                                            October 1999
                                                      Expires April 2000

                         DHCP Failover Protocol

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (1999). All Rights Reserved.

Abstract

   DHCP [RFC 2131] allows for multiple servers to be operating on a
   single network. Some sites are interested in running multiple
   servers in such a way as to provide redundancy in case of server
   failure.
   In order for this to work reliably, the cooperating primary and
   secondary servers must maintain a consistent database of the lease
   information. This implies that servers will need to coordinate any
   and all lease activity so that this information is synchronized in
   case of failover.

   This document defines a protocol to provide this synchronization
   between two servers. One server is designated the "primary" server,
   the other is the "secondary" server. This document also describes a
   way to integrate the failover protocol with the DHCP load balancing
   approach.

   This document is a significant revision of
   draft-ietf-dhc-failover-04.txt.

Table of Contents

   1. Introduction
   2. Terminology
   2.1. Requirements terminology
   2.2. DHCP and failover terminology
   3. Background and External Requirements
   3.1. Key aspects of the DHCP protocol
   3.2. BOOTP relay agent implementation
   3.3. What does it mean if a server can't communicate with its
        partner?
   3.4. Challenging scenarios for a Failover protocol
   3.5. Using TCP to detect partner server failure
   4. Design Goals
   4.1. Design requirements for this protocol
   4.2. Goals for this protocol
   4.3. Limitations of this Protocol
   5. Protocol Overview
   5.1. Messages and States
   5.2. Fundamental restrictions
   5.3. Load balancing
   5.4. Operating in NORMAL state
   5.5. Operating in COMMUNICATIONS-INTERRUPTED state
   5.6. Operating in PARTNER-DOWN state
   5.7. Operating in RECOVER state
   5.8. Operating in STARTUP state
   5.9. Time synchronization between servers
   5.10. IP address binding-status
   5.11. DNS dynamic update considerations
   5.12. Reservations and failover
   5.13. Dynamic BOOTP and failover
   5.14. Guidelines for selecting MCLT
   6. Packet Formats
   6.1. Common message format
   6.2. Common option format
   6.3. BNDUPD message format
   6.4. BNDACK message format
   6.5. Bulking for BNDUPD and BNDACK messages
   6.6. UPDREQ message format
   6.7. UPDREQALL message format
   6.8. UPDDONE message format
   6.9. POOLREQ message format
   6.10. POOLRESP message format
   6.11. CONNECT message format
   6.12. CONNECTACK message format
   6.13. STATE message format
   6.14. CONTACT message format
   6.15. DISCONNECT message format
   7. Protocol Messages
   7.1. BNDUPD message
   7.2. BNDACK message
   7.3. UPDREQ message
   7.4. UPDREQALL message
   7.5. UPDDONE message
   7.6. POOLREQ message
   7.7. POOLRESP message
   7.8. CONNECT message
   7.9. CONNECTACK message
   7.10. STATE message
   7.11. CONTACT message
   7.12. DISCONNECT message
   8. Connection Management
   8.1. Connection granularity
   8.2. Creating the TCP connection
   8.3. Using the TCP connection for determining communications status
   8.4. Using the TCP connection for binding data
   8.5. Using the TCP connection for control messages
   8.6. Losing the TCP connection
   9. Protocol States
   9.1. Server Initialization
   9.2. Server State Transitions
   9.3. STARTUP state
   9.4. PARTNER-DOWN state
   9.5. RECOVER state
   9.6. NORMAL state
   9.7. COMMUNICATIONS-INTERRUPTED State
   9.8. POTENTIAL-CONFLICT state
   9.9. RESOLUTION-INTERRUPTED state
   9.10. RECOVER-DONE state
   9.11. PAUSED state
   9.12. SHUTDOWN state
   10. Safe Period
   11. Security
   11.1. Simple shared secret
   11.2. TLS
   12. Acknowledgments
   13. References
   14. Author's information
   15. Full Copyright Statement

1. Introduction

   DHCP [RFC 2131] allows for multiple servers to be operating on a
   single network. Some sites are interested in running multiple
   servers in such a way as to provide redundancy in case of server
   failure, since the DHCP subsystem is in many cases a critical part
   of the network infrastructure.

   This document defines a protocol to provide synchronization between
   two servers in order that each can take over for the other should
   either one fail or become unreachable.

   One server is designated the "primary" server, the other is the
   "secondary" server, and all DHCP client requests are sent to each
   server.

   In order to provide a high availability DHCP service, these
   cooperating primary and secondary servers must maintain a consistent
   database of lease information.
   This implies that servers will need to coordinate any and all lease
   activity so that this information is synchronized in case failover
   is required. The protocol messages and processing techniques
   required to maintain a consistent database are specified in the
   protocol described here.

   The failover protocol also contains an algorithm which allows each
   server to determine to which DHCP clients it should provide service
   when both servers are operating normally, and this capability can be
   used to support load balancing.

2. Terminology

   This section discusses both the generic requirements terminology
   common to many IETF protocol specifications as well as specialized
   DHCP and failover protocol specific terminology.

2.1. Requirements terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC 2119].

2.2. DHCP and failover terminology

   This document uses the following terms:

   o "DHCP client" or "client"

      A DHCP client is an Internet host using DHCP to obtain
      configuration parameters such as a network address. The term
      "client" used within this document always means a DHCP client,
      and never one of the two failover servers.

   o "DHCP server" or "server"

      A DHCP server is an Internet host that returns configuration
      parameters to DHCP clients.

   o "binding"

      A binding is a collection of configuration parameters, including
      at least an IP address, associated with or "bound to" a DHCP
      client. Bindings are managed by DHCP servers.

   o "binding database"

      The collection of bindings managed by a primary and secondary.

   o "failover endpoint"

      The failover protocol allows for there to be a unique failover
      endpoint per partner per role (where role is primary or
      secondary).
      This failover endpoint can take actions and hold unique states.
      There are thus a maximum of two failover endpoints per server per
      partner (one for each partner as a primary and one for that same
      partner as a secondary).

   o "lazy update"

      Lazy update refers to the requirement placed on a server
      implementing a failover protocol to update its failover partner
      whenever the binding database changes. A failover protocol which
      didn't support lazy update would require the failover partner
      update to be complete before a DHCP server could respond to a
      DHCP client request with a DHCPACK. A failover protocol which
      does support lazy update places no such restriction on the update
      of the failover partner server, and so a server can allocate an
      IP address or extend a lease on an IP address and then update its
      failover partner as time permits. A failover protocol which
      supports lazy update not only removes the requirement to update
      the failover partner prior to responding to a DHCP client with a
      DHCPACK, but also allows gathering up batches of updates from one
      failover server to its partner.

   o "subnet address pool"

      A subnet address pool is the set of IP addresses which is
      associated with a particular network number and subnet mask. In
      the simple case, there is a single network number and subnet mask
      and a set of IP addresses. In the more complex case (sometimes
      called "secondary subnets", sometimes "superscopes"), several
      (apparently unrelated) network number and subnet mask
      combinations with their associated IP addresses may all be
      configured together into one subnet address pool.

   o "Primary server" or "Primary"

      A DHCP server configured to provide primary service to a set of
      DHCP clients for a particular set of subnet address pools.

   o "Secondary server" or "Secondary"

      A DHCP server configured to act as backup to a primary server for
      a particular set of subnet address pools.

   o "stable storage"

      Every DHCP server is assumed to have some form of what is called
      "stable storage". Stable storage is used to hold information
      concerning IP address bindings (among other things) so that this
      information is not lost in the event of a server failure which
      requires restart of the server.

   o "MCLT"

      The MCLT refers to maximum client lead time. This time is
      configured on the primary server and transmitted from the primary
      to the secondary server in the CONNECT message. It is the maximum
      amount of time that one server can give to a client for a binding
      beyond that known and ACKed by the partner server. See section
      5.2.1 for details.

   o "DNS"

      An abbreviation for "Domain Name System", a scheme where a
      central name repository is used to map names to IP addresses and
      IP addresses to names.

   o "FQDN"

      An FQDN is a "fully qualified domain name". A fully qualified
      domain name generally is a host name with at least one zone name;
      for example, "www.dhcp.org" is a fully qualified domain name.

   o "partner"

      A "partner", for the purposes of this document, refers to a
      failover server, typically the other failover server. In many (if
      not most) cases, the failover protocol is symmetric with respect
      to the primary or secondary nature of the servers, and so it is
      often appropriate to discuss "updating the partner server", since
      it could be a primary server updating a secondary server or a
      secondary server updating a primary server.

   o "RR"

      "RR" is an abbreviation for "resource record". All records in the
      DNS are resource records.
      The resource records of most relevance to this document are the
      "A" resource record, which maps a DNS name to a particular IP
      address, the "PTR" resource record, which allows a "reverse map",
      from the IP address back to a DNS name, and the "KEY" resource
      record, which is used in ways defined in [DDNS] to tag a DNS name
      with the identity of the DHCP client with which it is associated.

   o "DDNS"

      An abbreviation for "Dynamic DNS", which refers to the capability
      to update a DNS server's name (actually resource record) database
      using an on-the-wire protocol defined in [RFC2136].

   o "binding-status"

      The binding-status is the status of an IP address with respect to
      its association with a client. There are specific binding-status
      values defined for use by the failover protocol, e.g., ACTIVE,
      FREE, RELEASED, ABANDONED, etc. These are designed to map more or
      less directly onto the binding-status values used internally in
      most DHCP server implementations. The term binding-status refers
      to the concept also sometimes known as "lease state" or "IP
      address state", but in this document the term "state" is reserved
      for the failover state of a failover endpoint, and binding-status
      is always used to refer to the state associated with an IP
      address or lease.

3. Background and External Requirements

   This section highlights key aspects of the DHCP protocol on which
   the failover protocol depends. It also discusses the requirements
   that the failover protocol places on other aspects of the network
   infrastructure, and some general issues surrounding server failure
   detection. Some failure scenarios that provide particular challenges
   to a failover protocol are discussed. Finally, the challenges
   inherent in using a TCP connection as a means to detect failure of a
   partner server are elaborated.

3.1. Key aspects of the DHCP protocol

   The failover protocol is designed to augment the DHCP protocol as
   described in RFC 2131 [RFC 2131]. There are several key aspects of
   the DHCP protocol which are required by the failover protocol in
   order to successfully meet its design goals.

3.1.1. Broadcast behavior

   There are two aspects of the broadcast behavior of the DHCP protocol
   which are key to making the failover protocol operate successfully.
   The first is simply that the DHCP protocol requires a DHCP client to
   broadcast all DHCPDISCOVER and DHCPREQUEST/INIT-REBOOT messages.
   Because of this requirement, a DHCP client that was communicating
   with one server will automatically be able to communicate with
   another server if one is available.

   The second aspect of broadcast behavior is similar to the first, but
   involves the distinction between a DHCPREQUEST/RENEW and a
   DHCPREQUEST/REBINDING. A DHCPREQUEST/RENEW is the message that a
   DHCP client uses to extend its lease. It is unicast to the DHCP
   server from which it acquired the lease. However, the DHCP protocol
   (in a farsighted move) was explicitly designed so that in the event
   that a DHCP client cannot contact the server from which it received
   a lease on an IP address using a DHCPREQUEST/RENEW, the client is
   required to broadcast its renewal using a DHCPREQUEST/REBINDING to
   any available DHCP server. Since all DHCP clients were required to
   implement this algorithm, the failover protocol can have a different
   server from the one that initially granted a lease be the server to
   renew a lease. Thus, one server can take over for another with no
   interruption in the service as experienced by the DHCP client or its
   associated applications software.

3.1.2. Client responsibility

   In the DHCP protocol the DHCP clients are entrusted with a
   considerable responsibility.
   In particular, after they are granted a lease on an IP address, they
   are enjoined to only use that IP address while their lease is valid.
   Every DHCP client is expected to stop using an IP address if the
   expiration time on the lease has passed and if it cannot get an
   extension on the lease for that IP address from some DHCP server.
   Thus, the correct behavior of every DHCP client in this regard is
   required to ensure the integrity of the DHCP service. On the other
   hand, incorrect behavior by a client in this area will tend to
   adversely affect at most one other DHCP client.

   Furthermore, any DHCP client which sends in a DHCPREQUEST/RENEW or
   DHCPREQUEST/REBINDING to a DHCP server (either unicast for a RENEW
   or broadcast for a REBINDING) MUST still have time to run on the
   lease for that IP address. The DHCP server sends the DHCPACK back
   unicast to the IP address from which the RENEW or REBINDING
   originated.

   Given the existing responsibility placed on the client to only use
   an IP address when the lease is valid, and to only send in a RENEW
   or REBINDING if the lease is valid, the failover protocol relies on
   DHCP clients to perform responsibly and will, in the absence of
   conflicting information, believe a DHCP client that is attempting to
   RENEW or REBIND a lease on an IP address is the legitimate owner of
   that IP address.

   If clients do not follow these rules, it is possible for an address
   to be in use by more than one client. For a single server, this
   happens because the server has leased the expired address to another
   client and the original client is also attempting to use the
   address. The server would NAK the renewal request.
   This is made slightly worse in the failover protocol if the two
   servers are unable to communicate with each other and one server
   leases an available address to a new client while the other server
   receives a renewal from a different client. In this case, both
   servers lease the same address to different clients for the MCLT
   time.

   One troublesome issue is that of the DHCP client responsibility when
   sending in DHCPREQUEST/INIT-REBOOT requests. While the original DHCP
   RFC was written to require a DHCP client to have time left to run on
   the lease for an IP address if the client is sending an INIT-REBOOT
   request, it was sufficiently unclear that some client vendors didn't
   realize this until recently. Since the INIT-REBOOT request was sent
   with the IP address in the dhcp-requested-address option and not in
   the ciaddr (for perfectly good reasons), the similarity to the RENEW
   and REBINDING case was lost on many people.

   At present, the failover protocol does not assume that a client
   sending in an INIT-REBOOT request necessarily has a valid lease on
   the IP address appearing in the dhcp-requested-address option in the
   INIT-REBOOT request.

   The implications of this are as follows: Assume that there is a DHCP
   client that gets a lease from one server while that server is unable
   to communicate with its failover partner. Then, assume that after
   that client reboots it is able only to communicate with the other
   failover server. If the failover servers have not been able to
   communicate with each other during this process, then the DHCP
   client will get a new IP address instead of being able to continue
   to use its existing IP address. This will affect no applications on
   the DHCP client, since it is rebooting. However, it will use up an
   additional IP address in this marginal case.

3.1.3. Stable storage update before DHCPACK

   The DHCP protocol allocates resources, and in order to operate
   correctly it requires that a DHCP server update some form of stable
   storage prior to sending a DHCPACK to a DHCP client in order to
   grant that client a lease on an IP address.

   One of the goals of the failover protocol is that it not add
   significant additional time to this already time-consuming
   requirement to update stable storage prior to a DHCPACK. In
   particular, adding a requirement to communicate with another server
   prior to sending a DHCPACK would simplify the failover protocol, but
   it would unacceptably limit the potential scalability of any DHCP
   server which employed the failover protocol.

3.2. BOOTP relay agent implementation

   Many DHCP clients are not resident on the same network segment as a
   DHCP server. In order to support this form of network architecture,
   most contemporary routers implement something known as a BOOTP Relay
   Agent. This capability inside of a router listens for all broadcasts
   at the DHCP port, port 67, and will relay any broadcasts that it
   receives on to a DHCP server. The IP address of the DHCP server must
   have been previously configured into the router. As part of the
   relay process, the relay agent will place the address of the
   interface on which it received the broadcast into the giaddr field
   of the DHCP packet.

   Since the failover protocol requires two DHCP servers to receive any
   broadcast DHCP messages, in order to work with DHCP clients which
   are not local to the DHCP server, the BOOTP relay agent on the
   router closest to the DHCP client must be configured to point at
   more than one DHCP server.

   Most BOOTP relay agent implementations allow this duplication of
   packets.
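
   The relay behavior described above can be illustrated with a minimal
   Python sketch. It is not taken from any router implementation: the
   function name relay_broadcast, the example addresses, and the choice
   to return the copies rather than send them are assumptions made for
   illustration. The sketch also assumes the usual relay conventions of
   filling in giaddr only when it is still zero and incrementing the
   hops octet. The giaddr position (octets 24 to 27 of the fixed BOOTP
   header) and server port 67 follow the BOOTP/DHCP message layout.

```python
import ipaddress

HOPS_OFFSET = 3          # hops is the fourth octet of the BOOTP header
GIADDR_OFFSET = 24       # giaddr occupies octets 24..27 of the header
DHCP_SERVER_PORT = 67


def relay_broadcast(packet, receiving_interface_ip, server_ips):
    """Return one (destination, payload) pair per configured server.

    Hypothetical helper: a real relay agent would send each payload
    over UDP instead of returning it.
    """
    if len(packet) < GIADDR_OFFSET + 4:
        raise ValueError("packet too short to hold a BOOTP header")
    relayed = bytearray(packet)
    # Record the receiving interface only if no earlier relay agent
    # has already filled in giaddr.
    if relayed[GIADDR_OFFSET:GIADDR_OFFSET + 4] == b"\x00\x00\x00\x00":
        address = ipaddress.IPv4Address(receiving_interface_ip)
        relayed[GIADDR_OFFSET:GIADDR_OFFSET + 4] = address.packed
    relayed[HOPS_OFFSET] = min(relayed[HOPS_OFFSET] + 1, 255)
    payload = bytes(relayed)
    # Duplicate the message so that both failover servers receive it.
    return [((ip, DHCP_SERVER_PORT), payload) for ip in server_ips]


# An all-zero stand-in for a broadcast DHCP message from a client.
request = bytes(240)
copies = relay_broadcast(
    request, "192.0.2.1", ["198.51.100.10", "198.51.100.20"]
)
```

   Each entry in copies carries the same payload, so the primary and
   the secondary see an identical request with the same giaddr.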

   If this is not possible, an administrator might be able to configure
   the relay agent with a subnet broadcast address, but in this case
   the primary and secondary DHCP servers in a failover pair must both
   reside on the same subnet. While this is a realistic configuration,
   it is not the one that most people will use.

3.3. What does it mean if a server can't communicate with its partner?

   In any protocol designed to allow one server to take over some
   responsibilities from a partner server in the event of "failure" of
   that partner server, there is an inherent difficulty in determining
   when that partner server has failed.

   In fact, it is fundamentally impossible for one server to
   distinguish a network communications failure from the outright
   failure of the server with which it is trying to communicate. In the
   case where each server is handing out resources (in this case IP
   addresses) to a client community, mistaking an inability to
   communicate with a partner server for failure of that partner server
   could easily cause both servers to be handing out the same IP
   addresses to different clients.

   One way that this is sometimes handled is for there to be more than
   two servers. In the case of an odd number of servers, the servers
   that can still communicate with a majority of other servers will
   consider themselves operational, and any server which can't
   communicate with a majority of other servers must immediately cease
   operations.

   While this technique works in some domains, having the only server
   with which a DHCP client can communicate voluntarily shut itself
   down seems like something worth avoiding.

   The failover protocol will operate correctly while both servers are
   unable to communicate, whether they are both running or not.
At some 520 point there may be resource contention, and if one of the servers is 521 actually down, then the operator can inform the other server and the 522 operational server will be able to use all of the downed server's 523 resources. 525 The protocol also allows detection of an orderly shutdown of a parti- 526 cipating server. 528 3.4. Challenging scenarios for a Failover protocol 530 There exist two failure scenarios which provide particular challenges 531 to the correctness guarantees of a failover protocol. 533 3.4.1. Primary Server crash before "lazy" update: 535 In the case where the primary server sends a DHCPACK to a client for 536 a newly allocated IP address and then crashes prior to sending the 537 corresponding update to the secondary server, the secondary server 538 will have no record of the IP address allocation. When the secondary 539 server takes over, it may well try to allocate that IP address to a 540 different client. In the case where the first client to receive the 541 IP address is not on the net at the time (yet while there was still 542 time to run on its lease), an ICMP echo (i.e., ping) will not prevent 543 the secondary server from allocating that IP address to a different 544 client. 546 The failover protocol deals with this situation by having the primary 547 and secondary servers allocate addresses for new clients from dis- 548 joint address pools. See section 5.4 for details. 550 A more likely (in that DHCPRENEWs are presumably more common than 551 DHCPDISCOVERs) and more subtle version of this problem is where the 552 primary server crashes after extending a client's lease time, and 553 before updating the secondary with a new time using a lazy update. 554 After the secondary takes over, if the client is not connected to the 555 network the secondary will believe the client's lease has expired 556 when, in fact, it has not.
In this case as well, the IP address 557 might be reallocated to a different client while the first client is 558 still using it. 560 This scenario is handled by the failover protocol through control of 561 the lease time and the use of the maximum client lead time (MCLT). 562 See section 5.2.1 for details. 564 3.4.2. Network partition where DHCP servers can't communicate but each 565 can talk to clients: 567 Several conditions are required for this situation to occur. First, 568 due to a network failure, the primary and secondary servers cannot 569 communicate. As well, some of the DHCP clients must be able to com- 570 municate with the primary server, and some of the clients must now 571 only be able to communicate with the secondary server. When this 572 condition occurs, both primary and secondary servers could attempt to 573 allocate IP addresses for new clients from the same pool of available 574 addresses. At some point, then, two clients will end up being allo- 575 cated the same IP address. This will cause problems when the network 576 failure that created this situation is corrected. 578 The failover protocol deals with this situation by having the primary 579 and secondary servers allocate addresses for new clients from dis- 580 joint address pools. See section 5.4 for details. 582 3.5. Using TCP to detect partner server failure 584 There are several characteristics of TCP that are important to the 585 functioning of the failover protocol, which uses one TCP connection 586 for both bulk data transfer as well as to assess communications 587 integrity with the other server. Reliable and ordered message 588 delivery are chief among these important characteristics. 590 It would be nice to use the capabilities built in to TCP to allow it 591 to determine if communications integrity exists to the failover 592 partner but this strategy contains some problems which require 593 analysis. 
There exist three fundamental cases for an open TCP con- 594 nection that must be examined. 596 1. When no data is being sent then no messages are traveling 597 across the TCP connection. 599 2. When data is queued to be sent, and the receiver has not 600 blocked the sending of additional data, then messages are 601 flowing across the TCP connection containing the application's 602 data. 604 3. When data is queued to be sent, and the receiver has blocked 605 the transmission of additional data, then persist messages are 606 flowing from the receiver to the sender to ensure that the 607 sender doesn't miss the receiver opening the window for 608 further transmissions. 610 The first case can be turned into the second case by sending 611 application-level keep-alive messages periodically when there is no 612 other data queued to be sent. Note that TCP keep-alive messages might be 613 used as well, but they present additional problems. 615 Thus, we can ensure that the TCP connection has messages flowing 616 periodically across the connection fairly easily. The question 617 remains as to what TCP will do if the other end of the connection 618 fails to respond (either because of network partition or because the 619 receiving server crashes). TCP will attempt to retransmit a message 620 with an exponential backoff, and will eventually time out that 621 retransmission. However, the length of that timeout cannot, in gen- 622 eral, be set on a per-connection basis, and is frequently as long as 623 nine minutes, though in some cases it may be as short as two minutes. 624 On some systems it can be set system-wide, while on some systems it 625 cannot be changed at all. 627 A value for this timeout that would be appropriate for the failover 628 protocol, say less than 1 minute, could have unpleasant side-effects 629 on other applications running on the same server, assuming that it 630 could be changed at all on the host operating system.
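The application-level keep-alive approach mentioned above can be sketched as follows. This is a minimal Python illustration, not part of the protocol: the class name, the two interval parameters, and their default values are all assumptions chosen for the example. It only tracks when a keep-alive is due and when the peer has been silent too long; it does not perform any network I/O.

```python
import time


class KeepAliveMonitor:
    """Hypothetical helper tracking activity on one connection."""

    def __init__(self, send_interval=10.0, silence_limit=30.0,
                 clock=time.monotonic):
        # Assumed values: idle time before a keep-alive should be sent,
        # and the silence after which the peer is treated as out of contact.
        self.send_interval = send_interval
        self.silence_limit = silence_limit
        self.clock = clock
        now = clock()
        self.last_sent = now
        self.last_received = now

    def on_send(self):
        """Call whenever any message is written to the connection."""
        self.last_sent = self.clock()

    def on_receive(self):
        """Call whenever any message arrives from the peer."""
        self.last_received = self.clock()

    def keep_alive_due(self):
        """True when nothing has been sent for send_interval seconds."""
        return self.clock() - self.last_sent >= self.send_interval

    def peer_silent(self):
        """True when nothing has been received for silence_limit seconds."""
        return self.clock() - self.last_received >= self.silence_limit
```

Because the decision is made by the application rather than by TCP's retransmission timer, the silence limit can be chosen per connection without changing any system-wide TCP setting.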
632 Nine minutes is a long time for the DHCP service to be unavailable to 633 any new clients that were being served by the server which has 634 crashed, when there is another server running that could respond to 635 them immediately as soon as it determines that its partner is not 636 operational. 638 The conclusion drawn from this analysis is that TCP provides very 639 useful support for the failover protocol in the areas of reliable and 640 ordered message delivery, but cannot by itself be relied upon to 641 detect partner server failure in a fashion acceptable to the needs of 642 the failover protocol. Additional failover protocol capabilities 643 will need to be created to support timely detection of partner server 644 failure. See section 8.3 for details on this mechanism. 646 4. Design Goals 648 This section lists the design requirements, the design goals, and the 649 limitations of the failover protocol. 651 4.1. Design requirements for this protocol 653 The following list of requirements must be (and are) met by this pro- 654 tocol. They are listed in priority order. 656 1. Implementations of this protocol must work with existing DHCP 657 client implementations based on the DHCP protocol [1]. 659 2. Implementations of the protocol must work with existing BOOTP 660 relay agent implementations. 662 3. The protocol must provide failover redundancy between servers 663 that are not located on the same subnet. 665 4.2. Goals for this protocol 667 The following goals are met by this protocol as well, though they are 668 less important than the requirements listed above. These goals are 669 listed in priority order. 671 1. Provide for continued service to DHCP clients through an 672 automated mechanism in the event of failure of the primary 673 server. 675 2. Avoid binding an IP address to a client while that binding is 676 currently valid for another client. In other words, do not 677 allocate the same IP address to two clients. 679 3. 
Minimize any need for manual administrative intervention. 681 4. Introduce no additional delays in server response time as a 682 result of the network communications required to implement the 683 failover protocol, i.e., don't require communications with the 684 partner between the receipt of a DHCPREQUEST and the 685 corresponding DHCPACK. 687 5. Share IP address ranges between primary and secondary servers; 688 i.e., impose no requirement that the pool of available 689 addresses be divided between servers. 691 6. Continue to meet the goals and objectives of this protocol in 692 the event of server failure or network partition. 694 7. Provide graceful reintegration of full protocol service after 695 server failure or network partition. 697 8. Allow for one computer to act as a secondary server for multi- 698 ple primary servers. Other topologies (e.g., mesh) are also 699 possible. Primary and secondary servers SHOULD be viewed as 700 "logical" servers and not necessarily physical computers. 702 9. Ensure that an existing client can keep its existing IP 703 address binding if it can communicate with either the primary 704 or secondary DHCP server implementing this protocol - not just 705 whichever server originally offered it the binding. 707 10. Ensure that a new client can get an IP address from some 708 server. Ensure that in the face of partition, where servers 709 continue to run but cannot communicate with each other, the 710 above goals and requirements may be met. In addition, when the 711 partition condition is removed, allow graceful automatic re- 712 integration without requiring human intervention. 714 11. If either primary or secondary server loses all of the infor- 715 mation that it has stored in stable storage, it should be able 716 to refresh its stable storage from the other server. 718 12.
Support load balancing between the primary and secondary 719 servers, and allow configuration of the percentage of the 720 client population served by each with a moderately fine granu- 721 larity. 723 4.3. Limitations of this Protocol 725 The following are explicit limitations of this protocol. 727 1. This protocol provides only one level of redundancy through a 728 single secondary server for each primary server. 730 2. A subset of the address pool is reserved for secondary server 731 use. In order to handle the failure case where both servers 732 are able to communicate with DHCP clients, but unable to com- 733 municate with each other, a subset of the IP address pool must 734 be set aside as a private address pool for the secondary 735 server. The secondary can use these to service newly arrived 736 DHCP clients during such a period. The size of this private 737 pool SHOULD be based only on the arrival rate of new DHCP 738 clients and the length of expected downtime, and is not influ- 739 enced in any way by the total number of DHCP clients supported 740 by the server pair. 742 The failover protocol can be used in a mode where both the 743 primary and secondary servers can share the load between them 744 when both are operating. In this load balancing mode, the 745 addresses allocated by the primary server to the secondary 746 server are not unused, but are used instead to service the 747 portion of the client base to which the secondary server 748 is required to respond. See section 5.3 for more information 749 on load balancing. 751 3. The primary and secondary servers do not respond to client 752 requests at all while recovering from a failure that could 753 have resulted in duplicate IP assignments (that is, when synchroniz- 754 ing in POTENTIAL-CONFLICT state). 756 5. Protocol Overview 758 This section will discuss the failover protocol at a relatively high 759 level of detail.
In the event that a description in this section 760 conflicts (or appears to conflict due to the overview nature of this 761 section) with information in later sections of this draft, the infor- 762 mation in the later sections should be considered authoritative. 764 5.1. Messages and States 766 This protocol is centered around the message exchange used by one 767 server to update the other server of binding database changes result- 768 ing from DHCP client activity: 770 o Communication of binding database changes 772 The binding update (BNDUPD) message is used to send the binding 773 database changes to the partner server, and the partner server 774 responds with a binding acknowledgement (BNDACK) message when it 775 has successfully committed those changes to its own stable 776 storage. 778 All of the other messages involve ancillary issues: 780 o Management of available IP addresses 782 The pool request (POOLREQ) is used by the secondary server to 783 request an allocation of IP addresses from the primary server. 784 The pool response (POOLRESP) is used by the primary server to 785 inform the secondary server how many IP addresses were allocated 786 to the secondary server as the result of the pool request. 788 o Synchronization of the binding databases between the servers 789 after they've been out of communications 791 The update request (UPDREQ) message is used by one server to 792 request that its partner send it all binding database informa- 793 tion that it has not already seen. The update request all 794 (UPDREQALL) message is used by one server to request that all 795 binding database information be sent in order to recover from a 796 total loss of its binding database by the requesting server. 797 The update done (UPDDONE) message is used by the responding 798 server to indicate that all requested updates have been sent by the 799 responding server and acked by the requesting server.
801 o Connection establishment 803 The connect (CONNECT) message is used by the primary server to 804 establish a high level connection with the other server, and to 805 transmit several important configuration data items between the 806 servers. The connect acknowledgement message (CONNECTACK) is 807 used by the secondary server to respond to a CONNECT message 808 from the primary server. The disconnect (DISCONNECT) message is 809 used by either server when closing a connection. 811 o Server synchronization 813 The state change (STATE) message is used by either server to 814 inform the other server of a change of failover state. 816 o Connection integrity management 818 The contact (CONTACT) message is used by either server to ensure 819 that the other server continues to see the connection as opera- 820 tional. It MUST be transmitted periodically over every esta- 821 blished connection if other message traffic is not flowing, and 822 it MAY be sent at any time. 824 5.1.1. Failover endpoints 826 The proper operation of the failover protocol requires more than the 827 transmission of messages between one server and the other. Each end- 828 point might seem to be a single DHCP server, but in fact there are 829 many situations where additional flexibility in configuration is use- 830 ful. 832 For instance, there might be several servers which are each primary 833 for a distinct set of address pools, and one server which is secon- 834 dary for all of those address pools. The situation with the pri- 835 maries is straightforward, but the secondary will need to maintain a 836 separate failover state, partner state, and communications up/down 837 status for each of the separate primary servers for which it is act- 838 ing as a secondary. 840 The failover protocol calls for there to be a unique failover end- 841 point per partner per role (where role is primary or secondary). 842 This failover endpoint can take actions and hold unique states. 
843 There are thus a maximum of two failover endpoints per partner (one 844 for the partner as a primary and one for that same partner as a 845 secondary). 847 Thus, in the case where there are two primary servers A and B each 848 backed up by a single common secondary server C, there is one fail- 849 over endpoint on each of A and B, and two different failover end- 850 points on C. The two different failover endpoints on C each have 851 unique states and independent TCP connections. 853 This document describes the behavior of the protocol in terms of pri- 854 mary and secondary servers, not primary and secondary failover end- 855 points. However, it is important to remember that every 'server' 856 described in this document is in reality a failover endpoint that 857 resides in a particular process, and that many failover endpoints may 858 reside in the same process. 860 It is not the case that there is a unique failover endpoint for each 861 subnet that participates in a failover relationship. On one server, 862 there is one failover endpoint per partner per role, regardless of 863 how many subnets or address pools are managed by that combination of 864 partner and role. Conversely, on a particular server, any given sub- 865 net or pool will be associated with exactly one failover endpoint. 867 When a connection is received from the partner, the unique failover 868 endpoint to which the message is directed is determined solely by the 869 IP address of the partner and the setting of the SECONDARY bit in the 870 'flags' field of the CONTACT message. 872 Throughout this document, the states and actions taken by "servers" 873 are described. The terms "server", "primary server", and "secondary 874 server" are commonly used to describe the failover endpoint taking 875 these states and performing these actions.
This description is 876 wholly accurate only for the simplest of cases, where all of the 877 address pools on one server are backed up by all of the address pools 878 on another server. In this case, there is a single failover endpoint 879 in each server. In all other cases, the term "server" is used to 880 describe one of the two possible failover endpoints per partner. 882 5.2. Fundamental restrictions 884 There are several fundamental restrictions this protocol places on what 885 one server can do in the absence of knowledge of the other server, 886 and these restrictions are key to the correct operation of the proto- 887 col. 889 5.2.1. Control of lease time 891 The key problem with lazy update is that when a server fails 892 after updating a client with a particular lease time and before 893 updating its partner, the partner will believe that a lease has 894 expired even though the client still retains a valid lease on that IP 895 address. 897 In order to handle this problem, a period of time known as the "Max- 898 imum Client Lead Time" (MCLT) is defined and must be known to both 899 the primary and secondary servers. Proper use of this time interval 900 places an upper bound on the difference allowed between the lease 901 time provided to a DHCP client by a server and the lease time known 902 by that server's partner. However, the MCLT is typically much less 903 than the lease time that a server has been configured to offer a 904 client, and so some strategy must exist to allow a server to offer 905 the configured lease time to a client. During a lazy update the 906 updating server typically updates its partner with a potential 907 expiration time which is longer than the lease time previously given 908 to the client and which is longer than the lease time that the server 909 has been configured to give a client.
This allows that server to 910 give a longer lease time to the client the next time the client 911 renews its lease, since the time that it will give to the client will 912 not exceed the MCLT beyond the potential expiration time acknowledged 913 by the partner. 915 The PARTNER-DOWN state exists so that a server can be sure that its 916 partner is, indeed, down. Correct operation while in that state 917 requires (generally) that the server wait the MCLT after anything 918 that happened prior to its transition into PARTNER-DOWN state (or, 919 more accurately, when the other server went down if that is known). 920 Thus, the server MUST wait the Maximum Client Lead Time after the 921 partner server went down before allocating any of the partner's FREE 922 addresses. In the event the partner was not in communication prior 923 to going down, it might have allocated one or more of its FREE 924 addresses to a DHCP client and been unable to inform the server 925 entering PARTNER-DOWN prior to going down itself. By waiting the 926 MCLT after the time the partner went down, the server in PARTNER-DOWN 927 state ensures that any clients which have a lease on one of the 928 partner's FREE addresses will either time out or contact the server 929 in PARTNER-DOWN by the time that period ends. 931 In addition, once a server has transitioned to PARTNER-DOWN state, it 932 MUST NOT reallocate an IP address from one client to another client 933 until an additional MCLT interval after the lease by the original 934 client expires. (Actually, until the maximum client lead time after 935 what it believes to be the lease expiration time of the first 936 client.) 938 Some optimizations exist for this restriction, in that it only 939 applies to leases that were issued BEFORE entering PARTNER-DOWN. Once 940 a server has entered PARTNER-DOWN and it leases out an address, it 941 need not wait this time as long as it has never communicated with the 942 partner since the lease was given out. 
944 The fundamental relationship on which much of the correctness of this 945 protocol depends is that the lease expiration time known to a DHCP 946 client MUST NOT be more than the maximum client lead time greater 947 than the potential expiration time known to a server's partner. 949 The remainder of this section makes the above fundamental relation- 950 ship more explicit. 952 This protocol requires a DHCP server to deal with several different 953 lease intervals and places specific restrictions on their relation- 954 ships. The purpose of these restrictions is to allow the other server 955 in the pair to be able to make certain assumptions in the absence of 956 an ability to communicate between servers. 958 The different lease times are: 960 o desired lease interval 962 The desired lease interval is the lease interval that a DHCP 963 server would like to give to a DHCP client in the absence of any 964 restrictions imposed by the Failover protocol. Its determina- 965 tion is outside of the scope of this protocol. Typically this is 966 the result of external configuration of a DHCP server. 968 o actual lease interval 970 The actual lease interval is the lease interval that a DHCP 971 server gives out to a DHCP client in the dhcp-lease-time option 972 of a DHCPACK packet. It may be shorter than the desired client 973 lease interval (as explained below). 975 o potential lease interval 977 The potential lease interval is the lease expiration interval 978 the local server tells to its partner in the potential- 979 expiration-time option of a BNDUPD message. 981 o acknowledged potential lease interval 983 The acknowledged potential lease interval is the potential lease 984 interval the partner server has most recently acknowledged in 985 the potential-expiration-time option of a BNDACK message.
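As a rough illustration of how these intervals relate, the Python sketch below computes an actual lease interval that never leads the partner's acknowledged potential expiration by more than the MCLT. It is one possible policy that respects the relationship, not an algorithm mandated by the protocol. The function name, the use of seconds, and the sample values (an MCLT of 1 hour and a desired lease of 3 days) are assumptions made for illustration.

```python
def actual_lease_interval(desired, mclt, now, acked_potential_expiration=None):
    """Pick a lease interval (in seconds) for the DHCPACK.

    acked_potential_expiration is the absolute time most recently
    acknowledged by the partner for this binding, or None if the
    partner has acknowledged nothing yet (for example, a new lease).
    """
    if acked_potential_expiration is None:
        remainder = 0
    else:
        # Time still left to run on what the partner has acknowledged.
        remainder = max(0, acked_potential_expiration - now)
    # Never give the client more than the remainder plus the MCLT.
    return min(desired, remainder + mclt)


HOUR = 3600
DAY = 24 * HOUR

# New lease: nothing acknowledged yet, so the client is limited to the MCLT.
first = actual_lease_interval(desired=3 * DAY, mclt=HOUR, now=0)

# Renewal half an hour later, after the partner acknowledged a potential
# expiration of 3 days + 1/2 hour: the full desired interval can be given.
renew = actual_lease_interval(desired=3 * DAY, mclt=HOUR, now=HOUR // 2,
                              acked_potential_expiration=3 * DAY + HOUR // 2)
```

With these sample values the first lease is one hour long and the renewal is granted the full three days.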
987 The key restriction (and guarantee) that any server makes with 988 respect to lease intervals is that the actual client lease interval 989 never exceeds the acknowledged potential lease interval (if any) by 990 more than a fixed amount. This fixed amount is called the "Maximum 991 Client Lead Time" (MCLT). 993 The MCLT MAY be configurable on the primary server, but for correct 994 server operation it MUST be the same and known to both the primary 995 and secondary servers. The secondary server determines the MCLT from 996 the MCLT option sent from the primary server to the secondary server 997 in the CONNECT message. 999 A server MUST record in its stable storage both the actual lease 1000 interval and the most recently acknowledged potential lease interval 1001 for each IP address binding. It is assumed that the desired client 1002 lease interval can be determined through techniques outside of the 1003 scope of this protocol. See section 7.1.4 for more details concern- 1004 ing the times that the server MUST record in its stable storage and 1005 the way that they interact with the lease time that may be offered to 1006 a DHCP client. 1008 Again, the fundamental relationship among these times which MUST be 1009 maintained is:

1011      actual lease interval <
1012      ( acknowledged potential lease interval + MCLT )

1014 Figure 5.1-1 illustrates an initial lease to a client using the rules 1015 discussed in the example which follows it.

1017         DHCP                     Primary                    Secondary
1018 time    Client                    Server                     Server

1020           |  (time in intervals)    |  (absolute time)          |
1021           |                         |                           |
1022           |  >-DHCPDISCOVER->       |                           |
1023           |  <---DHCPOFFER-<        |                           |
1024           |                         |                           |
1025           |  >-DHCPREQUEST->        |                           |
1026           |  (selecting)            |                           |
1027           |                         |                           |
1028 t         |  <--------DHCPACK-<     |                           |
1029           |  lease-time=MCLT        |                           |
1030           |                         |  >-BNDUPD-->              |
1031           |                         |  lease-expiration=t+MCLT
1032           |                         |  potential-expiration=t+(MCLT/2)+X
1033           |                         |                           |
1034           |                         |  <-BNDACK-<               |
1035           |                         |  potential-expiration=t+(MCLT/2)+X
1036          ...                       ...                         ...
1037           |                         |                           |
1038 t+MCLT/2  |  >-DHCPREQUEST->        |                           |
1039           |  (renew)                |                           |
1040           |                         |                           |
1041 t1        |  <--------DHCPACK-<     |                           |
1042           |  lease-time=X           |                           |
1043           |                         |  >-BNDUPD-->              |
1044           |                         |  lease-expiration=t1+X
1045           |                         |  potential-expiration=t1+(X/2)+X
1046           |                         |                           |
1047           |                         |  <-BNDACK-<               |
1048           |                         |  potential-expiration=t1+(X/2)+X
1049          ...                       ...                         ...

1051 Figure 5.1-1: Lazy Update Message Traffic 1052 X = Desired Lease Interval

1054 DISCUSSION: 1056 This protocol mandates no algorithm concerning these lease inter- 1057 vals, as long as the above fundamental relationship is preserved. 1059 In the interests of clarity, however, let's examine a specific 1060 example. The MCLT in this case is 1 hour. The desired lease 1061 interval is 3 days, and its renewal time is half the lease inter- 1062 val. 1064 The rules for this example are: 1066 o What to tell the client: 1068 Take the remainder of the acknowledged potential lease interval. 1069 If this is a new lease, then this value will be zero. If this 1070 remainder plus the MCLT is greater than the desired lease inter- 1071 val, give the client the desired lease interval; otherwise give the 1072 client the remainder plus the MCLT. 1074 o What to tell the failover partner server: 1076 Take the renewal interval (typically half of the actual client 1077 lease interval), add to it the desired lease interval, and add 1078 it to the current time to yield the value that goes into the 1079 potential-expiration-time option. 1081 Also tell the failover partner the actual lease interval by 1082 adding it to the current time to yield the value that goes into 1083 the lease-expiration option. 1085 In operation this might work as follows: 1087 When a server makes an offer for a new lease on an IP address to a 1088 DHCP client, it determines the desired lease interval (in this 1089 case, 3 days). It then examines the acknowledged potential lease 1090 interval (which in this case is zero) and determines the remainder 1091 of the time left to run, which is also zero.
To this it adds the 1092 MCLT. Since the actual lease interval cannot be allowed to exceed 1093 the remainder of the current acknowledged potential lease interval 1094 plus the MCLT, the offer made to the client is for the remainder 1095 of the current acknowledged potential lease interval (i.e., zero) 1096 plus the MCLT. Thus, the actual lease interval is 1 hour. 1098 Once the server has performed the ACK to the DHCP client, it will 1099 update the secondary server with the lease information. However, 1100 the desired potential lease interval will be composed of the one 1101 half of the current actual lease interval added to the desired 1102 lease interval. Thus, the secondary server is updated with a 1103 BNDUPD with a lease interval of 3 days + 1/2 hour specified in the 1104 potential-expiration-time option. 1106 When the primary server receives an ACK to its update of the 1107 secondary server's (partner's) potential lease interval, it 1108 records that as the acknowledged potential lease interval. A 1109 server MUST NOT send a BNDACK in response to a BNDUPD message 1110 until it is sure that the information in the BNDUPD message 1111 resides in its stable storage. Thus, the primary server in this 1112 case can be sure that the secondary server has recorded the poten- 1113 tial lease interval in its stable storage when the primary server 1114 receives a BNDACK message from the secondary server. 1116 When the DHCP client attempts to renew at T1 (approximately one 1117 half an hour from the start of the lease), the primary server 1118 again determines the desired lease interval, which is still 3 1119 days. It then compares this with the remaining acknowledged 1120 potential lease interval (3 days + 1/2 hour) and adjusts for the 1121 time passed since the secondary was last updated (1/2 hour). Thus 1122 the time remaining of the acknowledged potential lease interval is 1123 3 days. 
Adding the MCLT to this yields 3 days plus 1 hour, which 1124 is more than the desired lease interval of 3 days. So the client 1125 is renewed for the desired lease interval -- 3 days. 1127 When the primary DHCP server updates the secondary DHCP server 1128 after the DHCP client's renewal ACK is complete, it will calculate 1129 the desired potential lease interval as the T1 fraction of the 1130 actual client lease interval (1/2 of 3 days this time = 1.5 days). 1131 To this it will add the desired client lease interval of 3 days, 1132 yielding a total desired partner server lease interval of 4.5 1133 days. In this way, the primary attempts to have the secondary 1134 always "lead" the client in its understanding of the client's 1135 lease interval so as to be able to always offer the client the 1136 desired client lease interval. 1138 Once the initial actual client lease interval of the MCLT is past, 1139 the protocol operates effectively like the DHCP protocol does 1140 today in its behavior concerning lease intervals. However, the 1141 guarantee that the actual client lease interval will never exceed 1142 the remaining acknowledged partner server lease interval by more 1143 than the MCLT allows full recovery from a variety of failures. 1145 5.2.2. Controlled re-allocation of IP addresses 1147 When in PARTNER-DOWN state there is a waiting period after which an 1148 IP address can be re-allocated to another client. For leases which 1149 are available when the server enters PARTNER-DOWN state, the period 1150 is the MCLT from entry into PARTNER-DOWN state. For IP addresses 1151 which are not available when the server enters PARTNER-DOWN state, 1152 the period is the MCLT after the lease becomes available. See sec- 1153 tion 9.4.2 for more details. 
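The two waiting periods described above for PARTNER-DOWN state reduce to a single computation. The following Python sketch is illustrative only; the function name and the idea of tracking an "available since" timestamp for each lease are assumptions, not something this document defines.

```python
def earliest_reallocation(partner_down_entry, available_since, mclt):
    """Earliest time an address may be re-allocated to another client
    while in PARTNER-DOWN state. All arguments share one time unit.

    partner_down_entry: time the server entered PARTNER-DOWN state
    available_since:    time the lease became available
    """
    if available_since <= partner_down_entry:
        # Already available on entry: wait the MCLT from entry.
        return partner_down_entry + mclt
    # Became available later: wait the MCLT from that moment.
    return available_since + mclt


# Entered PARTNER-DOWN at t=1000 with an MCLT of 3600 seconds.
already_available = earliest_reallocation(1000, 400, 3600)
became_available_later = earliest_reallocation(1000, 2500, 3600)
```

In both cases the waiting period starts from the later of the two events, so the same result could be written as max(partner_down_entry, available_since) + mclt.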
1155 In any other state, a server cannot reallocate an address from one 1156 client to another without first notifying its partner (through a 1157 BNDUPD message) and receiving acknowledgement (through a BNDACK mes- 1158 sage) that its partner is aware that that first client is not using 1159 the address. 1161 This could be modeled in the following way. Though this specific 1162 implementation is in no way required, it may serve to better illus- 1163 trate the concept. 1165 An "available" IP address on a server may be allocated to any client. 1166 An IP address which was leased to a client and which expired or was 1167 released by that client would take on a new state, EXPIRED or 1168 RELEASED respectively. The partner server would then be notified 1169 that this IP address was EXPIRED or RELEASED through a BNDUPD. When 1170 the sending server received the BNDACK for that IP address showing it 1171 was FREE, it would move the IP address from EXPIRED or RELEASED to 1172 FREE, and it would be available for allocation by the primary server 1173 to any clients. 1175 A server MAY reallocate an IP address in the EXPIRED or RELEASED 1176 state to the same client with no restrictions. 1178 5.3. Load balancing 1180 In order to implement load balancing between a primary and secondary 1181 server pair, each server must respond to DHCPDISCOVER requests from 1182 some clients and not from other clients. In order to do this suc- 1183 cessfully, each server must be able to determine immediately upon 1184 receipt of a DHCP client request whether it is to service this 1185 request or to ignore it in order to allow the other server to service 1186 the request. 1188 In addition, it should be possible to configure the percentage of 1189 clients which will be serviced by either the primary or secondary 1190 server. 
This configuration should be more or less continuous, from 1191 all serviced by the primary through an even split with half serviced 1192 by each, to all serviced by the secondary. 1194 The technique chosen to support these goals is described in [LOADB]. 1195 When using the load balancing algorithm in [LOADB] between two servers 1196 implementing the failover protocol, both servers MUST use the same 1197 information from the DHCP client packet as the Request ID for the 1198 load balancing algorithm. Both servers MUST use the dhcp-client- 1199 identifier (if it appears), and the client-hardware-address if the 1200 dhcp-client-identifier does not appear. The client-hardware-address is con- 1201 structed from the htype and chaddr fields of the DHCP client request 1202 in the same manner as described for creation of the client-hardware- 1203 address option in section 6.2. 1205 A bitmap-style Hash Bucket Assignment (as described in section 5.2 of 1206 [LOADB]) is sent by the primary server to the secondary server when- 1207 ever a connection is established, using the hash-bucket-assignment 1208 option defined in section 6.2. This Hash Bucket Assignment is used 1209 by the secondary server to decide which packets to process when in 1210 NORMAL state. 1212 The way in which the primary or secondary server determines the 1213 hash bucket assignment to use when in a state other than NORMAL 1214 is outside of the scope of this document. Note, however, that the 1215 primary and secondary servers MUST use identical hash bucket assign- 1216 ments when not in NORMAL state. This common hash bucket assignment 1217 MAY be for all of the hash buckets, indicating that there is no other 1218 DHCP server sharing the load with this failover pair, or it MAY be 1219 for a subset of the hash buckets, which would indicate that there 1220 exists another server or server pair with which this DHCP server pair 1221 is sharing the load. 1223 5.4.
Operating in NORMAL state 1225 When in NORMAL state, each server services DHCPDISCOVERs and all 1226 other DHCP requests other than DHCPREQUEST/RENEWAL or 1227 DHCPREQUEST/REBINDING from the client set defined by the load balanc- 1228 ing algorithm. Each server services DHCPREQUEST/RENEWAL or 1229 DHCPREQUEST/REBINDING requests from any client. 1231 In general, whenever the binding database is changed in stable 1232 storage, a BNDUPD message is sent with the contents of that 1233 change to the partner server. The partner server then writes the 1234 information about that binding in its bindings database in stable 1235 storage and replies with a BNDACK message. 1237 5.5. Operating in COMMUNICATIONS-INTERRUPTED state 1239 When operating in COMMUNICATIONS-INTERRUPTED state, each server is 1240 operating independently, but does not assume that its partner is not 1241 operating. The partner server might be operating and simply unable 1242 to communicate with this server, or might not be operating. 1244 Each server responds to the full range of DHCP client messages that 1245 it receives, but in such a way that graceful reintegration is always 1246 possible when its partner comes back into contact with it. 1248 5.6. Operating in PARTNER-DOWN state 1250 When operating in PARTNER-DOWN state, a server assumes that its 1251 partner is not currently operating, but does make allowances for the 1252 possibility that that server was operating in the past, though possi- 1253 bly out of communications with this server. It responds to all DHCP 1254 client requests in PARTNER-DOWN state. 1256 5.7. Operating in RECOVER state 1258 A server operating in RECOVER state assumes that it is reintegrating 1259 with a server that has been operating in PARTNER-DOWN state, and that 1260 it needs to update its bindings database before it services DHCP 1261 client requests.
1263 A server may also operate in RECOVER state in order to fully recover 1264 its bindings database from its partner server. 1266 5.8. Operating in STARTUP state 1268 A server operating in STARTUP state assumes that failover is opera- 1269 tional, and it spends a short time whenever it comes up attempting to 1270 contact the partner. During this time (generally a few seconds), the 1271 server is unresponsive to DHCP client requests. This period exists 1272 in order to give a server a chance to determine that its partner has 1273 changed state since it was last in communications, and to react to 1274 that changed state (if any) prior to responding to DHCP client 1275 requests. 1277 The period of time a server remains in STARTUP state SHOULD be long 1278 enough to ensure that it will connect to the other server if that 1279 server is available for connections. 1281 5.9. Time synchronization between servers 1283 The failover protocol is designed to operate between two servers 1284 which have time values which differ by an arbitrarily large amount. 1285 A particular implementation MAY choose to only support servers whose 1286 time values differ by an arbitrarily small amount. 1288 In any event, whether large or only small differences in time values 1289 are supported, every message that is received MUST be tagged with a 1290 time value as soon as possible after receipt. This time value is 1291 used along with the time value that is sent in every message between 1292 the failover partners to develop a delta time between the servers. 1293 This delta time is used during the connection process to establish a 1294 baseline delta time between the servers, and upon receipt of each 1295 message, the delta time for that message is used to refine the delta 1296 time for the server pair. 
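As one possible illustration of developing and refining the delta time, the sketch below keeps a running estimate for a server pair. The protocol does not specify the refinement algorithm; the exponentially weighted moving average, the `alpha` weight, the sign convention (local receipt time minus the partner's send time), and all names used here are assumptions made for the example.

```python
from typing import Optional


class DeltaTimeEstimator:
    """Running estimate of the clock offset between failover partners."""

    def __init__(self, alpha: float = 0.125) -> None:
        # alpha is the weight given to each new sample (assumed value).
        self.alpha = alpha
        self.delta: Optional[float] = None

    def observe(self, sent_time: float, received_time: float) -> float:
        """Fold one message into the estimate and return the new delta.

        sent_time     -- time value carried in the partner's message
        received_time -- local time tagged on the message at receipt
        """
        sample = received_time - sent_time
        if self.delta is None:
            # First message on the connection establishes the baseline.
            self.delta = sample
        else:
            # Later messages refine the baseline; smoothing damps the
            # effect of a single delayed message.
            self.delta += self.alpha * (sample - self.delta)
        return self.delta


est = DeltaTimeEstimator(alpha=0.5)
est.observe(sent_time=100, received_time=110)   # baseline
est.observe(sent_time=200, received_time=214)   # refined toward the new sample
print(est.delta)
```

Because every sample includes the one-way network delay, the estimate is biased by that delay; a real implementation would need to decide whether that matters for its use of the value.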
1298 While the algorithm for this refinement of delta time is not speci- 1299 fied as part of this protocol, a server SHOULD allow the delta time 1300 value for a pair of failover servers to be periodically updated to 1301 account for time drift. In addition, the delta time value between 1302 servers SHOULD be smoothed in some fashion, so that transient network 1303 delays will not cause it to vary wildly. 1305 A server SHOULD recognize a drastic change in the delta time value as 1306 an event to be signaled to a network administrator. 1308 5.10. IP address binding-status 1310 In most DHCP servers an IP address can take on several different 1311 binding-status values, sometimes also called states. While no two 1312 DHCP servers probably have exactly the same possible binding-status 1313 values the DHCP RFC enforces some commonality among the general 1314 semantics of the binding-status values used by various DHCP server 1315 implementations. 1317 In order to transmit binding database updates between one server and 1318 another using the failover protocol, some common denominator 1319 binding-status values must be defined. It is not expected that these 1320 binding-status-values correspond with any actual implementation of 1321 the DHCP protocol in a DHCP server, but rather that the binding- 1322 status values defined in this document should be a superset of most 1323 if not all DHCP server implementations. It is a goal of this proto- 1324 col that any DHCP server can map the various IP address binding- 1325 status values that it uses internally into these failover IP address 1326 binding-status values on transmission of binding database updates to 1327 its partner, and likewise that it can map any failover IP address 1328 binding-status values into its internal IP address binding-status 1329 values upon receipt of a binding database update. 
1331 The IP address binding-status values defined for the failover proto- 1332 col are: 1334 o FREE 1336 Lease may be allocated to any DHCP client. 1338 o ACTIVE 1340 Lease is assigned to a client. It MUST have client information 1341 associated with it. 1343 o EXPIRED 1345 Lease has expired. It may be allocated to the same client. 1347 o RELEASED 1349 Lease has been released by client. It may be allocated to the 1350 same client. 1352 o ABANDONED 1353 A server or client has flagged the address as unusable. 1355 o RESET 1357 Lease was freed by some external agent. 1359 o BACKUP 1361 Lease belongs to secondary's private address pool. 1363 These binding-status values are communicated from one failover 1364 partner to another using the binding-status option; see section 6.2 1365 for details of this option. Unless otherwise noted above there MAY 1366 be client information associated with each of these binding-status 1367 values. 1369 Again, note that a DHCP server implementing the failover protocol 1370 does not have to either implement this state machine or use these 1371 particular binding-status values in its normal operation of allocat- 1372 ing IP addresses to DHCP clients. It only needs to map its internal 1373 binding-status-values onto these "standard" binding-status values, 1374 and map these "standard" binding-status values back into its internal 1375 binding-status values. In particular, a server which implements a 1376 grace period for an IP address binding SHOULD simply wait to update 1377 its partner server until the grace period on that binding has run 1378 out. 1380 The process of setting an IP address to FREE deserves some detailed 1381 discussion. When an IP address is moved to the EXPIRED, RELEASED, or 1382 RESET binding-status on a server, it will send a BNDUPD with the 1383 binding-status of EXPIRED, RELEASED, or RESET to its partner.
If its 1384 partner agrees that this is acceptable (see sections 7.1.2 and 7.13 con- 1385 cerning why a server might not accept a BNDUPD) it will return a 1386 BNDACK with no reject-reason, signifying that it accepted the update. 1387 As part of the BNDUPD processing, the server returning the BNDACK 1388 will set the binding-status of the IP address to FREE, and upon 1389 receipt of the BNDACK the server which sent the BNDUPD will set the 1390 binding-status of the IP address to FREE. Thus, the EXPIRED, 1391 RELEASED, or RESET binding-status is something of a transitory state. 1392 This process is encoded in the transition diagram below by "Comm 1393 w/Partner".

1395 An IP address will move between these lease binding-status values 1396 using the following state transition diagram:

1398                              DHCP client DECLINE or
1399                              server detected problem
1400                                  from any state
1401                      +----------+       V   +---------+
1402       External >---->|  RESET   |       |   |ABANDONED|
1403       command        |          |       +-->|         |
1404                      +----------+           +---------+
1405                           |
1406                     Comm w/Partner
1407                           V
1408 +---------+   Comm   +----------+   Comm    +---------+
1409 | EXPIRED |--------->|   FREE   |<----------| RELEASED|
1410 |         | w/Partner|          | w/Partner |         |
1411 +---------+          +----------+           +---------+
1412    ^     ^              |      |                   ^
1413    | Exp. grace    IP address IP addr alloc.       |
1414    | period ends    leased by to secondary         |
1415    |     |           primary   V                   |
1416    |     |              | +----------+             |
1417    |     |              | |  BACKUP  |             |
1418    | wait for           | |          |             |
1419    | grace period       | +----------+             |
1420    |     |              |      |                   |
1421    |     |              |  IP addr leased by       |
1422    | Expired grace      |  secondary               |
1423    | period exists      V      V                   |
1424    |     |           +----------+                  |
1425    |     | Lease on  |  ACTIVE  |   DHCPRELEASE    |
1426    +-----+-IP addr---|          |------------------+
1427            expires   +----------+

1429 Figure 5.10-1: Transitions between binding-status values.

1431 If a server receives a binding-status that it doesn't implement 1432 internally, it should do something reasonable.
A server which doesn't 1433 support an ABANDONED binding-status could set the IP address to ACTIVE 1434 and record it as belonging to a client which will never be seen in a DHCP request. 1436 5.10.1. IP address binding-status changes from BNDUPD messages 1438 IP addresses undergo binding status changes for several reasons, 1439 including receipt and processing of DHCP client requests, administra- 1440 tive inputs and receipt of BNDUPD messages. Every DHCP server needs 1441 to respond to DHCP client requests and administrative inputs with 1442 changes to its internal record of the binding-status of an IP 1443 address, and this response is not in the scope of the failover proto- 1444 col. However, the receipt of BNDUPD messages implies at least a pos- 1445 sible change of the binding-status for an IP address, and must be 1446 discussed here. See section 7.1.2 for general actions to take upon 1447 receipt of a BNDUPD message. 1449 When receiving a BNDUPD message, it is important to note that it may 1450 not be current, in that the server receiving the BNDUPD message may 1451 have had a more recent interaction with the DHCP client than its 1452 partner who sent the BNDUPD message. In this case, the receiving 1453 server MUST reject the BNDUPD message. In addition, it is worth not- 1454 ing that two (and possibly three) binding-status values are the 1455 direct result of interaction with a DHCP client: ACTIVE and RELEASED 1456 (and possibly ABANDONED). All other binding-status values are either 1457 the result of the expiration of a time period or interaction with an 1458 external agency (e.g., a network administrator). 1460 Every BNDUPD message SHOULD contain a client-last-transaction-time 1461 option, which MUST, if it appears, be the time that the server last 1462 interacted with the DHCP client. It MUST NOT be, for instance, the 1463 time that the lease on an IP address expired.
If there has been no 1464 interaction with the DHCP client in question (or there is no DHCP 1465 client presently associated with this IP address), then there will be 1466 no client-last-transaction-time option in the BNDUPD message. 1468 The following list is indexed by the binding-status that a server 1469 receives in a BNDUPD message. In many cases, the binding-status of 1470 an IP address within the receiving server's data storage will have an 1471 effect upon the checks performed prior to accepting the new binding- 1472 status in a BNDUPD message. 1474 In the following list, to "accept" a BNDUPD means to update the 1475 server's bindings database with the information contained in the 1476 BNDUPD and, once that update is complete, send a BNDACK message 1477 corresponding to the BNDUPD message. To "reject" a BNDUPD means to 1478 respond to the BNDUPD with a BNDACK with a reject-reason option 1479 included. 1481 When interpreting the rules in the following list, if a BNDUPD 1482 doesn't have a client-last-transaction-time value, then it MUST NOT 1483 be considered later than the client-last-transaction-time in the 1484 receiving server's binding. If the BNDUPD contains a client-last- 1485 transaction-time value and the receiving server's binding does not, 1486 then the client-last-transaction-time value in the BNDUPD MUST be 1487 considered later than the server's. 1489 The second rule concerns clients and IP addresses. If the client in 1490 a BNDUPD message and the client in a receiving server's binding both 1491 exist and if they differ, then if the receiving server's binding- 1492 status is ACTIVE and the binding-status in the BNDUPD is ACTIVE, then 1493 if the receiving server is a secondary server accept it, else reject 1494 it. 1496 Otherwise, look up the binding-status in the BNDUPD in this list: 1498 o ACTIVE in BNDUPD 1500 If the receiving server's binding-status is ACTIVE, FREE, or 1501 BACKUP, then accept it.
1503 If the receiving server's binding-status is ABANDONED or RESET, 1504 then reject it. 1506 If the receiving server's binding-status is RELEASED or EXPIRED, 1507 then if the client-last-transaction-time in the BNDUPD is later 1508 than the client-last-transaction-time in the receiving server's 1509 binding, accept it, else reject it. 1511 o EXPIRED in BNDUPD 1513 If the receiving server's binding-status is ACTIVE, then if the current 1514 time is later than the receiving server's lease-expiration-time, 1515 accept it, else reject it. 1517 If the receiving server's binding-status is ABANDONED or RESET, 1518 reject it. 1520 If the receiving server's binding-status is FREE or BACKUP, 1521 accept it. 1523 If the receiving server's binding-status is RELEASED, then if 1524 the client-last-transaction-time in the BNDUPD is later than that 1525 in the receiving server's binding, accept it, else reject 1526 it. 1528 o RELEASED in BNDUPD 1530 If the receiving server's binding-status is ACTIVE, then if the 1531 client-last-transaction-time in the BNDUPD is later than the client-last- 1532 transaction-time in the receiving server's binding, accept it, 1533 else reject it. 1535 If the receiving server's binding-status is RELEASED, FREE or 1536 BACKUP, accept it. 1538 If the receiving server's binding-status is ABANDONED or RESET, 1539 reject it. 1541 o FREE or BACKUP in BNDUPD 1543 If the receiving server's binding-status is ACTIVE and the 1544 current time is later than the lease-expiration-time, accept it, 1545 else reject it. 1547 If the receiving server's binding-status is ABANDONED, reject 1548 it. 1550 If the receiving server's binding-status is FREE or BACKUP or 1551 RESET, accept it. 1553 o RESET or ABANDONED in BNDUPD 1555 Accept the new binding-status under all circumstances. 1557 5.11. DNS dynamic update considerations 1559 DHCP servers (and clients) can use DNS Dynamic Updates as described 1560 in [RFC2136] to maintain DNS name-mappings as they maintain DHCP 1561 leases.
Many different administrative models for DHCP-DNS integra- 1562 tion are possible. Descriptions of several of these models, and 1563 guidelines that DHCP servers and clients should follow in carrying 1564 them out, are laid out in [DDNS]. The nature of the DHCP failover 1565 protocol introduces some issues concerning dynamic DNS updates that 1566 are not part of non-failover DHCP environments. This section 1567 describes these issues, and defines the information which failover 1568 partners should exchange and the protocol which they should follow in 1569 order to ensure consistent behavior. The presence of this section 1570 should not be interpreted as requiring that implementations of the 1571 DHCP failover protocol must also support DDNS updates. The purpose 1572 of this discussion is to clarify the areas where the DHCP failover 1573 and DHCP-DDNS protocols intersect for the benefit of implementations 1574 which support both protocols, not to introduce a new requirement into 1575 the DHCP failover protocol. Thus, a DHCP server which implements the 1576 failover protocol MAY also support dynamic DNS updates, but if it 1577 does support dynamic DNS updates it SHOULD utilize the techniques 1578 described here in order to correctly distribute them between the 1579 failover partners. 1581 5.11.1. Relationship between failover and dynamic DNS update 1583 The failover protocol describes the conditions under which each fail- 1584 over server may renew a lease to its current DHCP client, and 1585 describes the conditions under which it may grant a lease to a new 1586 DHCP client. An analogous set of conditions determines when a fail- 1587 over server should initiate a DDNS update, and when it should attempt 1588 to remove records from the DNS. 
The failover protocol's conditions 1589 are based on the desired external behavior: avoiding duplicate 1590 address assignments; allowing clients to continue using leases which 1591 they obtained from one failover partner even if they can only commun- 1592 icate with the other partner; allowing the backup DHCP server to 1593 grant new leases even if it is unable to communicate with the primary 1594 server. The desired external DDNS behavior for DHCP failover servers 1595 is: 1597 1. Allow timely DDNS updates from the server which grants a 1598 client a lease. Recognize that there is often a DDNS update 1599 lifecycle which parallels the DHCP lease lifecycle. This is 1600 likely to include the addition of records when the lease is 1601 granted, and the removal of DNS records when the lease is sub- 1602 sequently made available for allocation to a different client. 1604 2. Communicate enough information between the two failover 1605 servers to allow one to complete the DDNS update 'lifecycle' 1606 even if the other server originally granted the lease. 1608 3. Avoid redundant or overlapping DDNS updates, where both fail- 1609 over servers are attempting to perform DDNS updates for the 1610 same lease-client binding. Avoid situations where one partner 1611 is attempting to add RRs related to a lease binding while the 1612 other partner is attempting to remove RRs related to the same 1613 lease binding. 1615 5.11.2. Use of the DDNS option 1617 In order for either server to be able to complete a DDNS update, or 1618 to remove DNS records which were added by its partner, both servers 1619 need to know the FQDN associated with the lease-client binding. The 1620 FQDN associated with the client's A RR and PTR RR SHOULD be communi- 1621 cated from the server which adds records into the DNS to its partner. 
1622 The initiating server SHOULD use the DDNS option in the BNDUPD mes- 1623 sages to inform the partner server of the status of any DDNS updates 1624 associated with a lease binding. Failover servers MAY choose not to 1625 include the DDNS option in BNDUPD messages if there has been no 1626 change in the status of any DDNS update related to the lease binding. 1627 The partner server receiving BNDUPD messages containing the DDNS 1628 option SHOULD compare the status flags and the FQDN contained in the 1629 option data with the current DDNS information it has associated with 1630 the lease binding, and update its notion of the DDNS status accord- 1631 ingly. 1633 The initiating server MAY send a BNDUPD to its partner before the 1634 DDNS update has been successfully completed. If it does so, it SHOULD 1635 leave the 'C' bit in the Flags field clear, to indicate to the 1636 partner that the DDNS update may not be complete. When the DDNS 1637 update has been successfully acknowledged by the DNS server, the ini- 1638 tiating DHCP server SHOULD include the DDNS option in its next BNDUPD 1639 message about the binding, so that the partner server will be able to 1640 record the final status of the DDNS update. The initiating server 1641 SHOULD set the 'C' bit in the DDNS option if the DDNS update was suc- 1642 cessfully accepted by the DNS server. 1644 Some implementations will choose to send a BNDUPD without waiting for 1645 the DDNS update to complete, and then will send a second BNDUPD once 1646 the DDNS update is complete. Other implementations will delay sending 1647 the partner a BNDUPD until the DDNS update has been acknowledged by 1648 the DNS server, or until some time-limit has elapsed, in order to 1649 avoid sending a second BNDUPD. 1651 The Domain Name field in the DDNS option contains the FQDN that will 1652 be associated with the A RR (if the server is performing an A RR 1653 update for the client) and the PTR RR.
This FQDN may be composed in 1654 any of several ways, depending on server configuration and the infor- 1655 mation provided by the client in its DHCP messages. The client may 1656 supply a hostname which it would like the server to use in forming 1657 the FQDN, or it may supply the entire FQDN. The server may be config- 1658 ured to attempt to use the information the client supplies, it may be 1659 configured with an FQDN to use for the client, or it may be config- 1660 ured to synthesize an FQDN. The responsive server SHOULD include the 1661 FQDN that it will be using in DDNS updates it initiates when it sends 1662 the DDNS option. 1664 Since the responsive server may not have completed the DDNS update at 1665 the time it sends the first BNDUPD about the lease binding, there may 1666 be cases where the FQDN in later BNDUPD messages does not match the 1667 FQDN included in earlier messages. For example, the responsive server 1668 may be configured to handle situations where two or more DHCP client 1669 FQDNs are identical by modifying the most-specific label in the FQDNs 1670 of some of the clients in an attempt to generate unique FQDNs for 1671 them. Alternatively, at sites which use some or all of the informa- 1672 tion which clients supply to form the FQDN, it's possible that a 1673 client's configuration may be changed so that it begins to supply new 1674 data. The responsive server may react by removing the DNS records 1675 which it originally added for the client, and replacing them with 1676 records that refer to the client's new FQDN. In such cases, the 1677 responsive server SHOULD include the actual FQDN that was used in 1678 subsequent DDNS options. The responsive server SHOULD include 1679 relevant client-option data in the client-request-options option in 1680 its BNDUPD messages. 
This information may be necessary in order to 1681 allow the non-responsive partner to detect client configuration 1682 changes that change the hostname or FQDN data which the client 1683 includes in its DHCP requests. 1685 5.11.3. Adding RRs to the DNS 1687 A failover server which is going to perform DDNS updates SHOULD ini- 1688 tiate the DDNS update when it grants a new lease to a client. The 1689 non-responsive partner SHOULD NOT initiate a DDNS update when it 1690 receives the BNDUPD after the lease has been granted. The failover 1691 protocol ensures that only one of the partners will grant a lease to 1692 any individual client, so it follows that this requirement will 1693 prevent both partners from initiating updates simultaneously. The 1694 server initiating the update SHOULD follow the protocol in [DDNS]. 1695 The server may be configured to perform an A RR update on behalf of 1696 its clients, or not. Ordinarily, a failover server will not initiate 1697 DDNS updates when it renews leases. In two cases, however, a failover 1698 server MAY initiate a DDNS update when it renews a lease to its 1699 existing client: 1701 1. When the lease was granted before the server was configured to 1702 perform DDNS updates, the server MAY be configured to perform 1703 updates when it next renews existing leases. Since both 1704 servers are responsive to renewals in NORMAL state, it is not 1705 enough to simply require the non-responsive server to avoid a 1706 DNS update in this case. The server which would be responsive 1707 to a DHCPDISCOVER from this client (even though the current 1708 request is a DHCPREQUEST/RENEW) is the server which should 1709 initiate the DDNS update. 1711 2. If a server is in PARTNER-DOWN state, it can conclude that its 1712 partner is no longer attempting to perform an update for the 1713 existing client. 
If the remaining server has not recorded that 1714 an update for the binding has been successfully completed, the 1715 server MAY initiate a DDNS update. It MAY initiate this 1716 update immediately upon entry to PARTNER-DOWN state, it may 1717 perform this in the background, or it MAY initiate this update 1718 upon next hearing from the DHCP client. 1720 5.11.4. Deleting RRs from the DNS 1722 The failover server which makes a lease FREE SHOULD initiate any DDNS 1723 deletes, if it has recorded that DNS records were added on behalf of 1724 the client. 1726 A server "makes a lease FREE" when it initiates a BNDUPD with a 1727 binding-status of FREE, EXPIRED, or RELEASED. Its partner confirms 1728 this status by acknowledging that BNDUPD, and upon receipt of the BNDACK the 1729 server has "made the address FREE". It is at this point that it 1730 should initiate the DDNS operations to delete RRs from the DNS. Its 1731 partner SHOULD NOT initiate DDNS deletes for DNS records related to 1732 the lease binding as part of sending the BNDACK message. The 1733 partner MAY have issued BNDUPD messages with a binding-status of 1734 FREE, EXPIRED, or RELEASED previously, but the other server will have 1735 rejected these BNDUPD messages. 1737 The failover protocol ensures that only one of the two partner 1738 servers will be able to make a lease FREE. The server making the 1739 lease FREE may be doing so while it is in NORMAL communication with 1740 its partner, or it may be in PARTNER-DOWN state. If a server is in 1741 PARTNER-DOWN state, it may be performing DDNS deletes for RRs which 1742 its partner added originally. This allows a single remaining partner 1743 server to assume responsibility for all of the DDNS activity which 1744 the two servers were undertaking.
1746 Another implication of this approach is that no DDNS RR deletes will 1747 be performed while either server is in COMMUNICATIONS-INTERRUPTED 1748 state, since no IP addresses are moved into the FREE state during 1749 that period. 1751 5.12. Reservations and failover 1753 Some DHCP servers support a capability to offer specific pre- 1754 configured IP addresses to DHCP clients. These are real DHCP 1755 clients that carry out the entire DHCP protocol, but these servers always 1756 offer the client a specific pre-configured IP address -- and they 1757 offer that IP address to no other clients. Such a capability has 1758 several names, but it is sometimes called a "reservation", in that 1759 the IP address is reserved for a particular DHCP client. 1761 In a situation where there are two DHCP servers serving the same sub- 1762 net without using failover, the two DHCP servers need to have dis- 1763 joint IP address pools, but identical reservations for the DHCP 1764 clients. 1766 In a failover context, both servers need to be configured with the 1767 proper reservations in an identical manner, but if we stop there 1768 problems can occur around the edge conditions where reservations are 1769 made for an IP address that has already been leased to a different 1770 client. Different servers handle this conflict in different ways, 1771 but the goal of the failover protocol is to allow correct operation 1772 with any server's approach to the normal processing of the DHCP pro- 1773 tocol. 1775 The general solution with regard to reservations is as follows. 1776 Whenever a reserved IP address becomes FREE (i.e., when first config- 1777 ured or whenever a client frees it or it expires or is reset), the 1778 primary server MUST show that IP address as FREE (and thus available 1779 for its own allocation) and it MUST send it to the secondary server 1780 as BACKUP, in order that the secondary server be able to allocate it 1781 as well. 1783 5.13.
Dynamic BOOTP and failover 1785 Some DHCP servers support a capability to offer IP addresses to BOOTP 1786 clients without having a particular address previously allocated for 1787 those clients. This capability is often called something like 1788 "dynamic BOOTP". It is not a capability explicitly discussed in 1789 either the DHCP or BOOTP RFC's, but rather a pragmatic capability 1790 which can work reasonably well for a small set of legacy BOOTP dev- 1791 ices. 1793 This capability has a negative interaction with the fundamental ele- 1794 ments of the failover protocol, in that an address handed out to a 1795 BOOTP device has no term (or effectively no term, in that usually 1796 they are considered leases for "forever"). There is no opportunity 1797 to hand out a lease which is only the MCLT long when first hearing 1798 from a BOOTP device, because they may only interact once with the 1799 DHCP server and they have no notion of a lease expiration time. Thus 1800 the entire concept of the MCLT and waiting the MCLT after entering 1801 PARTNER-DOWN state is broken when dealing with BOOTP devices. 1803 With some restrictions, however, dynamic BOOTP devices can be sup- 1804 ported in a server on a subnet where failover is supported. The only 1805 restriction (and it is not small) is that on any portion of the sub- 1806 net (in any address pool) where dynamic BOOTP devices can be allo- 1807 cated IP addresses, a DHCP server MUST NOT ever use any of the IP 1808 addresses which were previously available for allocation by its fail- 1809 over partner. Thus, the addresses allocated by the primary to the 1810 secondary for allocation MUST NOT ever be used by the primary server 1811 even if it is in PARTNER-DOWN state and has waited the MCLT after 1812 entering that state. 
The reason for this is that one of those IP 1813 addresses could have been allocated by the secondary server to a BOOTP 1814 device, and the primary server would have no way of ever knowing that 1815 happened. 1817 5.14. Guidelines for selecting MCLT 1819 There is no one correct value for the MCLT. There is an explicit 1820 tradeoff between various factors in selecting an MCLT value. 1822 5.14.1. Short MCLT 1824 A short MCLT value will mean that after entering PARTNER-DOWN state, 1825 a server will only have to wait a short time before it can start 1826 allocating its partner's IP addresses to DHCP clients. Furthermore, 1827 it will only have to wait a short time after the expiration of a 1828 lease on an IP address before it can reallocate that IP address to 1829 another DHCP client. 1831 However, the downside of a short MCLT value is that the initial lease 1832 interval that will be offered to every new DHCP client will be short, 1833 which will cause increased traffic as those clients will need to send 1834 their first renewal after only half of the short MCLT. In addition, 1835 the lease extensions that a server in COMMUNICATIONS-INTERRUPTED 1836 state can give will be only the MCLT after the server has been in 1837 COMMUNICATIONS-INTERRUPTED for around the desired client lease 1838 period. If a server stays in COMMUNICATIONS-INTERRUPTED for that 1839 long, then the leases it hands out will be short and that will 1840 increase the load on that server, possibly causing difficulty. 1842 5.14.2. Long MCLT 1844 A long MCLT value will mean that the initial lease period will be 1845 longer and the time that a server in COMMUNICATIONS-INTERRUPTED state 1846 will be able to extend leases (after it has been in COMMUNICATIONS- 1847 INTERRUPTED state for around the desired client lease period) will be 1848 longer.
   However, a server entering PARTNER-DOWN state will have to wait the
   longer MCLT before being able to allocate its partner's IP addresses
   to new DHCP clients.  This may mean that additional IP addresses are
   required in order to cover this time period.  Further, the server in
   PARTNER-DOWN will have to wait the longer MCLT from every lease
   expiration before it can reallocate an IP address to a different
   DHCP client.

6.  Packet Formats

   This section describes the message format common to all failover
   messages, and then defines the options used in the failover
   protocol.

6.1.  Common message format

   All failover protocol messages are sent over the TCP connection
   between failover endpoints and encoded using a message format
   specific to the failover protocol.

   There exists a common message format for all failover messages,
   which utilizes the options in a way similar to the DHCP protocol.
   For each message type, some options are required and some are
   optional.  In addition, when a message is received, any options that
   are not understood by the receiving server MUST be ignored.

   All of the fields in the fixed portion of the message MUST be filled
   with correct data in every message sent.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      message length (2)       | msg type (1)  |payload off (1)|
   +---------------+---------------+---------------+---------------+
   |                           time (4)                            |
   +---------------------------------------------------------------+
   |                            xid (4)                            |
   +---------------------------------------------------------------+
   |         0 or more additional header bytes (variable)          |
   +---------------------------------------------------------------+
   |                    payload data (variable)                    |
   |                                                               |
   |                formatted as DHCP-style options                |
   |       using a unique option number space in the RFC TBD       |
   |                 format defined by [NAMESPACE]                 |
   +---------------------------------------------------------------+

   message length - 2 bytes, network byte order

   This is the length of the message.  It includes the two byte message
   length itself.  The maximum length is 2048 bytes.

   msg type - 1 byte

   The message type field is used to distinguish between messages.

   The following message types are defined:

   Value   Message Type
   -----   ------------
     0     reserved     not used
     1     POOLREQ      request allocation of addresses
     2     POOLRESP     respond with allocation count
     3     BNDUPD       update partner with binding info
     4     BNDACK       acknowledge receipt of binding update
     5     CONNECT      establish connection with the secondary
     6     CONNECTACK   respond to attempt to establish connection
                        with partner
     7     UPDREQALL    request full transfer of binding info
     8     UPDDONE      ack send and ack of req'd binding info
     9     UPDREQ       req transfer of un-acked binding info
    10     STATE        inform partner of current state or state change
    11     CONTACT      probe communications integrity with partner
    12     DISCONNECT   close a connection

   New message types should be defined in one of two ranges, 0-127 or
   128-255.
The range of 0-127 is used for messages that MUST be sup- 1926 ported by every server, and if a server receives a message in the 1927 range of 0-127 that it doesn't understand, it MUST close the TCP con- 1928 nection. The range of 128-255 is used for messages which MAY be sup- 1929 ported but are not required, and if a server receives a message in 1930 this range that it does not understand it SHOULD ignore the message. 1932 payload offset - 1 byte 1934 The byte offset of the Payload Data, from the beginning of the 1935 failover message header. The value for the current protocol version 1936 is 8. 1938 time - 4 bytes, network byte order 1940 The absolute time in GMT when the message was transmitted, 1941 represented as seconds elapsed since Jan 1, 1970 (i.e., similar to 1942 the ANSI C time_t time value representation). While the ANSI C 1943 time_t value is signed, the value used in this specification is 1944 unsigned. 1946 A server SHOULD set this time as close to the actual transmission of 1947 the message as possible. 1949 xid - 4 bytes, network byte order 1951 This is the transaction id of the failover message. The sender of a 1952 failover protocol message is responsible for setting this number, and 1953 the receiver of the message copies the number over into any response 1954 message, treating it as opaque data. The sender SHOULD ensure that 1955 every message sent from a particular failover endpoint over the 1956 associated TCP connection has a unique transaction id unless that 1957 message is a re-transmission. 1959 payload data - variable length 1961 The options are placed after the header, after skipping payload 1962 offset bytes from beginning of the message. The payload data options 1963 are not preceded by a "cookie" value. 1965 The payload data is formatted as DHCP style options using the two 1966 byte option number and two byte option length format as specified in 1967 the recommendations of the DHCP panel in [NAMESPACE]. 
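The header layout and option format described above can be illustrated with a short decoding sketch. It is not part of the protocol definition. The names `parse_header`, `iter_options`, and `unknown_type_action` are hypothetical, and the sketch reads the payload offset from the header field rather than assuming a particular value.

```python
import struct
from typing import Iterator, NamedTuple, Tuple

# Fixed fields: message length (2), msg type (1), payload offset (1),
# time (4, unsigned), xid (4); all in network byte order.
FIXED_FIELDS = struct.Struct("!HBBII")
MAX_MESSAGE_LENGTH = 2048


class Header(NamedTuple):
    length: int
    msg_type: int
    payload_offset: int
    time: int
    xid: int


def parse_header(buf: bytes) -> Header:
    """Decode the fixed portion of a failover message."""
    if len(buf) < FIXED_FIELDS.size:
        raise ValueError("message shorter than the fixed header fields")
    header = Header(*FIXED_FIELDS.unpack_from(buf, 0))
    if header.length > MAX_MESSAGE_LENGTH or header.length > len(buf):
        raise ValueError("bad message length")
    if header.payload_offset > header.length:
        raise ValueError("payload offset beyond end of message")
    return header


def iter_options(buf: bytes, header: Header) -> Iterator[Tuple[int, bytes]]:
    """Yield (code, value) pairs: two byte code, two byte length, no cookie."""
    pos, end = header.payload_offset, header.length
    while pos < end:
        if pos + 4 > end:
            raise ValueError("truncated option header")
        code, length = struct.unpack_from("!HH", buf, pos)
        pos += 4
        if pos + length > end:
            raise ValueError("truncated option value")
        yield code, buf[pos:pos + length]
        pos += length


def unknown_type_action(msg_type: int) -> str:
    """Unknown types 0-127: close the connection; 128-255: ignore."""
    return "close" if msg_type <= 127 else "ignore"
```

Options are yielded in the order in which they appear, since the order can be significant for some messages.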
1969 The maximum length of the payload data in octets is 2048 less the 1970 size of the header, i.e., the maximum message length is 2048 octets. 1972 6.2. Common option format 1974 The options contained in the payload data section of the failover 1975 message all use the two byte option number and two byte length format 1976 as specified by the recommendations of the DHCP panel in [NAMESPACE]. 1978 The option numbers are drawn from an option number space unique to 1979 the failover protocol. All of the message types share a common 1980 option number space and common options definitions, though not all 1981 options are required or meaningful for every message. 1983 In contrast to the options which appear in DHCP client and server 1984 messages, the options in failover message are ordered. That is, for 1985 some messages the order in which the options appear in the payload 1986 data area is significant. The messages for which this is the case 1987 spell it out in detail. 1989 For all options which refer to time, they all use an absolute time in 1990 GMT. Time synchronization has already been achieved between the 1991 source and the target server using the CONNECT message and is updated 1992 using the time in every packet. All time fields in the options 1993 defined below use a time represented as seconds elapsed since Jan 1, 1994 1970 (i.e. ANSI C time_t time value representation). Note that this 1995 is (at present) a signed field. 1997 Additional options can be defined for intervendor or vendor specific 1998 use with limited difficulty due to the large number of option numbers 1999 available. 2001 6.2.1. binding-status 2003 This option is used to convey the current state of a binding. 
2005 Code Len Type 2006 +-----+-----+------+-----+-----+ 2007 | 0 | 1 | 0 | 1 | 1-7 | 2008 +-----+-----+------+-----+-----+ 2010 Legal values for this option are: 2012 Value Binding Status 2013 ----- ------------------------------------------------ 2014 1 FREE Lease has never been used 2015 2 ACTIVE Lease is assigned to a client 2016 3 EXPIRED Lease has expired 2017 4 RELEASED Lease has been released by client 2018 5 ABANDONED A server, or client flagged address as unusable 2019 6 RESET Lease was freed by some external agent 2020 7 BACKUP Lease belongs to secondary's private address pool 2022 6.2.2. assigned-IP-address 2024 The IP address to which this message refers. 2026 Code Len Address 2027 +-----+-----+------+-----+----+-----+-----+-----+ 2028 | 0 | 2 | 0 | 4 | a1 | a2 | a3 | a4 | 2029 +-----+-----+------+-----+----+-----+-----+-----+ 2031 6.2.3. sending-server-IP-address 2033 The IP address of the server sending this message. 2035 Code Len Address 2036 +-----+-----+------+-----+----+-----+-----+-----+ 2037 | 0 | 3 | 0 | 4 | a1 | a2 | a3 | a4 | 2038 +-----+-----+------+-----+----+-----+-----+-----+ 2040 6.2.4. addresses-transferred 2042 A 32 bit unsigned long in network byte order. Reports the number of 2043 addresses transferred by the primary to the secondary server 2044 (addresses to be used for the secondary server's private address 2045 pool) 2047 Code Len Number of Addresses 2048 +-----+-----+------+-----+----+-----+-----+-----+ 2049 | 0 | 4 | 0 | 4 | n1 | n2 | n3 | n4 | 2050 +-----+-----+------+-----+----+-----+-----+-----+ 2052 6.2.5. client-identifier 2054 The format, code and conventions used are identical to DHCP option 2055 61. 2057 Code Len Client Identifier 2058 +-----+-----+------+-----+----+-----+--- 2059 | 0 | 5 | 0 | n | i1 | i2 | ... 2060 +-----+-----+------+-----+----+-----+-- 2062 6.2.6. client-hardware-address 2064 The format is similar to DHCP option 61. 
Byte t1 (type) MUST be set 2065 to the proper ARP hardware address code, as defined in the ARP 2066 section of RFC 1700 (it MUST NOT be zero!) 2068 Code Len htype chaddr 2069 +-----+-----+------+-----+----+-----+-----+--- 2070 | 0 | 6 | 0 | n | t1 | c1 | c2 | ... 2071 +-----+-----+------+-----+----+-----+-----+--- 2073 Either client-identifier, client-hardware-address or BOTH MAY be 2074 present in binding update transactions. At least one of them MUST be 2075 present. If both are present, the client-identifier MUST be used to 2076 uniquely identify the owner of the binding (exactly as in RFC 2131). 2078 6.2.7. DDNS 2080 If an implementation supports Dynamic DNS updates, this option is 2081 used to communicate the status of the DDNS update associated with a 2082 particular lease binding. The Flags field conveys the types of DNS 2083 RRs that are to be updated by the DHCP server, and the status of the 2084 DDNS update. The Domain Name field conveys the DNS FQDN that the 2085 DHCP server is using to refer to the client, in DNS encoding as 2086 specified in [RFC1035]. 2088 Code Len Flags Domain Name 2089 +-----+-----+------+-----+-----+------+------+-----+------ 2090 | 0 | 7 | 0 | n | flags | d1 | d2 | ... 2091 +-----+-----+------+-----+-----+------+------+-----+------ 2093 The Flags field is a 16-bit field; several bit positions are 2094 specified here. 
   15               7             0
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MBZ          |P|D|A|C|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The bits (numbered from the least-significant bit in network
   byte-order) are used as follows:

      0 (C): A RR update successfully completed
      1 (A): Server is controlling A RR on behalf of the client
      2 (D): PTR RR update successfully completed (Done)
      3 (P): Server is controlling PTR RR on behalf of the client
      4-15 : Must be zero

   All of the unspecified bit positions SHOULD be set to 0 by servers
   sending the Failover-DDNS option, and they MUST be ignored by
   servers receiving the option.

6.2.8.  reject-reason

   This option is used to selectively reject binding updates.  It MAY
   be used in a BNDACK message, always associated with an
   assigned-IP-address option, which contains the IP address of the
   update being rejected.

      Code         Len      Reason Code
   +-----+-----+------+-----+----------+
   |  0  |  8  |   0  |  1  |    R1    |
   +-----+-----+------+-----+----------+

   Reason codes:

      0  Reserved
      1  Illegal IP address (not part of any address pool)
      2  Fatal conflict exists: address in use by other client.
      3  Missing binding information.
      4  Connection rejected, time mismatch too great.
      5  Connection rejected, invalid MCLT.
      6  Connection rejected, unknown reason.
      7  Connection rejected, duplicate connection.
      8  Connection rejected, invalid failover partner.
      9  TLS not supported
     10  TLS supported but not configured
     11  TLS required but not supported by partner
     12  Message digest not supported
     13  Message digest not configured
     14  Protocol version mismatch
     15  Missing binding information
     16  Outdated binding information
     17  Less critical binding information
     18  No traffic within sufficient time
     19  Hash bucket assignment conflict
     20-253, reserved.
2148 254 Unknown: Error occurred but does not match any reason code 2149 255 Reserved for code expansion 2151 6.2.9. message 2153 This option is used to supply a human readable message. It may be 2154 used in association with the Reject Reason Code to provide a human 2155 readable error message for the reject. 2157 Code Len Text 2158 +-----+-----+------+-----+------+-----+-- 2159 | 0 | 9 | 0 | n | c1 | c2 | ... 2160 +-----+-----+------+-----+------+-----+-- 2162 6.2.10. MCLT 2164 Maximum Client Lead Time, in seconds. A 32 bit integer value, in 2165 network byte order. 2167 Code Len Time 2168 +-----+-----+------+-----+----+-----+-----+-----+ 2169 | 0 | 10 | 0 | 4 | t1 | t2 | t3 | t4 | 2170 +-----+-----+------+-----+----+-----+-----+-----+ 2172 6.2.11. vendor-class-identifier 2174 A string which identifies the vendor of the failover protocol 2175 implementation. 2177 The code for this option is 60, and its minimum length is 1. 2179 Code Len vendor class string 2180 +-----+-----+------+-----+----+-----+--- 2181 | 0 | 11 | 0 | n | c1 | c2 | ... 2182 +-----+-----+------+-----+----+-----+--- 2184 6.2.12. lease-expiration-time 2186 The lease expiration time expressed as an absolute time in GMT 2187 represented as seconds elapsed since Jan 1, 1970 (i.e. ANSI C time_t 2188 time value representation). 2190 The lease expiration time is the time that a server has ACKed to a 2191 DHCP client. 2193 Code Len Time 2194 +-----+-----+------+-----+----+-----+-----+-----+ 2195 | 0 | 13 | 0 | 4 | t1 | t2 | t3 | t4 | 2196 +-----+-----+------+-----+----+-----+-----+-----+ 2198 6.2.13. potential-expiration-time 2200 The potential expiration time expressed as an absolute time in GMT 2201 represented as seconds elapsed since Jan 1, 1970 (i.e. ANSI C time_t 2202 time value representation). 2204 The potential expiration time is the time that one server tells 2205 another server that it may ACK to a client. 
2207 Code Len Time 2208 +-----+-----+------+-----+----+-----+-----+-----+ 2209 | 0 | 14 | 0 | 4 | t1 | t2 | t3 | t4 | 2210 +-----+-----+------+-----+----+-----+-----+-----+ 2212 6.2.14. grace-expiration-time 2214 The grace expiration time expressed as an absolute time in GMT 2215 represented as seconds elapsed since Jan 1, 1970 (i.e. ANSI C time_t 2216 time value representation). 2218 The grace expiration time is the time that a grace period will 2219 expire. 2221 Code Len Time 2222 +-----+-----+------+-----+----+-----+-----+-----+ 2223 | 0 | 15 | 0 | 4 | t1 | t2 | t3 | t4 | 2224 +-----+-----+------+-----+----+-----+-----+-----+ 2226 6.2.15. client-last-transaction-time 2228 The time at which this server last received a DHCP request from a 2229 particular client expressed as an absolute time in GMT represented as 2230 seconds elapsed since Jan 1, 1970 (i.e. ANSI C time_t time value 2231 representation). 2233 Code Len Partner Down Time 2234 +-----+-----+------+-----+----+-----+-----+-----+ 2235 | 0 | 16 | 0 | 4 | t1 | t2 | t3 | t4 | 2236 +-----+-----+------+-----+----+-----+-----+-----+ 2238 6.2.16. start-time-of-state 2240 The time at which the state contained in this message began, 2241 expressed as an absolute time in GMT represented as seconds elapsed 2242 since Jan 1, 1970 (i.e. ANSI C time_t time value representation). 2244 This option is used for different states in different messages. In a 2245 BNDUPD message it represents the start time of the state of the lease 2246 in the BNDUPD message. In a STATE message, it represents the start 2247 time of the partner server's failover state. 2249 Code Len Start Time of State 2250 +-----+-----+------+-----+----+-----+-----+-----+ 2251 | 0 | 17 | 0 | 4 | t1 | t2 | t3 | t4 | 2252 +-----+-----+------+-----+----+-----+-----+-----+ 2254 6.2.17. server-state 2256 This option is used to convey the current state of the failover 2257 endpoint in the sending server. 
2259 Code Len Server State 2260 +-----+-----+------+-----+-----+ 2261 | 0 | 18 | 0 | 1 | 1-9 | 2262 +-----+-----+------+-----+-----+ 2264 Legal values for this option are: 2266 Value Server State 2267 ----- ------------------------------------------------------------- 2268 0 reserved 2269 1 STARTUP Startup state (1) 2270 2 NORMAL Normal state 2271 3 COMMUNICATIONS-INTERRUPTED Communication interrupted (safe) 2272 4 PARTNER-DOWN Partner down (unsafe mode) 2273 5 POTENTIAL-CONFLICT Synchronizing 2274 6 RECOVER Recovering bindings from partner 2275 7 PAUSED Shutting down for a short period. 2276 8 SHUTDOWN Shutting down for an extended 2277 period. 2278 9 RECOVER-DONE Interlock state prior to NORMAL 2280 6.2.18. server-flags 2282 This option is used to convey the current flags of the failover 2283 endpoint in the sending server. 2285 Code Len Server Flags 2286 +-----+-----+------+-----+-------+ 2287 | 0 | 19 | 0 | 1 | flags | 2288 +-----+-----+------+-----+-------+ 2290 Legal values for this option are: 2292 Currently, bit 5 is defined. All other bits 2293 are reserved, and must be set to 0. 2295 o STARTUP 2297 Bit 5 is the STARTUP flag. Bit 5 MUST be set to 1 whenever the 2298 server is in STARTUP state, and set to 0 otherwise. (Note that 2299 when in STARTUP state, the state transmitted in the server-state 2300 option is usually the last recorded state from stable storage, 2301 but see section 9.3 for details.) 2303 6.2.19. vendor-specific-options 2305 This option is used to convey options specific to a particular 2306 vendor's implementation. The vendor class identifier is used to 2307 specify which option space the embedded options are drawn from. 2309 It functions similarly to the vendor class identifier and vendor 2310 specific options in the DHCP protocol. 2312 This option contains other options in the same two byte code, two 2313 byte length format. If this option appears in a message without a 2314 corresponding vendor class identifier, it MUST be ignored. 
      Code         Len      Embedded options
   +-----+-----+------+-----+----+-----+---
   |  0  |  20 |   0  |  n  | c1 | c2  | ...
   +-----+-----+------+-----+----+-----+---

6.2.20.  max-unacked-bndupd

   The maximum number of BNDUPD messages that this server is prepared
   to accept over the TCP connection without causing the TCP connection
   to block.

      Code         Len      Maximum Unacked BNDUPD
   +-----+-----+------+-----+----+-----+-----+-----+
   |  0  |  21 |   0  |  4  | n1 | n2  | n3  | n4  |
   +-----+-----+------+-----+----+-----+-----+-----+

6.2.21.  receive-timer

   The number of seconds within which the server must receive a message
   from its partner, or it will assume that the partner is down or the
   communication path to the partner has failed.

      Code         Len      Receive Timer
   +-----+-----+------+-----+----+-----+-----+-----+
   |  0  |  23 |   0  |  4  | s1 | s2  | s3  | s4  |
   +-----+-----+------+-----+----+-----+-----+-----+

6.2.22.  hash-bucket-assignment

   The set of hash values to which the receiving server MUST respond.
   See section 5.3 for more information on how this option is used.

   The format and usage of the data in this option are defined in
   [LOADB].

      Code         Len      Hash Buckets
   +-----+-----+------+-----+----+-----+-----+-----+
   |  0  |  24 |   0  |  32 | b1 | b2  | ... | b32 |
   +-----+-----+------+-----+----+-----+-----+-----+

6.2.23.  message-digest

   The message digest for this message.

   This option consists of a variable number of bytes which contain the
   message digest of the message prior to the inclusion of this option.

   When this option appears in a message, it MUST appear as the last
   option in the message.

      Code         Len      Message Digest
   +-----+-----+------+-----+----+-----+-----
   |  0  |  25 |   0  |  n  | d1 | d2  | ...
   +-----+-----+------+-----+----+-----+-----

6.2.24.  protocol-version

   The protocol version being used by the server.
It is only sent in the 2374 CONNECT and CONNECTACK messages. 2376 Code Len Version 2377 +-----+-----+------+-----+----+ 2378 | 0 | 26 | 0 | 1 | v1 | 2379 +-----+-----+------+-----+----+ 2381 6.2.25. TLS-request 2383 This option contains information relating to TLS security 2384 negotiation. It is sent in a CONNECT message 2386 The first byte, req, is the TLS request from this server. A value of 2387 0 indicates no TLS operation, a value of 1 indicates that TLS 2388 operation is desired, and a value of 2 indicates that TLS operation 2389 is required to establish communications with this server. 2391 The second byte, acc, is what this server will accept for TLS 2392 operation. A value of 0 means that this server will not accept TLS 2393 connections. A value of 1 means that this server will accept TLS 2394 connections. 2396 If req is not zero, then acc MUST be 1. 2398 This allows a server which is not configured to require TLS support 2399 to inform its partner that it will accept a TLS connection although 2400 it does not desire one, for instance. 2402 Code Len request accept 2403 +-----+-----+------+-----+----+----+ 2404 | 0 | 27 | 0 | 2 | req| acc| 2405 +-----+-----+------+-----+----+----+ 2407 6.2.26. TLS-reply 2409 This option contains information relating to TLS security 2410 negotiation. It is sent in a CONNECTACK message 2412 The value of 0 indicates no TLS operation, a value of 1 indicates 2413 that TLS operation is required. 2415 Code Len TLS 2416 +-----+-----+------+-----+----+ 2417 | 0 | 28 | 0 | 1 | t1 | 2418 +-----+-----+------+-----+----+ 2420 6.2.27. client-request-options 2422 This option contains options from a DHCP client's request. It is 2423 sent in a BNDUPD message. The first 4 bytes of the option contain 2424 the "magic number" of the option area from which the DHCP client's 2425 request options were taken and serves to define the format of the 2426 rest of the sub-options contained in this option. 
After the magic 2427 number, the options included are in the normal options format 2428 appropriate for that magic number. 2430 A server SHOULD NOT include all of the options in a DHCP client 2431 request in this option, but rather a server SHOULD include only those 2432 options which are of likely interest to its partner server. See 2433 section 7.1 for details. 2435 Code Len Magic Number Embedded options 2436 +-----+-----+------+-----+----+----+----+----+----+----+-- 2437 | 0 | 29 | 0 | n | m1 | m2 | m3 | m4 | b1 | b2 | ... 2438 +-----+-----+------+-----+----+----+----+----+----+----+-- 2440 6.2.28. client-reply-options 2442 This option contains options from a DHCP server's reply to a DHCP 2443 client request. It is sent in a BNDUPD message. The first 4 bytes 2444 of the option contain the "magic number" of the option area from 2445 which the DHCP reply options were taken and serves to define the 2446 format of the rest of the sub-options contained in this option. 2447 After the magic number, the options included are in the normal 2448 options format appropriate for that magic number. 2450 A server SHOULD NOT include all of the options in a DHCP server's 2451 reply to a client's request in this option, but rather a server 2452 SHOULD include only those options which are of likely interest to its 2453 partner server. See section 7.1 for details. 2455 Code Len Magic Number Embedded options 2456 +-----+-----+------+-----+----+----+----+----+----+----+-- 2457 | 0 | 30 | 0 | n | m1 | m2 | m3 | m4 | b1 | b2 | ... 2458 +-----+-----+------+-----+----+----+----+----+----+----+-- 2460 6.3. BNDUPD message format 2462 The binding update (BNDUPD) message is used to send the binding data- 2463 base changes to the partner server. 2465 The message type for the BNDUPD message is 3. 2467 The xid of the BNDUPD MUST be unique with respect to other failover 2468 messages transmitted from this failover endpoint. 
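As an illustration only, the following sketch assembles a failover message from the fixed header fields of section 6.1 and a list of options in the two byte code, two byte length format. The names `encode_option`, `build_message`, and the counter used as an xid source are hypothetical. The text of section 6.1 gives 8 as the payload offset for the current protocol version, while the fixed fields shown in the header diagram occupy 12 bytes; the sketch takes the offset as a parameter and defaults to 12 purely as an assumption, so that the options follow the xid field.

```python
import itertools
import struct
import time
from typing import Iterable, Optional, Tuple

BNDUPD = 3  # message type from section 6.1
FIXED_FIELDS = struct.Struct("!HBBII")  # length, type, offset, time, xid
MAX_MESSAGE_LENGTH = 2048

# Hypothetical xid source: one counter per failover endpoint, so that
# every new message sent on the connection gets a different xid.
_next_xid = itertools.count(1)


def encode_option(code: int, value: bytes) -> bytes:
    """Two byte option number, two byte length, then the value."""
    return struct.pack("!HH", code, len(value)) + value


def build_message(msg_type: int,
                  options: Iterable[Tuple[int, bytes]],
                  now: Optional[float] = None,
                  payload_offset: int = FIXED_FIELDS.size) -> bytes:
    """Encode one message; options are emitted in the order given."""
    if not FIXED_FIELDS.size <= payload_offset <= 255:
        raise ValueError("payload offset out of range for this sketch")
    payload = b"".join(encode_option(code, value) for code, value in options)
    length = payload_offset + len(payload)
    if length > MAX_MESSAGE_LENGTH:
        raise ValueError("message exceeds 2048 bytes")
    xid = next(_next_xid) & 0xFFFFFFFF
    # Seconds since Jan 1, 1970, carried as an unsigned 32 bit value.
    timestamp = int(time.time() if now is None else now) & 0xFFFFFFFF
    fixed = FIXED_FIELDS.pack(length, msg_type, payload_offset, timestamp, xid)
    extra_header = bytes(payload_offset - FIXED_FIELDS.size)
    return fixed + extra_header + payload
```

A BNDUPD carrying an assigned-IP-address option (code 2) and a binding-status option (code 1) could then be built with `build_message(BNDUPD, [(2, bytes([10, 0, 0, 5])), (1, b"\x02")])`. A re-transmission would reuse the original xid, which this sketch does not model.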
   The following table summarizes the various options for the BNDUPD
   message.

                                  binding-status

   Option                    ACTIVE     EXPIRED    RELEASED   FREE
   ------                    ------     -------    --------   ----
   assigned-IP-address       MUST       MUST       MUST       MUST
   binding-status            MUST       MUST       MUST       MUST
   client-identifier         MAY        MAY        MAY        MAY
   client-hardware-address   MUST       MUST       MUST       MAY
   lease-expiration-time     MUST       MUST NOT   MUST NOT   MUST NOT
   potential-expiration-time MUST       MUST NOT   MUST NOT   MUST NOT
   grace-expiration-time     MUST NOT   MUST NOT   MUST NOT   MUST NOT
   start-time-of-state       SHOULD     SHOULD     SHOULD     SHOULD
   client-last-trans.-time   MUST       SHOULD     MUST       MAY
   DDNS(1)                   SHOULD     SHOULD     SHOULD     SHOULD
   client-request-options    SHOULD     SHOULD NOT SHOULD     SHOULD NOT
   client-reply-options      SHOULD     SHOULD NOT SHOULD     SHOULD NOT
   all others                MAY        MAY        MAY        MAY

                             binding-status

                             BACKUP
                             RESET
   Option                    ABANDONED
   ------                    ---------
   assigned-IP-address       MUST
   binding-status            MUST
   client-identifier         MAY(2)
   client-hardware-address   MAY(2)
   lease-expiration-time     MUST NOT
   potential-expiration-time MUST NOT
   grace-expiration-time     MUST NOT
   start-time-of-state       SHOULD
   client-last-trans.-time   MAY
   DDNS(1)                   SHOULD
   client-request-options    SHOULD NOT
   client-reply-options      SHOULD NOT
   all others                MAY

   (1) SHOULD appear only if the server supports dynamic DNS.

   (2) MUST NOT if binding-status is ABANDONED.

           Table 6.3-1: Options used in a BNDUPD message

6.4.  BNDACK message format

   A server sends a binding acknowledgement (BNDACK) message when it
   has successfully committed binding database changes received from a
   failover partner in a BNDUPD message to its own stable storage.

   The message type for the BNDACK message is 4.

   The xid in a BNDACK MUST be the same as the xid of the corresponding
   BNDUPD.

   The following table summarizes the options for the BNDACK message.
                                  binding-status

   Option                    ACTIVE     EXPIRED    RELEASED   FREE
   ------                    ------     -------    --------   ----
   assigned-IP-address       MUST       MUST       MUST       MUST
   binding-status            MUST       MUST       MUST       MUST
   client-identifier         MAY        MAY        MAY        MAY
   client-hardware-address   MUST       MUST       MUST       MAY
   reject-reason             MAY        MAY        MAY        MAY
   message                   MAY        MAY        MAY        MAY
   lease-expiration-time     MUST       MUST NOT   MUST NOT   MUST NOT
   potential-expiration-time MUST       MUST NOT   MUST NOT   MUST NOT
   grace-expiration-time     MUST NOT   MUST NOT   MUST NOT   MUST NOT
   start-time-of-state       SHOULD     SHOULD     SHOULD     SHOULD
   client-last-trans.-time   SHOULD     SHOULD     SHOULD     MAY
   DDNS(1)                   SHOULD     SHOULD     SHOULD     SHOULD
   all others                MAY        MAY        MAY        MAY

                             binding-status

                             BACKUP
                             RESET
   Option                    ABANDONED
   ------                    ---------
   assigned-IP-address       MUST
   binding-status            MUST
   client-identifier         MAY
   client-hardware-address   MAY(2)
   reject-reason             MAY
   message                   MAY
   lease-expiration-time     MUST NOT
   potential-expiration-time MUST NOT
   grace-expiration-time     MUST NOT
   start-time-of-state       SHOULD
   client-last-trans.-time   MAY
   DDNS(1)                   SHOULD
   all others                MAY

   (1) Only SHOULD appear if the server supports dynamic DNS.

   (2) MUST NOT if binding-status is ABANDONED.

           Table 6.4-1: Options used in a BNDACK message

6.5.  Bulking for BNDUPD and BNDACK messages

   DISCUSSION:

      Bulking is planned for this protocol, but it hasn't been
      specified in this revision of the draft.  Once the draft settles
      down, we will specify the bulking approach in detail.

6.6.  UPDREQ message format

   The update request (UPDREQ) message is used by one server to request
   that its partner send it all binding database information that it
   has not already seen.

   The message type for the UPDREQ message is 9.

   The xid in a UPDREQ message MUST be unique among messages
   transmitted from this failover endpoint during the life of this
   connection.
   There are no options that MUST appear in an UPDREQ message.  Any
   option MAY appear, though very few will likely be useful.

6.7.  UPDREQALL message format

   The update request all (UPDREQALL) message is used by one server to
   request that all binding database information be sent in order to
   recover from a total loss of its binding database by the requesting
   server.

   The message type for the UPDREQALL message is 7.

   The xid in a UPDREQALL message MUST be unique among messages
   transmitted from this failover endpoint during the life of this
   connection.

   There are no options that MUST appear in an UPDREQALL message.  Any
   option MAY appear, though very few will likely be useful.

6.8.  UPDDONE message format

   The update done (UPDDONE) message is used by the responding server
   to indicate that all requested updates have been sent by the
   responding server as BNDUPD messages and responded to by the
   requesting server using BNDACK messages.  While a BNDACK message
   MUST have been received for each BNDUPD message prior to the
   transmission of the UPDDONE message, this doesn't necessarily mean
   that all of the BNDUPD messages were accepted, only that all of them
   were responded to with a BNDACK message.  Thus, a NAK (consisting of
   a BNDACK message containing a reject-reason option) could be used to
   reject a BNDUPD, but for the purposes of the UPDDONE message, such a
   NAK would count as a response to the associated BNDUPD message, and
   would not block the eventual transmission of the UPDDONE message.

   The message type for the UPDDONE message is 8.

   The xid in an UPDDONE message MUST be identical to the xid in the
   UPDREQ or UPDREQALL message that initiated the update process.

   There are no options that MUST appear in an UPDDONE message.  Any
   option MAY appear, though very few will likely be useful.

6.9.  POOLREQ message format

   The pool request (POOLREQ) is used by the secondary server to
   request an allocation of IP addresses from the primary server.

   The message type for the POOLREQ message is 1.

   The xid in a POOLREQ message MUST be unique among messages
   transmitted from this failover endpoint during the life of this
   connection.

   There are no options that MUST appear in a POOLREQ message.  Any
   option MAY appear.

6.10.  POOLRESP message format

   The pool response (POOLRESP) is used by the primary server to inform
   the secondary server how many IP addresses were allocated to the
   secondary server as the result of the pool request.

   The message type for the POOLRESP message is 2.

   The xid in the POOLRESP message MUST be identical to the xid in the
   POOLREQ message for which this POOLRESP is a response.

   The following table shows the options that MUST appear in a POOLRESP
   message:

   Option
   ------
   addresses-transferred       MUST

           Table 6.10-1: Options used in a POOLRESP message

6.11.  CONNECT message format

   The connect (CONNECT) message is used by the primary server to
   establish a high level connection with the other server, and to
   transmit several important configuration data items between the
   servers.

   The message type for the CONNECT message is 5.

   The xid in a CONNECT message MUST be unique among messages
   transmitted from this failover endpoint during the life of this
   connection.

   The CONNECT message MUST be the first message sent down a newly
   established connection.
   The following table summarizes the options that are associated with
   the CONNECT message:

   Option
   ------
   sending-server-IP-address   MUST
   max-unacked-bndupd          MUST
   receive-timer               MUST
   vendor-class-identifier     MUST
   protocol-version            MUST
   TLS-request                 MUST
   MCLT                        MUST
   hash-bucket-assignment      MUST
   all others                  MAY

           Table 6.11-1: Options used in a CONNECT message

6.12.  CONNECTACK message format

   The connect response (CONNECTACK) message is used by a secondary
   server to respond to the receipt of a CONNECT message from the
   primary server.

   The message type for the CONNECTACK message is 6.

   The xid in the CONNECTACK message MUST be identical to the xid in
   the CONNECT message for which this CONNECTACK is a response.

   The following table summarizes the options associated with the
   CONNECTACK message:

   Option
   ------
   sending-server-IP-address   MUST
   max-unacked-bndupd          MUST
   receive-timer               MUST
   vendor-class-identifier     MUST
   protocol-version            MUST
   TLS-reply                   MUST
   reject-reason               MAY(1)
   message                     MAY
   MCLT                        MUST NOT
   hash-bucket-assignment      MUST NOT

   (1) Indicates a rejection of the CONNECT message.

           Table 6.12-1: Options used in a CONNECTACK message

6.13.  STATE message format

   The state (STATE) message is used by either server to communicate
   the current state of the failover endpoint with the other server.
   It MUST be sent immediately after connection negotiation completes
   with the other server, and it MUST be sent whenever the server's
   state changes.

   The message type for the STATE message is 10.

   The xid in a STATE message MUST be unique among messages transmitted
   from this failover endpoint during the life of this connection.
2741 The following table shows the options that MUST appear in a STATE 2742 message: 2744 Option 2745 ------ 2746 sending-state MUST 2747 server-flags MUST 2748 start-time-of-state MUST 2750 Table 6.13-1: Options used in a STATE message 2752 6.14. CONTACT message format 2754 The contact (CONTACT) message is used by either server to verify that 2755 the connection is operational to the other server. 2757 The message type for the CONTACT message is 11. 2759 The xid in a CONTACT message MUST be unique among messages transmit- 2760 ted from this failover endpoint during the life of this connection. 2762 There are no options that MUST be used in a CONTACT message. 2764 6.15. DISCONNECT message format 2766 The disconnect (DISCONNECT) message is used by either server just 2767 prior to closing a connection. 2769 The message type for the DISCONNECT message is 12. 2771 The xid in a DISCONNECT message MUST be unique among messages 2772 transmitted from this failover endpoint during the life of this con- 2773 nection. 2775 The DISCONNECT message MUST be the last message sent down a connec- 2776 tion before it is closed. 2778 The following table summarizes the options that are associated with 2779 the DISCONNECT message: 2781 Option 2782 ------ 2783 reject-reason MUST 2784 message SHOULD 2786 Table 6.15-1: Options used in a DISCONNECT message 2788 7. Protocol Messages 2790 This section contains the detailed definition of the protocol mes- 2791 sages, including the information to include when sending the message, 2792 as well as the actions to take upon receiving the message. 2794 7.1. BNDUPD message 2796 The binding update (BNDUPD) message is used to send the binding data- 2797 base changes to the partner server, and the partner server responds 2798 with a binding acknowledgement (BNDACK) message when it has success- 2799 fully committed those changes to its own stable storage. 
2801 The rest of the failover protocol exists to determine whether the 2802 partner server is able to communicate or not, and to enable the 2803 partners to exchange BNDUPD/BNDACK messages in order to keep their 2804 binding databases in stable storage synchronized. 2806 7.1.1. Sending the BNDUPD message 2808 A BNDUPD message SHOULD be generated whenever any binding changes. A 2809 change might be in the binding-status, the lease-expiration-time, or 2810 even just the last-transaction-time. In general, any time a DHCP 2811 client sends in a packet that results in a DHCP server writing to its 2812 stable storage, a BNDUPD message SHOULD be generated. 2814 The BNDUPD (and BNDACK) messages refer to the binding-status of the 2815 IP address, and this protocol defines a series of binding-statuses, 2816 discussed in more detail below. Some servers may not support all of 2817 these binding-statuses, and so in those cases they will not be sent, 2818 and upon receipt a reasonable interpretation should be made. 2820 All BNDUPD messages MUST contain the IP address in the assigned-IP- 2821 address option, and it contains the IP address about which the BNDUPD 2822 message is being sent. 2824 All BNDUPD messages MUST contain the binding-status option, and it 2825 will have one of the values in the following list. This list 2826 discusses the meanings of the various binding-statuses and the infor- 2827 mation that should go into the BNDUPD message because of them. 2829 o ACTIVE 2831 Indicates that the IP address is currently leased to a DHCP 2832 client. 2834 client-hardware-address 2836 The client-hardware-address option MUST appear, and be set from 2837 the htype and chaddr of the DHCP client to which this IP address 2838 is leased. 2840 client-identifier 2842 If the DHCP client to which this IP address is leased used a 2843 client-identifier option to identify itself, then the client- 2844 identifier MUST appear in the BNDUPD message, else it MUST NOT 2845 appear. 
2847 lease-expiration-time 2849 The lease-expiration-time option MUST appear, and be set to the 2850 expiration time most recently ACKed to the DHCP client. Note 2851 that the time ACKed to a DHCP client is a lease duration in 2852 seconds, while the lease-expiration-time option in a BNDUPD mes- 2853 sage is an absolute time value. 2855 potential-expiration-time 2857 The potential-expiration-time option MUST appear, and be set to 2858 a value beyond that of the lease-expiration-time. This is the 2859 value that is ACKed by the BNDACK message. A server sending a 2860 BNDUPD message MUST be able to recover the potential- 2861 expiration-time sent in every BNDUPD, not just those that 2862 receive a corresponding BNDACK, in order to be able to protect 2863 against possible duplicate allocation of IP addresses after 2864 transitioning to PARTNER-DOWN state. See section 5.2.1 for 2865 details as to why the potential-expiration-time exists and 2866 guidelines for how to decide the value. 2868 o EXPIRED 2870 A binding-status of EXPIRED is used when a client's binding on 2871 an IP address has expired and the server does not wish to imple- 2872 ment an expired-grace period. When the partner server ACKs the 2873 BNDUPD of an EXPIRED IP address, the server sets its internal 2874 state to FREE. It is then available for allocation to any client 2875 of the primary server. 2877 client-hardware-address 2879 There SHOULD be a DHCP client associated with the IP address 2880 whose binding has expired. If there is, then the client- 2881 hardware-address option MUST appear, and be set from the htype 2882 and chaddr of the DHCP client to which this IP address was 2883 leased. 2885 client-identifier 2887 There SHOULD be a DHCP client associated with the IP address 2888 whose binding has expired.
If there is, then if the DHCP client 2889 to which this IP address was leased used a client-identifier 2890 option to identify itself, then the client-identifier MUST 2891 appear in the BNDUPD message, else it MUST NOT appear. 2893 o RELEASED 2894 A binding-status of RELEASED is used when a DHCP client sends in 2895 a DHCPRELEASE message and the server does not wish to implement 2896 a released-grace period. When the partner server ACKs the 2897 BNDUPD of a RELEASED IP address, the server sets its internal 2898 state to FREE, and it is available for allocation by the primary 2899 server to any DHCP client. 2901 client-hardware-address 2903 There SHOULD be a DHCP client associated with the IP address 2904 whose binding has been released. If there is, then the client- 2905 hardware-address option MUST appear, and be set from the htype 2906 and chaddr of the DHCP client which released this IP address. 2908 client-identifier 2910 There SHOULD be a DHCP client associated with the IP address 2911 whose binding has been released. If there is, then if the DHCP 2912 client which released this IP address used a client-identifier 2913 option to identify itself, then the client-identifier MUST 2914 appear in the BNDUPD message, else it MUST NOT appear. 2916 o FREE 2918 A binding-status of FREE is used when a DHCP server needs to 2919 communicate that an IP address is available for allocation to 2920 another server, but it was not just released, expired, or reset 2921 by a network administrator. When the partner server ACKs the 2922 BNDUPD of a FREE IP address, the server sets its internal state 2923 such that it is available for allocation to any DHCP client. 2925 client-hardware-address 2927 There MAY be a DHCP client associated with the IP address whose 2928 binding is now desired to be FREE. If there is, then the 2929 client-hardware-address option MUST appear, and be set from the 2930 htype and chaddr of the DHCP client which released this IP 2931 address.
2933 client-identifier 2935 There MAY be a DHCP client associated with the IP address whose 2936 binding is now desired to be FREE. If there is, then if the 2937 DHCP client which released this IP address used a client- 2938 identifier option to identify itself, then the client-identifier 2939 MUST appear in the BNDUPD message, else it MUST NOT appear. 2941 client-hardware-address 2942 There MAY be a DHCP client associated with the IP address whose 2943 binding has now expired. If there is, then the client- 2944 hardware-address option MUST appear, and be set from the htype 2945 and chaddr of the DHCP client which released this IP address. 2947 client-identifier 2949 There MAY be a DHCP client associated with the IP address whose 2950 binding has now expired. If there is, then if the DHCP client 2951 which most recently leased this IP address used a client- 2952 identifier option to identify itself, then the client-identifier 2953 MUST appear in the BNDUPD message, else it MUST NOT appear. 2955 grace-expiration-time 2957 The grace-expiration-time option MUST appear, and is the length 2958 of time that this server will wait before trying to make the IP 2959 address available after the lease has expired for this IP 2960 address. 2962 client-hardware-address 2964 There MAY be a DHCP client associated with the IP address whose 2965 binding has now been released by sending a DHCPRELEASE. If 2966 there is, then the client-hardware-address option MUST appear, 2967 and be set from the htype and chaddr of the DHCP client which 2968 released this IP address. 2970 client-identifier 2972 There MAY be a DHCP client associated with the IP address whose 2973 binding has been released. If there is, then if the DHCP client 2974 which most recently leased this IP address used a client- 2975 identifier option to identify itself, then the client-identifier 2976 MUST appear in the BNDUPD message, else it MUST NOT appear. 
2978 client-hardware-address 2980 There MAY be a DHCP client associated with the IP address whose 2981 binding is now desired to be FREE. If there is, then the 2982 client-hardware-address option MUST appear, and be set from the 2983 htype and chaddr of the DHCP client which released this IP 2984 address. 2986 client-identifier 2988 There MAY be a DHCP client associated with the IP address whose 2989 binding is now desired to be FREE. If there is, then if the 2990 DHCP client which released this IP address used a client- 2991 identifier option to identify itself, then the client-identifier 2992 MUST appear in the BNDUPD message, else it MUST NOT appear. 2994 grace-expiration-time 2996 The grace-expiration-time MUST appear, and is the length of time 2997 that this server will wait before trying to make the IP address 2998 available after the lease was released for this IP address. 3000 o ABANDONED 3002 An ABANDONED IP address is one that has been considered unusable 3003 by the DHCP subsystem. An IP address for which a valid PING 3004 response was received SHOULD be set to ABANDONED. 3006 client-hardware-address 3008 There SHOULD NOT be a DHCP client associated with an ABANDONED 3009 IP address. The client-hardware-address option MUST NOT appear 3010 in the BNDUPD message. 3012 client-identifier 3014 There SHOULD NOT be a DHCP client associated with the IP address 3015 whose binding has now been ABANDONED. The client-identifier 3016 option MUST NOT appear in the BNDUPD message. 3018 o RESET 3020 The RESET value of the binding-status is used to indicate that 3021 this IP address was made available by operator command. 3023 o BACKUP 3025 The BACKUP value of binding-status indicates that this IP 3026 address belongs to the secondary server, and can be allocated by 3027 that server to a DHCP client at any time. 3029 client-hardware-address 3031 There MAY be a DHCP client associated with a BACKUP IP address.
3032 If there is, the client-hardware-address option MUST appear, and 3033 be set from the htype and chaddr of the DHCP client to which 3034 this IP address was most recently associated. 3036 client-identifier 3037 There MAY be a DHCP client associated with this IP address. If 3038 the DHCP client to which this IP address is leased used a 3039 client-identifier option to identify itself, then the client- 3040 identifier MUST appear in the BNDUPD message, else it MUST NOT 3041 appear. 3043 The following option information is generic to all BNDUPD messages, 3044 regardless of the value of the binding-status. 3046 o start-time-of-state 3048 The start-time-of-state SHOULD appear. It is set to the time at 3049 which this IP address first took on the state that corresponds to 3050 the current value of binding-status. 3052 o last-transaction-time 3054 The last-transaction-time value SHOULD appear. This is the time at 3055 which this DHCP server last received a packet from the DHCP client 3056 referenced by the client-identifier or client-hardware-address that 3057 was associated with the IP address referenced by the assigned-IP- 3058 address. 3060 o DDNS 3062 If the DHCP server is performing dynamic DNS operations on behalf 3063 of the DHCP client represented by the client-identifier or client- 3064 hardware-address, then it should include a DDNS option containing 3065 the host name, domain name, and status of any dynamic DNS opera- 3066 tions enabled. 3068 o client-request-options 3070 If the BNDUPD was triggered by a request from a DHCP client (typi- 3071 cally those with binding-status of ACTIVE and RELEASED), then the 3072 server SHOULD include options of interest to a failover partner 3073 from the client's request packet in the client-request-options for 3074 transmission to its partner. 
3076 A server sending a BNDUPD need not remember the "interesting" 3077 options or the information that would appear in an "interesting" 3078 option for transmission at a time when the BNDUPD is not closely 3079 associated with a DHCP client request. 3081 A server SHOULD send the following "interesting" options. It MAY 3082 send any DHCP client options. As new options are defined, the RFC 3083 defining these options SHOULD include information that they are 3084 "interesting to failover servers" if they should be sent as part of 3085 a BNDUPD. 3087 option option 3088 number name 3089 ----------------------------------------- 3091 12 host-name 3092 81 client-FQDN [DDNS] 3093 82 relay-agent-information [AGENTINFO] 3094 TBD user-class [USERCLASS] 3095 60 vendor-class-identifier 3097 Table 7.1.1-1: Options which SHOULD be sent in 3098 the client-request-options option in a BNDUPD message. 3100 o client-reply-options 3102 If the BNDUPD was triggered by a request from a DHCP client (typi- 3103 cally those with binding-status of ACTIVE and RELEASED), then the 3104 server SHOULD include options of interest to a failover partner 3105 from the server's DHCP reply packet in the client-reply-options for 3106 transmission to its partner. 3108 A server sending a BNDUPD need not remember the "interesting" 3109 options or the information that would appear in an "interesting" 3110 option for transmission at a time when the BNDUPD is not closely 3111 associated with a DHCP client request. 3113 A server SHOULD send the following "interesting" options. It MAY 3114 send any DHCP client options. As new options are defined, the RFC 3115 defining these options SHOULD include information that they are 3116 "interesting to failover servers" if they should be sent as part of 3117 a BNDUPD. 
3119 option option 3120 number name 3121 ----------------------------------------- 3123 58 renewal-time 3124 59 rebinding-time 3126 Table 7.1.1-2: Options which SHOULD be sent in 3127 the client-reply-options option in a BNDUPD message. 3129 The BNDUPD message SHOULD be sent as soon as possible from the time 3130 that the DHCP client received a response and the lease bindings data- 3131 base is written on stable storage. 3133 7.1.2. Receiving the BNDUPD message 3135 When a server receives a BNDUPD message, it needs to decide how to 3136 process the message and whether the message represents a conflict 3137 of any sort. The conflict resolution process SHOULD be used on the 3138 receipt of every BNDUPD message, not just those that are received 3139 while in POTENTIAL-CONFLICT state, in order to increase the robust- 3140 ness of the protocol. 3142 There are three sorts of conflicts: 3144 o Two clients one IP address conflict 3146 This is the duplicate IP address allocation conflict. There are 3147 two different clients each allocated the same address. There 3148 cannot be a client conflict unless there is a client specified 3149 in the BNDUPD message. See section 5.10.1 for how to resolve 3150 this conflict. 3152 o Two IP addresses one client conflict 3154 This conflict exists when a client on one server is associated 3155 with one IP address, and on the other server with a different 3156 IP address in the same or a related subnet. This does not refer 3157 to the case where a single client has addresses in multiple dif- 3158 ferent subnets or administrative domains, but rather the case 3159 where on the same subnet the client has a lease on one IP 3160 address in one server and on a different IP address on the other 3161 server. 3163 This conflict may or may not be a problem for a given DHCP 3164 server implementation.
In the event that a DHCP server requires 3165 that a DHCP client have only one outstanding lease for an IP 3166 address on one subnet, this conflict should be resolved by 3167 accepting the update which has the latest client-last- 3168 transaction-time. 3170 o binding-status conflict 3172 This is the normal conflict, where one server is updating the other 3173 with newer information. See section 5.10.1 for details of how 3174 to resolve these conflicts. 3176 See section 5.10.1 for details of how to process binding-status 3177 changes in BNDUPD messages. 3179 7.1.3. Accepting the BNDUPD message 3181 When accepting a BNDUPD message, the information contained in the 3182 client-request-options and client-reply-options SHOULD be examined 3183 for any information of interest to this server. For instance, a 3184 server which wished to detect changes in client specified host names 3185 might want to examine and save information from the host-name or 3186 client-FQDN options. Servers which expect to utilize information 3187 from the relay-agent-information option would want to store this 3188 information. 3190 7.1.4. Time values related to the BNDUPD message 3192 There are three time values that may be sent in a BNDUPD message. 3194 o lease-expiration-time 3196 The time that the server gave to the client, i.e., the time that 3197 the server believes that the client's lease will expire. 3199 o potential-expiration-time 3201 The time that the server wants to be sure its partner waits 3202 (added to the MCLT) before assuming that this lease has expired. 3203 Typically some time beyond the desired client lease time. 3205 o client-last-transaction-time 3207 The time that the client last interacted with this server. 3209 As discussed in section 5.2, each server knows what its partner has 3210 ACKed with regard to potential-expiration-time. In addition, each 3211 server needs to remember what it has told its partner as the 3212 potential-expiration-time.
Moreover, each server must remember what 3213 it has acked to the *other* server as the most recent potential- 3214 expiration-time from that server. 3216 Remember that each server sends a potential-expiration-time and 3217 receives an ACK for that as well as receiving a potential- 3218 expiration-time and needing to remember what it has acked for that. 3220 While they don't have to be named in any particular way, the times 3221 that a server needs to remember for every IP address in order to 3222 implement the failover protocol are: 3224 o lease-expiration-time 3225 The time that this server gave to the DHCP client. A DHCP 3226 server needs to remember this time already, just to be a DHCP 3227 server. 3229 o sent-potential-expiration-time 3231 The latest time sent to the partner for a potential-expiration- 3232 time. 3234 o acked-potential-expiration-time 3236 The latest time that the partner has acked for a potential 3237 expiration time. Typically the same as sent-potential- 3238 expiration-time if there is not a BNDUPD outstanding. 3240 o received-potential-expiration-time 3242 The latest time that this server has ever received as a 3243 potential-expiration-time from its partner in a BNDUPD that this 3244 server ACKed. 3246 So, a server has to remember two additional times concerning BNDUPD 3247 messages that it has initiated, and one additional time concerning 3248 BNDUPD messages that it has received. How are these times used? 3250 First, let's look at the time that a DHCP server can offer to a DHCP 3251 client. A server can offer to a DHCP client a time that is no 3252 longer than the MCLT beyond the max( received-potential-expiration- 3253 time, acked-potential-expiration-time). One might think that the 3254 server should be able to offer only the MCLT beyond the acked- 3255 potential-expiration-time, and while that is certainly simple and 3256 easy to understand, it has negative consequences in actual operation.
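The offer limit just described can be sketched as follows. This is a minimal illustration, not a normative algorithm: the function names and the representation of times as absolute seconds are assumptions, as is the convention that a time which has never been set is stored as zero. The sketch also assumes that the limit is never computed from a point earlier than the current time, so that a server with no recorded times can still offer the MCLT.

```python
def max_offer_expiration(now, mclt, received_pet=0, acked_pet=0):
    """Latest absolute expiration time that may be offered to a client.

    now          -- current time, absolute seconds
    mclt         -- maximum client lead time, in seconds
    received_pet -- received-potential-expiration-time (0 if none)
    acked_pet    -- acked-potential-expiration-time (0 if none)
    """
    # The limit is the MCLT beyond max(received, acked). Clamping to
    # `now` is an assumption of this sketch: with both times unset, the
    # server can still offer exactly the MCLT from the present.
    base = max(now, received_pet, acked_pet)
    return base + mclt


def offered_lease_seconds(now, desired, mclt, received_pet=0, acked_pet=0):
    """Lease duration to give the client, capped by the offer limit."""
    limit = max_offer_expiration(now, mclt, received_pet, acked_pet)
    return min(desired, limit - now)
```

With an MCLT of 3600 seconds and a desired lease of 86400 seconds, a server with no recorded times would offer 3600 seconds, while a server holding a received-potential-expiration-time 90000 seconds in the future could offer the full 86400 seconds.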
3258 To illustrate this, in the simple case where the primary updates the 3259 secondary for a while and then fails, if the secondary can then renew 3260 the client for only the MCLT beyond the acked-potential-expiration- 3261 time, then the secondary will only be able to renew the client for 3262 the MCLT, because the secondary has never sent a BNDUPD packet to the 3263 primary concerning this IP address and client, and so its acked- 3264 potential-expiration-time is zero. 3266 However, if we allow the secondary to renew the client with the MCLT 3267 beyond the max( received-potential-expiration-time, acked-potential- 3268 expiration-time), then the secondary can usually renew the client for 3269 the full lease period, at least for the first renew it sees from the 3270 client, since the received-potential-expiration-time is generally 3271 longer than the client's desired lease interval. The difference in 3272 renew times could make a big difference in server load on the 3273 secondary in this case. 3275 What are the consequences of allowing a server to offer a DHCP client 3276 a lease term of the MCLT beyond the max( received-potential- 3277 expiration-time, acked-potential-expiration-time)? The consequences 3278 appear whenever a server enters PARTNER-DOWN state, and affect how 3279 long that server has to wait before reallocating expired leases. 3280 With this approach, when a server goes into PARTNER-DOWN state, it 3281 must wait the MCLT beyond the max( lease-expiration-time, sent- 3282 potential-expiration-time, acked-potential-expiration-time, 3283 received-potential-expiration-time ) for each IP address before it 3284 can reallocate that IP address to another DHCP client. One might 3285 normally think that it needed to wait only the MCLT beyond the max( 3286 lease-expiration-time, received-potential-expiration-time ), i.e., 3287 beyond what it has told the client and what it has explicitly acked 3288 to the other server. 
But with the optimization discussed above -- 3289 where either server can offer the DHCP client a lease term of the 3290 MCLT beyond the max( received-potential-expiration-time, acked- 3291 potential-expiration-time), then the additional times sent- 3292 potential-expiration-time and acked-potential-expiration-time must be 3293 added into the expression, since the partner could have used those 3294 times as part of its own lease time calculation. 3296 Thus this optimization may require a longer waiting time when enter- 3297 ing PARTNER-DOWN state, but will generally allow servers to operate 3298 considerably more effectively when running in COMMUNICATIONS- 3299 INTERRUPTED state. 3301 7.2. BNDACK message 3303 Every BNDUPD message that is received by a server MUST be responded 3304 to with a corresponding BNDACK message. The receiving server SHOULD 3305 respond quickly to every BNDUPD message but it MAY choose to respond 3306 preferentially to DHCP client requests instead of BNDUPD messages, 3307 since there is no absolute time period within which a BNDACK must be 3308 sent in response to a BNDUPD message, and DHCP clients frequently do 3309 have time constraints that must be met. 3311 A BNDACK message can only be sent in response to a BNDUPD message 3312 using the same TCP connection from which the BNDUPD message was 3313 received, since the XID's in BNDUPD messages are guaranteed unique 3314 only during the life of a single TCP connection. When a connection 3315 to a partner server goes down, a server with unprocessed BNDUPD mes- 3316 sages MAY simply drop all of those messages, since it can be sure 3317 that the partner will retransmit them when they are next in communi- 3318 cations. A server with unprocessed BNDUPD messages when a TCP con- 3319 nection goes down MAY instead choose to process those BNDUPD mes- 3320 sages, but it MUST NOT send any BNDACK messages in response (again 3321 because of the issues surrounding XID uniqueness). 3323 7.2.1. 
Sending the BNDACK message 3325 The BNDACK message MUST contain the same xid as the corresponding 3326 BNDUPD message. 3328 All of the options which appear in the BNDUPD message MUST be 3329 included in the BNDACK message. The values in the options MAY be 3330 updated to reflect current information on the server sending the 3331 BNDACK. Note that update of this information may be used for infor- 3332 mational purposes, but MUST NOT be assumed to necessarily be recorded 3333 in the stable storage of the server who sent the BNDUPD message 3334 because there is no corresponding ACK of the BNDACK message. Any 3335 information that SHOULD be recorded in the partner server's stable 3336 storage MUST be transmitted in a subsequent BNDUPD. 3338 If the server is accepting the BNDUPD, the BNDACK message includes 3339 only those options that appeared in the BNDUPD message. If the server 3340 is rejecting the BNDUPD, the additional option reject-reason MUST 3341 appear in the BNDACK message, and the message option SHOULD appear in 3342 this case containing a human-readable error message describing in 3343 some detail the reason for the rejection of the BNDUPD message. 3345 If the server rejects the BNDUPD message with a BNDACK and a reject- 3346 reason option, it may be because the server believes that it has 3347 binding information that the other server should know. A server 3348 which is rejecting a BNDUPD may initiate a BNDUPD of its own in order 3349 to update its partner with what it believes is better binding infor- 3350 mation, but it MUST ensure through some means that it will not end up 3351 in a situation where each server is sending BNDUPD messages as fast as 3352 possible because they can't agree on which server has better binding 3353 data. Placing a reasonable delay on the initiation of a BNDUPD mes- 3354 sage after sending a BNDACK with a reject-reason would be one way to 3355 ensure this situation doesn't occur. 3357 7.2.2.
Receiving the BNDACK message 3359 When a server receives a BNDACK message, if it doesn't contain a 3360 reject-reason option that means that the BNDUPD message was accepted, 3361 and the server which sent the BNDUPD MUST update its stable storage 3362 with the potential-expiration-time value sent in the BNDUPD message 3363 and returned in the BNDACK message. Other values sent in the BNDUPD 3364 message MAY be used as desired. 3366 7.3. UPDREQ message 3368 The update request (UPDREQ) message is used by one server to request 3369 that its partner send it all of the binding database information that 3370 it has not already seen. Since each server is required to keep 3371 track at all times of the binding information the other server has 3372 received and ACKed, one server can request transmission of all un- 3373 ACKed binding database information held by the other server by using 3374 the UPDREQ message. 3376 The UPDREQ message is used whenever the sending server cannot proceed 3377 before it has processed all previously un-ACKed binding update infor- 3378 mation, since the UPDREQ message should yield a corresponding UPDDONE 3379 message. The UPDDONE message is not sent until the server that sent 3380 the UPDREQ message has responded to all of the BNDUPD messages gen- 3381 erated by the UPDREQ message with BNDACK messages. Thus, the sender 3382 of the UPDREQ message can be sure upon receipt of an UPDDONE message 3383 that it has received and committed to stable storage all outstanding 3384 binding database updates. 3386 See section 9, Protocol state transitions, for the details of when 3387 the UPDREQ message is sent. 3389 7.3.1. Sending the UPDREQ message 3391 There are no options for the UPDREQ message. 3393 The UPDREQ message is sent with a unique xid. 3395 7.3.2. Receiving the UPDREQ message 3397 A server receiving an UPDREQ message MUST send all binding database 3398 changes that have not yet been ACKed by the sending server. 
These 3399 changes are sent as undistinguished BNDUPD messages. 3401 However, the server which received and is processing the UPDREQ mes- 3402 sage MUST track the BNDACK messages that correspond to the BNDUPD 3403 messages triggered by the UPDREQ message and, when they are all 3404 received, the server MUST send an UPDDONE message. 3406 The server processing the UPDREQ message and sending BNDUPD messages 3407 to its partner SHOULD only track the BNDUPD and BNDACK message pairs 3408 for unACKed binding database changes that were present upon the 3409 receipt of the UPDREQ message. A server which has received an UPDREQ 3410 message SHOULD send BNDUPD messages for binding database changes that 3411 occur after receipt of the UPDREQ message, but it SHOULD NOT include 3412 those additional BNDUPD messages and their corresponding BNDACK mes- 3413 sages in the accounting necessary to consider the UPDREQ complete and 3414 subsequently send the UPDDONE message. If some additional binding 3415 database changes end up becoming part of the set of BNDUPD messages 3416 considered as part of the UPDREQ (due to whatever algorithm the 3417 server uses to scan its bindings database for unacked changes) it 3418 will probably not cause any difficulty, but a server MUST NOT attempt 3419 to include all such later BNDUPD messages in the accounting for the 3420 UPDREQ in order to be able to transmit an UPDDONE message. 3422 When queuing up the BNDUPD messages for transmission to the sender of 3423 the UPDREQ message, the server processing the UPDREQ message MUST 3424 honor the value returned in the max-unacked-bndupd option in the CON- 3425 NECT or CONNECTACK message that set up the connection with the send- 3426 ing server. It MUST NOT send more BNDUPD messages without receiving 3427 corresponding BNDACKs than the value returned in max-unacked-bndupd. 3429 7.4. 
UPDREQALL message 3431 The update request all (UPDREQALL) message is used by one server to 3432 request that its partner send it all of the binding database informa- 3433 tion. This message is used to allow one server to recover from a 3434 failure of stable storage and to restore its binding database in its 3435 entirety from the other server. 3437 A server which sends an UPDREQALL message cannot proceed until all of 3438 its binding update information is restored, and it knows that all of 3439 that information is restored when an UPDDONE message is received. 3441 See section 9, Protocol state transitions, for the details of when 3442 the UPDREQALL message is sent. 3444 7.4.1. Sending the UPDREQALL message 3446 There are no options for the UPDREQALL message. 3448 The UPDREQALL message is sent with a unique xid. 3450 7.4.2. Receiving the UPDREQALL message 3452 A server receiving an UPDREQALL message MUST send all binding data- 3453 base information to the sending server. These changes are sent as 3454 undistinguished BNDUPD messages. 3456 However, the server processing the UPDREQALL message MUST track the 3457 BNDACK messages that correspond to the BNDUPD messages triggered by 3458 the UPDREQALL message and, when they are all received, the server 3459 MUST send an UPDDONE message. 3461 Just as specified for the processing of the UPDREQ message, the 3462 server processing the UPDREQALL message and sending BNDUPD messages 3463 to its partner SHOULD only track the BNDUPD and BNDACK message pairs 3464 for unACKed binding database changes that were present upon the 3465 receipt of the UPDREQALL message. 
   A server which has received an UPDREQALL message SHOULD send BNDUPD
   messages for binding database changes that occur after receipt of the
   UPDREQALL message, but it SHOULD NOT include those additional BNDUPD
   messages and their corresponding BNDACK messages in the accounting
   necessary to consider the UPDREQALL complete and subsequently send
   the UPDDONE message. If some additional binding database changes end
   up becoming part of the set of BNDUPD messages considered as part of
   the UPDREQALL (due to whatever algorithm the server uses to scan its
   bindings database for unacked changes) it will probably not cause any
   difficulty, but a server MUST NOT attempt to include all such later
   BNDUPD messages in the accounting for the UPDREQALL in order to be
   able to transmit an UPDDONE message.

   When queuing up the BNDUPD messages for transmission to the sender of
   the UPDREQALL message, the server processing the UPDREQALL MUST honor
   the value returned in the max-unacked-bndupd option in the CONNECT or
   CONNECTACK message that set up the connection with the sending
   server. It MUST NOT send more BNDUPD messages without receiving
   corresponding BNDACKs than the value returned in max-unacked-bndupd.

7.5. UPDDONE message

   The update done (UPDDONE) message is used by a server receiving an
   UPDREQ or UPDREQALL message to signify that it has sent all of the
   BNDUPD messages requested by the UPDREQ or UPDREQALL request and that
   it has received a BNDACK for each of those messages.

7.5.1. Sending the UPDDONE message

   The UPDDONE message SHOULD be sent as soon as the last BNDACK message
   corresponding to a BNDUPD message requested by the UPDREQ or
   UPDREQALL is received from the server which sent the UPDREQ or
   UPDREQALL. The XID of the UPDDONE message MUST be the same as the
   XID of the corresponding UPDREQ or UPDREQALL message.

7.5.2. Receiving the UPDDONE message

   A server receiving the UPDDONE message knows that all of the
   information that it requested by sending an UPDREQ or UPDREQALL
   message has now been sent and that it has recorded this information
   in its stable storage. It typically uses the receipt of an UPDDONE
   message to move to a different failover state. See sections 9.5.2 and
   9.8.3 for details.

7.6. POOLREQ message

   The pool request (POOLREQ) message is used by the secondary server to
   request an allocation of IP addresses from the primary server. It
   MUST be sent by a secondary server to a primary server to request IP
   address allocation by the primary. The IP addresses allocated are
   transmitted using normal BNDUPD messages from the primary to the
   secondary.

   The POOLREQ message SHOULD be sent from the secondary to the primary
   whenever the secondary transitions into NORMAL state. It SHOULD
   periodically be resent in order that any change in the number of
   available IP addresses on the primary be reflected in the pool on the
   secondary. The period may be influenced by the secondary server's
   leasing activity.

7.6.1. Sending the POOLREQ message

   The POOLREQ message has no options. It must be sent with a unique
   xid.

7.6.2. Receiving the POOLREQ message

   When a primary server receives a POOLREQ message it SHOULD examine
   the binding database and determine how many IP addresses the
   secondary server should have, and set these IP addresses to BACKUP
   state. It SHOULD then send BNDUPD messages concerning all of these IP
   addresses to the secondary server.

   Servers frequently have several kinds of IP addresses available on a
   particular network segment.
   The failover protocol assumes that both primary and secondary servers
   are configured in such a way that each knows the type and number of
   IP addresses on every network segment participating in the failover
   protocol. The primary server is responsible for allocating the
   secondary server the correct proportion of available IP addresses of
   each kind, and the secondary server is responsible for being
   configured in such a way that it can tell the kind of every IP
   address based solely on the IP address itself.

   A primary server MUST keep track of how many IP addresses were
   allocated as a result of processing the POOLREQ message, and send
   that number in the POOLRESP message.

   A primary server MAY choose to defer processing a POOLREQ message
   until a more convenient time to process it, but it should not depend
   on the secondary server to retransmit the POOLREQ message in that
   case.

   If a secondary server receives a POOLREQ message it SHOULD report an
   error.

7.7. POOLRESP message

   A primary server sends a POOLRESP message to a secondary server after
   the allocation process for available addresses to the secondary
   server is complete. Typically this message will precede some of the
   BNDUPD messages that the primary uses to send the actual allocated IP
   addresses to the secondary.

7.7.1. Sending the POOLRESP message

   The POOLRESP message MUST contain the same xid as the corresponding
   POOLREQ message.

   The only option which MUST appear in a POOLRESP message is:

   o  addresses-transferred

      The number of addresses allocated to the secondary server by the
      primary server as a result of a POOLREQ is contained in the
      addresses-transferred option in a POOLRESP message.
      Note this is the number of addresses that are transferred to the
      secondary in the primary's binding database as a result of the
      corresponding POOLREQ message, and that it may be some time before
      they can all be transmitted to the secondary server through the
      use of BNDUPD messages.

7.7.2. Receiving the POOLRESP message

   When a secondary server receives a POOLRESP message, it SHOULD send
   another POOLREQ message if the value of the addresses-transferred
   option is non-zero.

   Typically, no other action is taken on the reception of a POOLRESP
   message.

7.8. CONNECT message

   The connect message is used to establish an application-level
   connection over a newly created TCP connection. It gives the source
   information for the connection, and some important configuration
   information. It MUST be sent only by the primary server. Either
   server can initiate a TCP connection, but the CONNECT message is only
   sent by the primary server.

7.8.1. Sending the CONNECT message

   The CONNECT message MUST be the first message sent by the primary
   server after the establishment of a new TCP connection with a
   secondary server participating in the failover protocol.

   The xid of the CONNECT message must be unique.

   The IP address of the primary server MUST be placed in the
   sending-server-IP-address option. This information is placed in an
   option inside of the message in order to allow the identity of the
   sender to be covered by a shared secret.

   The number of BNDUPD messages the primary server can accept without
   blocking the TCP connection MUST be placed in the max-unacked-bndupd
   option. This MUST be a number equal to or greater than 1, SHOULD be
   a number greater than 10, and SHOULD be a number less than 100.

   The length of the receive timer (tReceive, see section 8.3) MUST be
   placed in the receive-timer option.
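
   The limits on the max-unacked-bndupd value given above can be checked
   mechanically. The following Python sketch is illustrative only; the
   function name and the warning strings are assumptions made for this
   example and are not part of the protocol.

```python
def check_max_unacked_bndupd(value):
    """Check a max-unacked-bndupd value against the limits in 7.8.1.

    Raises ValueError for the MUST-level violation (value below 1) and
    returns a list of warnings for the SHOULD-level recommendations.
    """
    if value < 1:
        raise ValueError("max-unacked-bndupd MUST be 1 or greater")
    warnings = []
    if value <= 10:
        warnings.append("SHOULD be greater than 10")
    if value >= 100:
        warnings.append("SHOULD be less than 100")
    return warnings
```

   A value such as 50 yields no warnings; a value of 10 is legal but
   draws the first warning.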
   The MCLT MUST be placed in the MCLT option.

   The hash-bucket-assignment option MUST be included in the CONNECT
   message. In the event that load balancing is not configured for this
   server, the hash-bucket-assignment option will indicate that. The
   value of the hash-bucket-assignment option is determined from the
   specific buckets that the primary server has determined that the
   secondary server MUST service as part of the load-balancing
   algorithm. The way in which the primary server determines this
   information is outside the scope of this protocol definition. The
   primary server SHOULD be configured with a percentage of clients that
   the secondary server will be instructed to service, and the primary
   server SHOULD use the algorithm in [LOADB] to generate a Hash Bucket
   Assignment which it sends to the secondary server.

   The vendor class identifier MUST be placed in the
   vendor-class-identifier option.

   The protocol-version option MUST be included in every CONNECT
   message. The current value of the protocol version is 1.

   The TLS-request option MUST be sent and contains the desired TLS
   connection request as well as information concerning whether TLS is
   supported. If this CONNECT message is being sent over an already
   created TLS connection, the TLS-request MUST NOT appear.

7.8.2. Receiving the CONNECT message

   When a server receives a TCP connection on the failover port, if it
   is a primary server it should send a CONNECT message, and if it is a
   secondary server it should wait for a CONNECT message.

   When a secondary server receives a CONNECT message it should:

      1. Record the time at which the message was received.

      2. Examine the protocol-version option, and decide if this server
         is capable of interoperating with another server running that
         protocol version.
         If not, send the CONNECTACK message with the appropriate
         reject-reason. The server MUST include its protocol-version in
         the CONNECTACK message.

      3. Examine the TLS-request option. Figure out the TLS-reply
         value based on the capabilities and configuration of this
         server, and save it for the CONNECTACK message. If the TLS
         negotiation results in a connection rejection, then go
         immediately to send the CONNECTACK message.

         The possibilities are:

            CONNECT       CONNECTACK
          TLS-request     TLS-reply

                                Reject
          req   acc     t1      Reason     Comments
          ---   ---     --      ------     --------
           0     0      0
           0     0      1         11       receiver requires TLS
           0     1      0
           0     1      1
           1     0      -                  request doesn't make sense
           1     1      0
           1     1      1
           2     0      -                  request doesn't make sense
           2     1      0       9 or 10    receiver won't do TLS
           2     1      1

      4. Check to see if there is a message-digest option in the
         CONNECT message. If there was, and the server does not support
         message-digests, then reject the connection with the
         appropriate reject-reason in the CONNECTACK.

      5. Determine if the sender (from the sending-server-IP-address
         option) and the implicit role of the sender (i.e., primary)
         represents a server with which the receiver was configured to
         engage in failover activity. This is performed after any TLS
         processing so that it occurs after a secure connection is
         created, to ensure that there is no tampering with the IP
         address of the partner.

         If not, then the receiving server should reject the CONNECT
         request by sending a CONNECTACK message with a reject-reason
         value of: 8, invalid failover partner.

         If it is, then the receiving failover endpoint should be
         determined.

      6. Decide if the time delta between the sending of the message,
         in the time field, and the receipt of the message, recorded in
         step 1 above, is acceptable.
         A server MAY require an arbitrarily small delta in time values
         in order to set up a failover connection with another server.
         See section 5.9 for information on time synchronization.

         If the delta between the time values is too great, the server
         should reject the CONNECT request by sending a CONNECTACK
         message with a reject-reason of 4, time mismatch too great.

         If the time mismatch is not considered too great then the
         receiving server MUST record the delta between the servers.
         The receiving server MUST use this delta to correct all of the
         absolute times received from the other server in all
         time-valued options. Note that servers can participate in
         failover with arbitrarily great time mismatches, as long as the
         mismatch is more or less constant.

      7. If the receiving server is a secondary server, it MUST examine
         the MCLT option in the CONNECT request and use the value of
         the MCLT as the MCLT for this failover endpoint.

         A receiving secondary server SHOULD be able to operate with
         any MCLT sent by the primary, but if it cannot, then it
         should send a CONNECTACK with a reject-reason of 5, MCLT
         mismatch.

      8. The server MUST store the hash-bucket-assignment option for
         use during processing during NORMAL state. If this hash bucket
         assignment conflicts with the secondary server's configured
         hash bucket assignment for use in other than NORMAL state, the
         secondary server should send a CONNECTACK with a reject reason
         of 19, Hash bucket assignment conflict.

      9. The receiving server MAY use the vendor-class-identifier to do
         vendor specific processing.

7.9. CONNECTACK message

   The CONNECTACK message is sent to accept or reject a CONNECT message.
   It is sent by the secondary server which received a CONNECT message.
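
   The accept-or-reject decision can be sketched in Python, using the
   reject-reason values quoted in steps 5 through 8 of section 7.8.2.
   This is a simplified illustration, not a complete implementation. The
   function name and its parameters are assumptions made for the
   example, and the protocol-version, TLS and message-digest checks
   (steps 2 through 4) are deliberately left out.

```python
# Reject-reason values quoted in section 7.8.2.
INVALID_FAILOVER_PARTNER = 8
TIME_MISMATCH_TOO_GREAT = 4
MCLT_MISMATCH = 5
HASH_BUCKET_CONFLICT = 19


def connect_reject_reason(partner_known, sent_time, received_time,
                          max_delta, mclt_supported, hash_conflict):
    """Return a reject-reason for a CONNECT, or None to accept it.

    The checks run in the order of steps 5 to 8. All arguments are
    hypothetical inputs that a real server would derive from its
    configuration and from the options in the CONNECT message.
    """
    if not partner_known:
        return INVALID_FAILOVER_PARTNER
    if abs(received_time - sent_time) > max_delta:
        return TIME_MISMATCH_TOO_GREAT
    if not mclt_supported:
        return MCLT_MISMATCH
    if hash_conflict:
        return HASH_BUCKET_CONFLICT
    return None
```

   A return value of None corresponds to a CONNECTACK without a
   reject-reason option; any other value is placed in that option.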
   Attempting immediately to reconnect after either receiving a
   CONNECTACK with a reject-reason or after sending a CONNECTACK with a
   reject-reason could yield unwanted looping behavior, since the reason
   that the connection was rejected may well not have changed since the
   last attempt. A simple suggested solution is to wait a minute or two
   after sending or receiving a CONNECTACK message with a reject-reason
   before attempting to reestablish communication.

7.9.1. Sending the CONNECTACK message

   The xid of the CONNECTACK message MUST be that of the corresponding
   CONNECT message.

   The IP address of the sending server MUST be placed in the
   sending-server-IP-address option. This information is placed in an
   option inside of the message in order to allow the identity of the
   sender to be covered by a shared secret.

   The protocol-version option MUST be included in every CONNECTACK
   message. The current value of the protocol version is 1.

   If the connection has been rejected, the reject-reason option MUST be
   placed in the CONNECTACK message with an appropriate reason, and a
   message option SHOULD be included with a human-readable error message
   describing the reason for the rejection in some detail. If the
   reject-reason option appears, then the remaining options listed below
   do not appear. The sending server should close the connection after
   sending the CONNECTACK if the connection was rejected.

   The results of the TLS negotiation MUST be placed in the TLS-reply
   option. If this CONNECTACK message is being sent over an already TLS
   secured connection, then there MUST NOT be a TLS-reply option.

   If there was a message-digest option in the CONNECT message, then
   there MUST be a message-digest in the CONNECTACK message and any
   subsequent messages if the CONNECTACK does not contain a
   reject-reason.
   The number of BNDUPD messages the server can accept without blocking
   the TCP connection MUST be placed in the max-unacked-bndupd option.
   This SHOULD be a number greater than 10, and SHOULD be a number less
   than 100.

   The length of the receive timer (tReceive, see section 8.3) MUST be
   placed in the receive-timer option.

   The vendor class identifier MUST be placed in the
   vendor-class-identifier option.

   If the server is rejecting the CONNECT message, then the
   reject-reason option MUST appear. A message option SHOULD appear to
   give a human readable version of the rejection reason.

   After a connection is created (either by sending a CONNECTACK message
   to the first CONNECT message, or sending a CONNECTACK message to a
   CONNECT message received over a TLS connection), the server MUST send
   a STATE message.

   After a connection is created, the server MUST start two timers for
   the connection: tSend and tReceive. The tSend timer SHOULD be
   approximately 33 percent of the time in the receive-timer option in
   the corresponding CONNECT message. The tReceive timer SHOULD be the
   time sent in the receive-timer option in the CONNECTACK message.

   The tReceive timer is reset whenever a message is received from this
   TCP connection. If it ever expires, the TCP connection is dropped
   and communications with this partner is considered not ok.

   The tSend timer is reset whenever a message is sent over this
   connection. When it expires, a CONTACT message MUST be sent.

7.9.2. Receiving the CONNECTACK message

   If a CONNECTACK message is received with a different XID from the one
   in the CONNECT that was sent, it SHOULD be ignored.

   When a CONNECTACK message is received, the following actions should
   be taken:

      1. Record the time the message was received.

      2. Check to see if there is a reject-reason option in the
         CONNECTACK message.
         If not, continue with step 3. If there is a reject-reason
         option, the server SHOULD report the error code. If a message
         option appears a server SHOULD display the string from the
         message option in a user visible way. The server MUST close
         the connection if a reject-reason option appears.

      3. Check to see if the xid on the CONNECTACK matches an
         outstanding CONNECT message on this TCP connection.

         If it does not, a server SHOULD report an error.

      4. Check the value of the TLS-reply option, and if it was 1, then
         skip processing of the rest of the CONNECTACK message, and
         immediately enter into TLS connection setup.

         This step occurs prior to steps 5 and 6 in order to allow
         creation of a secure connection (if required) prior to
         processing the protocol version and IP address information.

      5. Examine the value of the protocol-version option. If this
         server is able to establish connections with another server
         running this protocol version, then continue, else close the
         connection.

      6. Decide if the time delta between the sending of the message,
         in the time field, and the receipt of the message, recorded in
         step 1 above, is acceptable. A server MAY require an
         arbitrarily small delta in time values in order to set up a
         failover connection with another server.

         If the delta between the time values is too great, the server
         should drop the TCP connection.

         If the time mismatch is not considered too great then the
         receiving server MUST record the delta between the servers.
         The receiving server MUST use this delta to correct all of the
         absolute times received from the other server in all
         time-valued options. Note that the failover protocol is
         constructed so that two servers can be failover partners with
         arbitrarily great time mismatches.

      7. If the receiving server is a secondary server, it MUST examine
         the MCLT option in the CONNECT request and use the value of
         the MCLT as the MCLT for this failover endpoint.

         A receiving secondary server SHOULD be able to operate with
         any MCLT sent by the primary, but if it cannot, then it MUST
         drop the TCP connection.

      8. If the receiving server is a secondary server, it MUST store
         the hash-bucket-assignment option for use during processing
         during NORMAL state. If this hash bucket assignment conflicts
         with the server's configured hash bucket assignment for use in
         other than NORMAL state, the secondary server should send a
         CONNECTACK with a reject reason of 19, Hash bucket assignment
         conflict.

      9. The receiving server MAY use the vendor-class-identifier to do
         vendor specific processing.

      10. After accepting a CONNECTACK message, the server MUST send a
          STATE message.

          After receiving a CONNECTACK message, the server MUST start
          two timers for the connection: tSend and tReceive. The tSend
          timer SHOULD be approximately 20 percent of the time in the
          receive-timer option in the corresponding CONNECTACK message.
          The tReceive timer SHOULD be set to the time sent in the
          receive-timer option in the CONNECT message.

          The tReceive timer is reset whenever a message is received
          from this TCP connection. If it ever expires, the TCP
          connection is dropped and communications with this partner is
          considered not ok.

          The tSend timer is reset whenever a message is sent over this
          connection. When it expires, a CONTACT message MUST be sent.

7.10. STATE message

   The state (STATE) message is used to communicate the current failover
   state to the partner server.
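
   The tSend and tReceive handling described in sections 7.9.1 and 7.9.2
   can be sketched in Python as follows. This is a minimal illustration
   under stated assumptions: the class name, method names and returned
   strings are invented for the example, the timer lengths are passed in
   by the caller rather than derived from the receive-timer options, and
   the caller supplies the current time in seconds.

```python
class ConnectionTimers:
    """Track tSend and tReceive for one failover connection."""

    def __init__(self, t_send, t_receive, now):
        self.t_send = t_send          # idle time before a CONTACT is due
        self.t_receive = t_receive    # silence after which we drop
        self.last_sent = now
        self.last_received = now

    def message_sent(self, now):
        # Any message sent resets tSend.
        self.last_sent = now

    def message_received(self, now):
        # Any message received (CONTACT included) resets tReceive.
        self.last_received = now

    def poll(self, now):
        """Return "drop", "send-contact", or None if nothing is due."""
        if now - self.last_received >= self.t_receive:
            return "drop"             # communications not ok
        if now - self.last_sent >= self.t_send:
            return "send-contact"     # CONTACT message MUST be sent
        return None
```

   A caller would invoke poll() periodically, send a CONTACT message
   (and then call message_sent()) on "send-contact", and drop the TCP
   connection on "drop".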
   The STATE message MUST be sent after sending a CONNECTACK message
   that didn't contain a reject-reason option, and MUST be sent after
   receiving a CONNECTACK message without a reject-reason option.

   A STATE message MUST be sent whenever the failover endpoint changes
   its failover state and a connection exists to the partner.

   The STATE message requires no response from the failover partner.

7.10.1. Sending the STATE message

   The current failover state is placed in the server-state option and
   the current state of the STARTUP flag is placed in the server-flags
   option.

   The message is sent with a unique xid.

   A server SHOULD only send the STATE message either when the
   connection is created (i.e., after sending or receiving a CONNECTACK
   message with no reject-reason option), or when there is a change from
   the values sent in a previous STATE message.

7.10.2. Receiving the STATE message

   Every STATE message SHOULD indicate a change in state or a change in
   the flags.

   When a STATE message is received, any state transitions specified in
   section 9 are taken.

   No response to a STATE message is required.

7.11. CONTACT message

   The contact (CONTACT) message is sent to verify communications
   integrity with a failover partner. The CONTACT message is sent when
   no messages have been sent to the failover partner for a specified
   period of time. This is determined by the tSend timer expiring (see
   section 8.3).

7.11.1. Sending the CONTACT message

   The CONTACT message is sent.

7.11.2. Receiving the CONTACT message

   When a CONTACT message is received, the tReceive timer is reset (as
   it is with any message that is received).

   A server MAY use the time in the time field and the time recorded
   above to refine the delta time calculations between the servers.

7.12. DISCONNECT message

   The DISCONNECT is the last message sent over a connection before
   dropping an established connection.

   After sending or receiving a DISCONNECT message, a server needs to
   have some mechanism to prevent an error loop. Simply reconnecting to
   the partner immediately is not the best option, especially after
   several consecutive attempts.

   A simple suggested solution is to wait a minute or two after sending
   or receiving a DISCONNECT before attempting to reestablish
   communication.

7.12.1. Sending the DISCONNECT message

   The DISCONNECT message MUST be the last message sent by a server
   which is dropping a TCP connection.

   The xid of the DISCONNECT message must be unique.

   The reject-reason option MUST appear giving a reason why the
   connection was dropped. A message option SHOULD appear giving a human
   readable error message with possibly more details.

7.12.2. Receiving the DISCONNECT message

   When a server receives a DISCONNECT message it should log the message
   if there was one and possibly raise an alarm of some sort if the
   reject reason was one that was sufficiently serious.

8. Connection Management

   Servers participating in the failover protocol communicate over TCP
   connections. These TCP connections are used both to transmit binding
   information from one server to another as well as to allow each
   server to determine whether communications is possible with the other
   server.

   Central to the operation of the failover protocol is a notion of
   "communications okay" or "communications failed". Failover state
   transitions are taken in many cases when the status of communications
   with the partner changes, and the existence or non-existence of a TCP
   connection between failover endpoints is used to determine if
   communications is "okay" or "failed".
   A single TCP connection exists which connects two failover endpoints.

8.1. Connection granularity

   There exists one TCP connection between each set of failover
   endpoints. See section 5.1.1 for an explanation of failover endpoint.

   There are a maximum of two TCP connections between any two servers
   implementing the failover protocol, one for each of the possible
   failover endpoints between these two servers. There is a minimum of
   one TCP connection between one server and every other failover server
   with which it implements the failover protocol.

8.2. Creating the TCP connection

   Every server implementing the failover protocol MUST listen on port
   647 for incoming failover TCP connections. The source port of the
   TCP connection is unimportant.

   Every server implementing the failover protocol SHOULD attempt to
   connect to all of its partners periodically, where the period is
   implementation dependent and SHOULD be configurable. In the event
   that a connection has been rejected by a CONNECTACK message with a
   reject-reason option contained in it or a DISCONNECT message, a
   server SHOULD reduce the frequency with which it attempts to connect
   to that server but it SHOULD continue to attempt to connect
   periodically.

   Once a connection is established, the primary server MUST send a
   CONNECT message across the connection. A secondary server MUST wait
   for the CONNECT message from a primary server.

   Every CONNECT message includes a TLS-request option, and if the
   CONNECTACK message does not reject the CONNECT message and the
   TLS-reply option says TLS MUST be used, then the servers will
   immediately enter into TLS negotiation.

   Once TLS negotiation is complete, the primary server MUST resend the
   CONNECT message on the newly secured TLS connection and then wait for
   the CONNECTACK message in response.
   The TLS-request and TLS-reply options MUST have the same values in
   this second CONNECT and CONNECTACK message as they had in the first
   messages.

   The second message sent over a new connection (either a bare TCP
   connection or a connection utilizing TLS) is a STATE message. Upon
   the receipt of this message, the receiver can consider communications
   up.

   It is entirely possible that two servers will attempt to make
   connections to each other essentially simultaneously, and in this
   case the secondary server will be waiting for a CONNECT message on
   each connection. The primary server MUST send a CONNECT message over
   one connection and it MUST close the other connection.

   A secondary server MUST NOT respond to the closing of a TCP
   connection with a blind attempt to reconnect -- there may be another
   TCP connection to the same failover partner already in use.

8.3. Using the TCP connection for determining communications status

   The TCP connection is used to determine the communications status of
   the other server, i.e., communications-ok, or
   communications-interrupted.

   Three things must happen for a server to consider that communications
   are ok with respect to another server:

      1. A TCP connection must be established to the other server.

      2. A CONNECT message must be received and a CONNECTACK message
         sent in response. The CONNECT message is used to determine
         the identity of the failover endpoint of the other end of the
         TCP connection -- without it, the failover endpoint cannot be
         uniquely determined. Without knowledge of the failover
         endpoint, the entity with which communications is ok is
         undetermined.

      3. A STATE message must be received from the other server over
         the connection.
         This STATE message initializes important information necessary
         to the operation of the state machine that governs the behavior
         of this failover endpoint.

   There are two ways that a server can determine that communications
   has failed:

      1. The TCP connection can go down, yielding an error when
         attempting to send or receive a message. This will happen at
         least as often as the period of the tSend timer.

      2. The tReceive timer can expire.

   In either of these cases, communications is considered interrupted.

   Several difficulties arise when trying to use one TCP connection for
   both bulk data transfer as well as to sense the communications status
   of the other server. One aspect of the problem stems from the
   different requirements of both uses. The bulk data transfer is of
   course critically important to the protocol, but the speed with which
   it is processed is not terribly significant. It might well be
   minutes before a BNDUPD message is processed, and while not optimal,
   such an occasional delay doesn't compromise the correctness of the
   protocol. However, the speed with which one server detects the other
   server is up (or, more importantly, down) is more highly constrained.
   Generally one server should be able to detect that the other server
   is not communicating within a minute or less.

   These differing time constraints make it difficult to use the same
   TCP connection for data transfer as well as to sense communications
   integrity. See section 3.5 for additional details on TCP.

   The solution to this problem is to require that some message be
   received by each end of the connection within a limited time or that
   the connection will be considered down. If no messages have been
   sent recently, then a CONTACT message is sent.
   In the case where there is no data queued to be sent, this is not a
   problem, but in the case where there is data queued to be sent to the
   partner, then the CONTACT message will not actually be transmitted
   until the queued data is sent. Section 3.5 explains why waiting for
   TCP to determine that the connection is down is not acceptable, and
   leads to a requirement that the receiving server never block the
   sending server from sending CONTACT messages.

   In order to meet this requirement, each server tells the other server
   the number of outstanding BNDUPD messages that it will accept. The
   receiving server is required to always be able to accept that many
   BNDUPD messages off of the connection's input queue even if it cannot
   process them immediately, and to accept all other messages
   immediately.

   Thus, the sending server's TCP is never blocked from sending a
   message except for very short periods, less than a few seconds unless
   the network connection itself has problems. In this case, if the
   CONTACT messages don't make it to the partner then the partner will
   close the connection.

   DISCUSSION:

      When implementing this capability, one needs to be careful when
      sending any message on the TCP connection as TCP can easily block
      the server if the local TCP send buffers are full. This can't be
      prevented because if the receiver is not reachable (via the
      network), the sending TCP can't send and thus it will be unable to
      empty the local TCP send buffers. So, all send operations either
      need to assume they may block for some time or non-blocking sends
      must be used.

8.4. Using the TCP connection for binding data

   Binding data, in the form of BNDUPD messages and BNDACK messages to
   respond to them, are sent across the TCP connection.
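
   The limit on outstanding BNDUPD messages described in section 8.3 can
   be sketched in Python as a simple send window. This is an
   illustration only; the class and method names are assumptions, and
   BNDUPD messages are represented by their xids alone.

```python
from collections import deque


class BndupdSender:
    """Send BNDUPD messages without exceeding max-unacked-bndupd."""

    def __init__(self, max_unacked):
        self.max_unacked = max_unacked  # value received from the partner
        self.queue = deque()            # xids waiting to be transmitted
        self.unacked = set()            # xids sent, BNDACK not yet seen

    def enqueue(self, xid):
        self.queue.append(xid)

    def sendable(self):
        """Remove and return the xids that may be transmitted now."""
        out = []
        while self.queue and len(self.unacked) < self.max_unacked:
            xid = self.queue.popleft()
            self.unacked.add(xid)
            out.append(xid)
        return out

    def bndack(self, xid):
        # A BNDACK frees one slot in the window.
        self.unacked.discard(xid)
```

   With a window of 2 and three queued updates, the first call to
   sendable() releases two xids, and the third is released only after a
   BNDACK arrives for one of them.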
4169 In order to support timely detection of any failure in the partner 4170 server, the TCP connection MUST NOT block for more than a very short 4171 time, on the order of a few seconds. Therefore, a server that is 4172 sending BNDUPD messages MUST send only a restricted number before 4173 receiving BNDACK messages about previous messages sent. 4175 The number of outstanding BNDUPD messages that each server will 4176 accept without causing TCP to block transmission of additional data 4177 (i.e, CONTACT messages) is sent by each server in the CONNECT and 4178 CONNECTACK messages in the max-unacked-bndupd option. 4180 8.5. Using the TCP connection for control messages 4182 The TCP connection is used for control messages: POOLREQ, UPDREQ, 4183 STATE, CONTACT, UPDREQALL and the corresponding reply messages: POOL- 4184 RESP, UPDDONE. A server MUST immediately accept all of these mes- 4185 sages from the TCP connection. A server MUST immediately accept any 4186 BNDACK which is received as well. 4188 8.6. Losing the TCP connection 4190 When the TCP connection is lost, then communications is not ok with 4191 the other server. A server which has lost communications SHOULD 4192 immediately attempt to reconnect to the other server, and should 4193 retry these connection attempts periodically. 4195 A BNDACK message can only be sent in response to a BNDUPD message 4196 using the same TCP connection from which the BNDUPD message was 4197 received, since the XID's in BNDUPD messages are guaranteed unique 4198 only during the life of a single TCP connection. When a connection 4199 to a partner server goes down, a server with unprocessed BNDUPD mes- 4200 sages MAY simply drop all of those messages, since it can be sure 4201 that the partner will retransmit them when they are next in communi- 4202 cations. 
A server with unprocessed BNDUPD messages when a TCP con- 4203 nection goes down MAY instead choose to process those BNDUPD mes- 4204 sages, but it MUST NOT send any BNDACK messages in response (again 4205 because of the issues surrounding XID uniqueness). 4207 When the TCP connection is closed explicitly, the DISCONNECT message 4208 with a reject-reason option (and, ideally, a message option) MUST be 4209 sent over the TCP connection. 4211 9. Protocol States 4213 This section discusses the various states that a failover endpoint 4214 may take, and the server actions required when entering the state, 4215 operating in the state, and leaving the state, as well as the events 4216 that cause transitions out of the state into another state. 4218 The state transition diagram in Figure 9.2-1 is relevant for this 4219 section. This is the common state transition diagram for both servers 4220 in a failover pair. In the event that the textual description of a 4221 state differs from the state transition diagram, the textual descrip- 4222 tion is to be considered authoritative. 4224 9.1. Server Initialization 4226 When a server starts, it starts out in STARTUP state. See section 9.3 4227 below for details. 4229 9.2. Server State Transitions 4231 Whenever a server transitions into a new state, it MUST record the 4232 state and the time at which it entered that state in stable storage. 4233 If communications is "ok", it MUST also send a STATE message to its 4234 failover partner. 4236 Figure 9.2-1 is the diagram of the server state transitions. The 4237 remainder of this section contains information important to the 4238 understanding of that diagram. 4240 The server stays in the current state until all of the actions speci- 4241 fied on the state transition are complete. If communications fails 4242 during one of the actions, the server simply stays in the current 4243 state and attempts a transition whenever the conditions for a transi- 4244 tion are later fulfilled.
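The requirement at the start of this section (record the new state and its entry time in stable storage, then send a STATE message if communications is "ok") might be sketched as follows. The file layout, function names and the send_state callback are assumptions made for illustration; the draft does not prescribe any storage format.

```python
import json
import os
import tempfile
import time


def record_state(path, state, now=None):
    """Durably record the failover state and the time it was entered."""
    entry = {"state": state, "entered_at": time.time() if now is None else now}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(entry, f)
        f.flush()
        os.fsync(f.fileno())   # force the data to disk before renaming
    os.replace(tmp, path)      # atomic swap, so a crash leaves old or new, never half
    return entry


def load_state(path):
    """Return the last recorded entry, or None if nothing was recorded."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None


def transition(path, new_state, comms_ok, send_state, now=None):
    """Record first, then notify the partner only if communications is ok."""
    entry = record_state(path, new_state, now)
    if comms_ok:
        send_state(entry)      # hypothetical callback that emits a STATE message
    return entry
```

Writing to a temporary file and renaming it is one common way to approximate stable storage on a POSIX file system; a real server might instead use its lease database.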
4246 In the state transition diagram below, the "+" or "-" in the upper 4247 right corner of each state is a notation about whether communication 4248 is ongoing with the other server. 4250 The legend "responsive", "balanced", or "unresponsive" in each state 4251 indicates whether the server is responsive to all DHCP client 4252 requests, running in load balanced mode, or totally unresponsive in 4253 the respective state. The terms "responsive" and "unresponsive" have 4254 the obvious meanings, while "balanced" means that a DHCP server may 4255 respond to all DHCPREQUEST messages that are RENEWAL or REBINDING, 4256 and to all other messages from clients for which the load balancing 4257 algorithm indicates that it MUST respond to. See sections 5.3 and 4258 9.6.2 for details on load balancing. 4260 In the state transition diagram below, when communication is reesta- 4261 blished between the two servers, each must record the state of the 4262 partner when communication was restored. State transitions on one 4263 server in some cases imply state transitions on the partner server, 4264 so a record of the current state of the partner server must be kept 4265 by each server. 4267 If the state of the partner changes while communicating a server 4268 moves through the communications-failed transition and into whatever 4269 state results. It then immediately moves through whatever state 4270 transition is appropriate given the current state of the partner 4271 server. A server performing this operation SHOULD NOT close the TCP 4272 connection to its partner. 4274 DISCUSSION: 4276 The point of this technique is simplicity, both in explanation of 4277 the protocol and in its implementation. The alternative to this 4278 technique of memory of partner state and automatic state transi- 4279 tion on change of partner state is to have every state in the fol- 4280 lowing diagram have a state transition for every possible state of 4281 the partner. 
With the approach adopted, only the states in which 4282 communications are reestablished require a state transition for 4283 each possible partner state. 4285 The current state of a server MUST be recorded in stable storage and 4286 thus be available to the server after a server restart. 4288 +---------------+ V +--------------+ 4289 | RECOVER - | | | STARTUP - | 4290 |(unresponsive) | +->|(unresponsive)| 4291 +---------------+ +--------------+ 4292 Comm. OK +-----------------+ 4293 Other State:-RECOVER | PARTNER DOWN - |<-----------------+ 4294 | | | (responsive) | | 4295 All POTENTIAL- +-----------------+ +--------------+ | 4296 Others CONFLICT------------ | --------+ | RESOLUTION | | 4297 | Comm. OK | | INTERRUPTED | | 4298 UPDREQ(ALL) Other State: | +-| (responsive) | | 4299 Wait UPDDONE | | | | +--------------+ | 4300 Wait MCLT from fail RECOVER All Others| Comm. OK ^ | | 4301 +--------------+ | V V V | Ext. | 4302 |RECOVER-DONE +| +--+ +--------------+ Comm. Cmd. | 4303 |(unresponsive)| | | POTENTIAL + | Failed | | 4304 +--------------+ Wait for +>| CONFLICT |------+ +-->| 4305 Comm. OK Other | |(unresponsive)|<--------+ | 4306 +--Other State:-+ State: | +--------------+ | | 4307 | | | RECOVER | | | | 4308 | All POTENT. DONE | Resolve Conflict | | 4309 | Others: CONFLICT-- | ----+ (see 9.8) | | 4310 | Wait for V V | | 4311 | Other State: NORMAL +-----------------+ | | 4312 | V | NORMAL + | External | | 4313 | +--+----------+-->| (balanced) |-Command---+-- | -----+ 4314 | ^ ^ +-----------------+ | | 4315 | | | | | | 4316 | Wait for Comm. OK Comm. External | 4317 | Other Other Failed Command | 4318 | State: State: | or | | 4319 |RECOVER-DONE NORMAL Start Safe Safe | | 4320 | | COMM. INT. Period Timer Period | | 4321 | Comm. OK. | V expiration | 4322 | Other State: | +------------------+ | | 4323 | RECOVER +--| COMMUNICATIONS - |-----------+ | 4324 V +-------------| INTERRUPTED | Comm. 
OK | 4325 RECOVER | (responsive) |--Other State:-+ 4326 RECOVER-DONE--------->+------------------+ All Others 4328 Figure 9.2-1: Server state diagram. 4330 9.3. STARTUP state 4332 The STARTUP state affords an opportunity for a server to probe its 4333 partner server, before starting to service DHCP clients. 4335 DISCUSSION: 4337 Without the STARTUP state, a server would likely start in a state 4338 derived from its previously stored state (held in stable storage), 4339 if any. However, this may be inconsistent with the current state 4340 of the partner. The STARTUP state affords the opportunity for a 4341 server to potentially learn the partner's state and determine if 4342 that state is consistent with its derived starting state or 4343 whether some significant state change has occurred at the partner 4344 that forces the server to start in another state. This is 4345 especially critical if significant time has elapsed while the 4346 server was down. 4348 9.3.1. Operation while in STARTUP state 4350 Whenever a server is in STARTUP state, it MUST be unresponsive to 4351 DHCP client requests, and so the time spent in the STARTUP state is 4352 necessarily short, typically on the order of a few seconds to a few 4353 tens of seconds. The exact time spent in the STARTUP state is imple- 4354 mentation dependent, and the primary and secondary server are not 4355 required to spend the same amount of time in the STARTUP state. 4357 Whenever a STATE message is sent to the partner while in STARTUP 4358 state the STARTUP bit MUST be set in the server-flags option and the 4359 previously recorded failover state MUST be placed in the server-state 4360 option. 4362 9.3.2. Transition out of STARTUP state 4364 Each server starts out in startup state every time it initializes 4365 itself, and performs the following algorithm as part of its initiali- 4366 zation: 4368 1. Is there any record in stable storage of a previous failover 4369 state? 
If yes, set previous-state to the last recorded state 4370 in stable storage, and continue with step 2. 4372 Is there any configuration information that indicates that 4373 this server was previously running but lost its stable 4374 storage? Such information must typically come from some 4375 administrative intervention, since it is difficult for a 4376 server to distinguish first startup from a startup after it 4377 has lost its stable storage. If yes, then set the previous- 4378 state to RECOVER, and set the time-of-failure to whatever time 4379 was configured, and go on to step 2. This time-of-failure 4380 will be used in the transition out of the RECOVER state into 4381 the RECOVER-DONE state, below. 4383 If there is no record of any previous failover state in stable 4384 storage nor of any previous operational activity for this 4385 server, then set the previous-state to PARTNER-DOWN if this 4386 server is a primary and RECOVER if this server is a secondary, 4387 and set the time-of-failure to a time before the maximum- 4388 client-lead-time before now. If using standard Posix times, 0 4389 would typically do quite well. 4391 2. Is the previous-state NORMAL? If yes, set the previous-state 4392 to COMMUNICATIONS-INTERRUPTED. 4394 3. Start the STARTUP state timer. The time that a server remains 4395 in the STARTUP state (absent any communications with its 4396 partner) is implementation dependent and SHOULD be configur- 4397 able. It SHOULD be long enough for a TCP connection to be 4398 created to a heavily loaded partner across a slow network. 4400 4. Attempt to create a TCP connection to the failover partner. 4401 See section 8.2. 4403 5. Wait for "communications okay", i.e., the process discussed in 4404 section 8.2 "Creating the TCP Connection", to complete, 4405 including the receipt of a STATE message from the partner. 4407 When and if communications become "okay", clear the STARTUP 4408 flag, and set the current state to the previous-state.
4410 If the partner is in PARTNER-DOWN state, and if the time at 4411 which it entered PARTNER-DOWN state (as received in the 4412 start-time-of-state option in the STATE message) is later than 4413 the last recorded time of operation of this server, then set 4414 the current state to RECOVER. If the time at which it entered 4415 PARTNER-DOWN state is earlier than the last recorded time of 4416 operation of this server, then set the current state to 4417 POTENTIAL-CONFLICT. 4419 Then, transition to the current state and take the "communica- 4420 tions okay" state transition based on the current state of 4421 this server and the partner. 4423 7. If the startup time expires, take an implementation dependent 4424 action: The server MAY go to the previous-state, or the 4425 server MAY wait. 4427 Reasons to go to previous-state and begin processing: 4429 If the current server is the only operational server, then if 4430 it waits, there will be no operational DHCP servers. This 4431 situation could occur very easily where one server fails and 4432 then the other crashes and reboots. If the rebooting server 4433 doesn't start processing DHCP client requests without first 4434 being in communication with the other server, then the level 4435 of DHCP redundancy is not particularly high. This is an 4436 appropriate approach if the possibility of partition is low, 4437 or if the safe period expiration time is well beyond the time 4438 at which an operator would notice and react to a partition 4439 situation. It is also quite appropriate if the safe period 4440 will never expire. 
4442 Reasons to wait: 4444 If the current server has been down for longer than the 4445 maximum-client-lead-time, and it is partitioned from the other 4446 server, then when it returns it will attempt to use its own 4447 available addresses to allocate to new DHCP clients, and the 4448 other server may well be in PARTNER-DOWN state and may have 4449 already allocated some of those available addresses to DHCP 4450 clients. In cases where the possibility of partition is high, 4451 and the safe period expiration time is less than the likely 4452 operator reaction time, this is a good approach to use. 4454 9.4. PARTNER-DOWN state 4456 PARTNER-DOWN state is a state either server can enter. When in this 4457 state, the server does not assume that the other server could still 4458 be operating and servicing a different set of clients, but instead 4459 assumes that it is the only server operating. If one server is in 4460 PARTNER-DOWN state, the other server MUST NOT be operating. 4462 9.4.1. Upon entry to PARTNER-DOWN state 4464 No special actions are required when entering PARTNER-DOWN state. 4466 The server should continue to attempt to connect to the partner 4467 periodically. 4469 9.4.2. Operation while in PARTNER-DOWN state 4471 A server in PARTNER-DOWN state MUST respond to DHCP client requests. 4472 It will allow renewal of all outstanding leases on IP addresses, and 4473 will allocate IP addresses from its own pool, and after a fixed 4474 period of time (the MCLT interval) has elapsed from entry into 4475 PARTNER-DOWN state, it will allocate IP addresses from the set of all 4476 available IP addresses. 4478 Once a server has entered NORMAL state, the PARTNER-DOWN state is 4479 entered only on command of an external agency (typically an adminis- 4480 trator of some sort) or after the expiration of an externally config- 4481 ured minimum safe-time after the beginning of COMMUNICATIONS- 4482 INTERRUPTED state. 
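The allocation rule for available (unleased) addresses in PARTNER-DOWN state can be illustrated with a small predicate. This is a sketch under stated assumptions: the function and parameter names are hypothetical, times are plain numbers in seconds, and addresses already leased to clients are outside its scope.

```python
def may_allocate_available(owner, role, now, entered_partner_down, mclt):
    """Decide whether an available address may be handed to a new client.

    owner and role are each "primary" or "secondary": the server whose
    pool the address was in at entry to PARTNER-DOWN, and the server
    making the decision.
    """
    if owner == role:
        # Addresses from the server's own pool are usable at once.
        return True
    # The partner's addresses become usable only once the MCLT interval
    # has elapsed since entry into PARTNER-DOWN state.
    return now >= entered_partner_down + mclt
```

For example, with an assumed MCLT of 3600 seconds and entry into PARTNER-DOWN at time 10000, a primary may use its own addresses immediately but may use the secondary's only from time 13600 onward.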
4484 Any available IP address tagged as belonging to the other server (at 4485 entry to PARTNER-DOWN state) MUST NOT be used until the maximum- 4486 client-lead-time beyond the entry into PARTNER-DOWN state has 4487 elapsed. 4489 A server in PARTNER-DOWN state MUST NOT allocate an IP address to a 4490 DHCP client different from that to which it was allocated at the 4491 entrance to PARTNER-DOWN state until the maximum-client-lead-time 4492 beyond the maximum of the following times: client expiration time, 4493 most recently transmitted potential-expiration-time, most recently 4494 received ack of potential-expiration-time from the partner, and most 4495 recently acked potential-expiration-time to the partner. See section 4496 7.1.4 for details. If this time would be earlier than the current 4497 time plus the maximum-client-lead-time, then the time the server 4498 entered PARTNER-DOWN state plus the maximum-client-lead-time is used. 4500 Two options exist for lease times given out while in PARTNER-DOWN 4501 state, with different ramifications flowing from each. 4503 If the server wishes the Failover protocol to protect it from loss of 4504 stable storage in PARTNER-DOWN state, then it should ensure that the 4505 MCLT based lease time restrictions in Section 5.1 are maintained, 4506 even in PARTNER-DOWN state. 4508 If the server wishes to forego the protection of the Failover proto- 4509 col in the event of loss of stable storage, then it need recognize no 4510 restrictions on actual client lease times while in PARTNER-DOWN 4511 state. 4513 A server in PARTNER-DOWN state MUST continue to attempt to establish 4514 communications and synchronization with its partner. 4516 9.4.3. 
Transitions out of PARTNER-DOWN state 4518 When a server in PARTNER-DOWN state succeeds in establishing a con- 4519 nection to its partner, its actions are conditional on the state and 4520 flags received in the STATE message from the other server as part of 4521 the process of establishing the connection. 4523 If the STARTUP bit is set in the server-flags option of a received 4524 STATE message, a server in PARTNER-DOWN state MUST NOT take any state 4525 transitions based on reestablishing communications. Essentially, if a 4526 server is in PARTNER-DOWN state, it ignores all STATE messages from 4527 its partner that have the STARTUP bit set in the server-flags option 4528 of the STATE message. 4530 If the STARTUP bit is not set in the server-flags option of a STATE 4531 message received from its partner, then a server in PARTNER-DOWN 4532 state takes the following actions based on the value of the server- 4533 state option in the received STATE message: 4535 o partner in NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN or 4536 POTENTIAL-CONFLICT state 4538 transition to POTENTIAL-CONFLICT state 4540 o partner in RECOVER state 4542 stay in PARTNER-DOWN state 4544 o partner in RECOVER-DONE state 4546 transition into NORMAL state 4548 9.5. RECOVER state 4550 This state indicates that the server has no information in its stable 4551 storage or that it is re-integrating with a server in PARTNER-DOWN 4552 state after it has been down. A server in this state will attempt to 4553 refresh its stable storage from the other server. 4555 9.5.1. Operation in RECOVER state 4557 A server in RECOVER MUST NOT respond to DHCP client requests. 4559 A server in RECOVER state will attempt to reestablish communications 4560 with the other server. 4562 9.5.2. Transitions out of RECOVER state 4564 If the other server is in POTENTIAL-CONFLICT state when communica- 4565 tions are reestablished, then the server in RECOVER state will move 4566 to POTENTIAL-CONFLICT state itself. 
4568 If the other server is in RECOVER state, then this server SHOULD sig- 4569 nal an error and halt processing. 4571 If the other server is in any other state, then the server in RECOVER 4572 state will request an update of missing binding information by send- 4573 ing an UPDREQ or UPDREQALL message. If the server has been instructed (through 4574 configuration or other external agency) that it has lost its stable 4575 storage, it MUST send an UPDREQALL message, otherwise it MUST send an 4576 UPDREQ message. 4578 It will wait for an UPDDONE message, and upon receipt of that message 4579 it will start a timer whose expiration is set to a time equal to the 4580 time the server went down (if known) or the current time (if the 4581 down-time is unknown) plus the maximum-client-lead-time. When this 4582 timer goes off, the server will transition into RECOVER-DONE state. 4583 This is to allow any IP addresses that were allocated by this server 4584 prior to loss of its client binding information in stable storage to 4585 contact the other server or to time out. 4587 See Figure 9.5.2-1. 4589 DISCUSSION: 4591 The actual requirement on this wait period in RECOVER is that it 4592 start when the recovering server went down, not necessarily when 4593 it came back up. If the time when the recovering server failed is 4594 known, it could be communicated to the recovering server (perhaps 4595 through actions of the network administrator), and the wait period 4596 could be reduced to the maximum-client-lead-time less the differ- 4597 ence between the current time and the time the server failed. In 4598 this way, the waiting period could be minimized. 4600 If an UPDDONE message isn't received within an implementation depen- 4601 dent amount of time, and no BNDUPD messages are being received, then 4602 the UPDREQ(ALL) message will be re-transmitted.
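The decisions a server in RECOVER state makes once communications are reestablished, and the expiration time of the timer it starts after UPDDONE, might be sketched as below. The function names and the strings returned are hypothetical labels chosen for illustration, not protocol elements.

```python
def recover_action(partner_state, lost_stable_storage):
    """Pick the action for a server in RECOVER, given the partner's state."""
    if partner_state == "POTENTIAL-CONFLICT":
        return "enter POTENTIAL-CONFLICT"
    if partner_state == "RECOVER":
        return "signal error and halt"
    # Any other partner state: ask for the missing binding information.
    return "send UPDREQALL" if lost_stable_storage else "send UPDREQ"


def recover_done_time(now, mclt, time_of_failure=None):
    """Time at which the server may move from RECOVER to RECOVER-DONE.

    The wait is measured from the time the server went down when that is
    known, and from the current time otherwise.
    """
    base = now if time_of_failure is None else time_of_failure
    return base + mclt
```

If the failure time is known and lies more than one MCLT in the past, the computed time is already behind the current time, so the timer expires at once; this is the reduced waiting period the DISCUSSION describes.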
4604 A B 4605 Server Server 4607 | | 4608 RECOVER PARTNER-DOWN 4609 | | 4610 | >--UPDREQ--------------------> | 4611 | | 4612 | <---------------------BNDUPD--< | 4613 | >--BNDACK--------------------> | 4614 ... ... 4615 | | 4616 | <---------------------BNDUPD--< | 4617 | >--BNDACK--------------------> | 4618 | | 4619 | <--------------------UPDDONE--< | 4620 | | 4621 Wait MCLT from last known | 4622 time of operation | 4623 | | 4624 RECOVER-DONE | 4625 | | 4626 | >--STATE-(RECOVER-DONE)------> | 4627 | NORMAL 4628 | <-------------(NORMAL)-STATE--< | 4629 NORMAL | 4630 | | 4631 | | 4633 Figure 9.5.2-1: Transition out of RECOVER state 4635 9.6. NORMAL state 4637 NORMAL state is the state used by a server when it is communicating 4638 with the other server, and any required resynchronization has been 4639 performed. While some bindings database synchronization is performed 4640 in NORMAL state, potential conflicts are resolved prior to entry into 4641 NORMAL state as is binding database data loss. 4643 9.6.1. Upon Entry to NORMAL state 4645 When entering NORMAL state, a server will send to the other server 4646 all currently unacknowledged binding updates as BNDUPD messages. 4648 When the above process is complete, if the server entering NORMAL 4649 state is a secondary server, then it will request IP addresses for 4650 allocation using the POOLREQ message. 4652 9.6.2. Processing DHCP client requests and load balancing 4654 When in NORMAL state, each server MUST process all requests from some 4655 DHCP clients, and MUST NOT process any request other than a 4656 DHCPREQUEST/RENEWAL or a DHCPREQUEST/REBINDING request from some 4657 other DHCP clients. 4659 However, if the load balancing algorithm specified in [LOADB] is used 4660 with a pair of servers implementing the failover protocol, then each 4661 server needs to test each incoming DHCP client request to see if it 4662 should process that request. 
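The per-request test can be sketched as follows. Note the loud assumption: the bucket function here is a stand-in built on SHA-256 and is NOT the hash specified in [LOADB]; a real implementation must use the [LOADB] algorithm so that both servers agree. The function names and the representation of the bucket assignment as a set of integers 0..255 are also hypothetical.

```python
import hashlib


def request_id(client_identifier, htype, chaddr):
    """Use the client-identifier if present, else htype prepended to chaddr."""
    if client_identifier:
        return client_identifier
    return bytes([htype]) + chaddr


def bucket(req_id):
    # Stand-in only: maps the Request ID to one of 256 buckets.
    # This is NOT the [LOADB] hash; it merely has the same shape.
    return hashlib.sha256(req_id).digest()[0]


def should_process(req_id, my_buckets, renewing_or_rebinding=False):
    """Decide whether this server handles the request in NORMAL state."""
    if renewing_or_rebinding:
        # DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING are always processed.
        return True
    return bucket(req_id) in my_buckets
```

When the two servers hold complementary bucket sets, exactly one of them answers any given non-renewing client, which is the property the load balancing relies on.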
4664 As discussed in section 5.3, each server will take the client- 4665 identifier from each DHCP client request (or the client-hardware- 4666 address, i.e., the htype concatenated to the front of the chaddr if 4667 no client-identifier is present in the request) and use it as the 4668 'Request ID' specified in [LOADB]. After applying the algorithm 4669 specified in [LOADB] and comparing the result with the hash bucket 4670 assignment (performed during connect processing between failover 4671 servers), each failover server will be able to unambiguously deter- 4672 mine if it should process the DHCP client request. 4674 In NORMAL state, a server MUST process every DHCPREQUEST/RENEWAL or 4675 DHCPREQUEST/REBINDING request it receives. 4677 9.6.3. Operation in NORMAL state 4679 When in NORMAL state, for every DHCP client request that it 4680 processes, as determined by the algorithm described in section 9.6.2, 4681 above, a server will operate in the following manner: 4683 o Lease time calculations 4685 As discussed in section 5.2.1, "Control of lease time", the 4686 lease interval given to a DHCP client can never be more than the 4687 MCLT greater than the most recently received potential- 4688 expiration-time from the failover partner or the current time, 4689 whichever is later. 4691 As long as a server adheres to this constraint, the specifics of 4692 the lease interval that it gives to a DHCP client or the value 4693 of the potential-expiration-time sent to its failover partner 4694 are implementation dependent. One possible approach is 4695 discussed in section 5.2.1, but that particular approach is in 4696 no way required by this protocol. 4698 See section 7.1.4 for details concerning the storage of time 4699 associated IP addresses and how to use these times when calcu- 4700 lating lease times for DHCP clients.
4702 o Lazy update of partner server 4704 After an ACK of a IP address binding, the server servicing a 4705 DHCP client request attempts to update its partner with the new 4706 binding information. The lease time used in the update of the 4707 secondary MUST be at that given to the DHCP client in the 4708 DHCPACK, and the potential-expiration-time MUST be at least the 4709 lease time, and SHOULD be longer. 4711 o Reallocation of IP addresses between clients 4713 Whenever a client binding is released or expires, a BNDUPD mes- 4714 sage must be sent to partner, setting the binding state to 4715 RELEASED or EXPIRED. However, until a BNDACK is received for 4716 this message, the IP address cannot be allocated to another 4717 client. It can be allocated to the same client again. 4719 In normal state, the each server receives binding updates from its 4720 partner server in BNDUPD messages. It records these in its client 4721 binding database in stable storage and then sends a corresponding 4722 BNDACK message to the primary server. It MUST ensure that the infor- 4723 mation is recorded in stable storage prior to sending the BNDACK mes- 4724 sage back to the primary server. 4726 9.6.4. Transitions out of NORMAL state 4728 If an external command is received by a server in NORMAL state 4729 informing it that its partner is down, then transition into PARTNER- 4730 DOWN state. 4732 If a server in NORMAL state fails to receive acks to messages sent to 4733 its partner for an implementation dependent period of time, it MAY 4734 move into COMMUNICATIONS-INTERRUPTED state. This situation might 4735 occur if the partner server was capable of maintaining the TCP con- 4736 nection between the server and also capable of sending a CONTACT mes- 4737 sage every tSend seconds, but was (for some reason) incapable of pro- 4738 cessing BNDUPD messages. 
4740 If communications is determined not to be "ok" (as defined in 4741 section 8), then transition into COMMUNICATIONS-INTERRUPTED state. 4743 If a server in NORMAL state receives any messages from its partner 4744 where the partner has changed state from that expected by the server 4745 in NORMAL state, then the server should transition into 4746 COMMUNICATIONS-INTERRUPTED state and take the appropriate state tran- 4747 sition from there. For example, it would be expected for the partner 4748 to transition from POTENTIAL-CONFLICT into NORMAL state, but not for 4749 the partner to transition from NORMAL into POTENTIAL-CONFLICT state. 4751 9.7. COMMUNICATIONS-INTERRUPTED State 4753 A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is 4754 unable to communicate with the other server. Primary and secondary 4755 servers cycle automatically (without administrative intervention) 4756 between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network 4757 connection between them fails and recovers, or as the partner server 4758 cycles between operational and non-operational. No duplicate IP 4759 address allocation can occur while the servers cycle between these 4760 states. 4762 9.7.1. Upon Entry to COMMUNICATIONS-INTERRUPTED state 4764 When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been 4765 configured to support an automatic transition out of COMMUNICATIONS- 4766 INTERRUPTED state and into PARTNER-DOWN state (i.e., a "safe period" 4767 has been configured, see section 10), then a timer MUST be started 4768 for the length of the configured safe period. 4770 A server transitioning into the COMMUNICATIONS-INTERRUPTED state from 4771 the NORMAL state SHOULD raise some alarm condition to alert adminis- 4772 trative staff to a potential problem in the DHCP subsystem. 4774 9.7.2.
Operation in COMMUNICATIONS-INTERRUPTED State 4776 In this state a server MUST respond to all DHCP client requests, and 4777 the algorithm for load balancing described in section 5.3 MUST NOT be 4778 used. When allocating new IP addresses, each server allocates from 4779 its own IP address pool, where the primary MUST allocate only FREE IP 4780 addresses, and the secondary MUST allocate only BACKUP IP addresses. 4781 When responding to renewal requests, each server will allow continued 4782 renewal of a DHCP client's current lease on an IP address irrespec- 4783 tive of whether that lease was given out by the receiving server or 4784 not, although the renewal period MUST NOT exceed the maximum client 4785 lead time (MCLT) beyond the potential-expiration-time already ack- 4786 nowledged by the other server or the lease-expiration-time or 4787 potential-expiration-time received from the partner server. 4789 However, since the server cannot communicate with its partner in this 4790 state, the acknowledged-potential-expiration time will not be updated 4791 in any new bindings. This is likely to eventually cause the actual- 4792 client-lease-times to be the current time plus the maximum-client- 4793 lead-time (unless this is greater than the desired-client-lease- 4794 time). 4796 9.7.3. Transition out of COMMUNICATIONS-INTERRUPTED State 4798 If the safe period timer expires while a server is in the 4799 COMMUNICATIONS-INTERRUPTED state, it will transition immediately into 4800 PARTNER-DOWN state. 4802 If an external command is received by a server in COMMUNICATIONS- 4803 INTERRUPTED state informing it that its partner is down, it will 4804 transition immediately into PARTNER-DOWN state.
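The two rules above (safe period expiration and external command) can be sketched as a single check, evaluated while communications remain down. The function and parameter names are hypothetical, and a safe_period of None stands for "no safe period configured".

```python
def comm_interrupted_next_state(now, entered_at, safe_period=None,
                                partner_down_command=False):
    """Next state for a server in COMMUNICATIONS-INTERRUPTED, comms still down."""
    if partner_down_command:
        # An operator has asserted that the partner is down.
        return "PARTNER-DOWN"
    if safe_period is not None and now >= entered_at + safe_period:
        # The configured safe period timer has expired.
        return "PARTNER-DOWN"
    return "COMMUNICATIONS-INTERRUPTED"
```

With no safe period configured, the server leaves this state for PARTNER-DOWN only on an external command, however long the outage lasts.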
4806 If communications is restored with the other server, then the server 4807 in COMMUNICATIONS-INTERRUPTED state will transition into another 4808 state based on the state of the partner: 4810 o partner in NORMAL or COMMUNICATIONS-INTERRUPTED 4812 The partner really SHOULD NOT be in NORMAL state here, since 4813 upon restoration of communications it MUST have created a new 4814 TCP connection which would have forced it into COMMUNICATIONS- 4815 INTERRUPTED state. Still, we should account for every state 4816 just in case. 4818 Transition into the NORMAL state. 4820 o partner in RECOVER 4822 Stay in COMMUNICATIONS-INTERRUPTED state. 4824 o partner in RECOVER-DONE 4826 Transition into NORMAL state. 4828 o partner in PARTNER-DOWN or POTENTIAL-CONFLICT 4830 Transition into POTENTIAL-CONFLICT state. 4832 o partner in PAUSED 4834 Stay in COMMUNICATIONS-INTERRUPTED state. 4836 o partner in SHUTDOWN 4838 Transition into PARTNER-DOWN state. 4840 The following figure illustrates the transition from NORMAL to 4841 COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again. 4843 Primary Secondary 4844 Server Server 4846 NORMAL NORMAL 4847 | >--CONTACT-------------------> | 4848 | <--------------------CONTACT--< | 4849 | [TCP connection broken] | 4850 COMMUNICATIONS : COMMUNICATIONS 4851 INTERRUPTED : INTERRUPTED 4852 | [attempt new TCP connection] | 4853 | [connection succeeds] | 4854 | | 4855 | >--CONNECT-------------------> | 4856 | <-----------------CONNECTACK--< | 4857 | <-------------------STATE-----< | 4858 | NORMAL 4859 | >--STATE---------------------> | 4860 NORMAL | 4861 | >--BNDUPD--------------------> | 4862 | <---------------------BNDACK--< | 4863 | | 4864 | <---------------------BNDUPD--< | 4865 | >------BNDACK----------------> | 4866 ... ...
4867 | | 4868 | <--------------------POOLREQ--< | 4869 | >--POOLRESP-(2)--------------> | 4870 | | 4871 | >--BNDUPD-(#1)---------------> | 4872 | <---------------------BNDACK--< | 4873 | | 4874 | <--------------------POOLREQ--< | 4875 | >--POOLRESP-(0)--------------> | 4876 | | 4877 | >--BNDUPD-(#2)---------------> | 4878 | <---------------------BNDACK--< | 4879 | | 4881 Figure 9.7.3-1: Transition from NORMAL to COMMUNICATIONS- 4882 INTERRUPTED and back (example with 2 4883 addresses allocated to secondary) 4885 9.8. POTENTIAL-CONFLICT state 4887 This state indicates that the two servers are attempting to re- 4888 integrate with each other, but at least one of them was running in a 4889 state that did not guarantee automatic reintegration would be 4890 possible. In POTENTIAL-CONFLICT state the servers may determine that 4891 the same IP address has been offered and accepted by two different 4892 DHCP clients. 4894 It is a goal of this protocol to minimize the possibility that 4895 POTENTIAL-CONFLICT state is ever entered. 4897 9.8.1. Upon Entry to POTENTIAL-CONFLICT 4899 When a primary server enters POTENTIAL-CONFLICT state it should 4900 request that the secondary send it all updates of which it is 4901 currently unaware by sending an UPDREQ message to the secondary 4902 server. 4904 A secondary server entering POTENTIAL-CONFLICT state will wait for 4905 the primary to send it an UPDREQ message. 4907 9.8.2. Operation in POTENTIAL-CONFLICT state 4909 Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming 4910 DHCP requests. 4912 9.8.3. Transitions out of POTENTIAL-CONFLICT state 4914 If communications fails with the partner while in POTENTIAL-CONFLICT 4915 state, then a primary server will transition to PARTNER-DOWN state 4916 and a secondary server will stay in POTENTIAL-CONFLICT state. 4918 Whenever either server receives an UPDDONE message from its partner 4919 while in POTENTIAL-CONFLICT state, it MUST transition to NORMAL 4920 state. 
This will cause the primary server to leave POTENTIAL- 4921 CONFLICT state prior to the secondary, since the primary sends an 4922 UPDREQ message and receives an UPDDONE before the secondary sends an 4923 UPDREQ message and receives its UPDDONE message. 4925 When a secondary server receives an indication that the primary 4926 server has transitioned from POTENTIAL-CONFLICT to NORMAL state, it 4927 SHOULD send an UPDREQ message to the primary server. 4929 Primary Secondary 4930 Server Server 4932 | | 4933 POTENTIAL-CONFLICT POTENTIAL-CONFLICT 4934 | | 4935 | >--UPDREQ--------------------> | 4936 | | 4937 | <---------------------BNDUPD--< | 4938 | >--BNDACK--------------------> | 4939 ... ... 4940 | | 4941 | <---------------------BNDUPD--< | 4942 | >--BNDACK--------------------> | 4943 | | 4944 | <--------------------UPDDONE--< | 4945 NORMAL | 4946 | >--STATE--(NORMAL)-----------> | 4947 | <---------------------UPDREQ--< | 4948 | | 4949 | >--BNDUPD--------------------> | 4950 | <---------------------BNDACK--< | 4951 ... ... 4952 | >--BNDUPD--------------------> | 4953 | <---------------------BNDACK--< | 4954 | | 4955 | >--UPDDONE-------------------> | 4956 | NORMAL 4957 | | 4958 | <--------------------POOLREQ--< | 4959 | >------POOLRESP-(n)----------> | 4960 | addresses | 4962 Figure 9.8.3-1: Transition out of POTENTIAL-CONFLICT 4964 9.9. RESOLUTION-INTERRUPTED state 4966 This state indicates that the two servers were attempting to re- 4967 integrate with each other in POTENTIAL-CONFLICT state, but 4968 communications failed prior to completion of re-integration. 4970 If the servers remained in POTENTIAL-CONFLICT while communications 4971 was interrupted, neither server would be responsive to DHCP client 4972 requests, and if one server had crashed, then there might be no 4973 server able to process DHCP requests. 4975 9.9.1. 
Upon Entry to RESOLUTION-INTERRUPTED state 4977 When a server enters RESOLUTION-INTERRUPTED state, it SHOULD raise an alarm 4978 condition to alert administrative staff of a problem in the DHCP sub- 4979 system. 4981 9.9.2. Operation in RESOLUTION-INTERRUPTED state 4983 In this state a server MUST respond to all DHCP client requests, and 4984 any load balancing (described in section 5.3) MUST NOT be used. When 4985 allocating new IP addresses, each server SHOULD allocate from its own 4986 IP address pool (if that can be determined), where the primary MUST 4987 allocate only FREE IP addresses, and the secondary MUST allocate only 4988 BACKUP IP addresses. When responding to renewal requests, each 4989 server will allow continued renewal of a DHCP client's current lease 4990 on an IP address irrespective of whether that lease was given out by 4991 the receiving server or not, although the renewal period MUST NOT 4992 exceed the maximum client lead time (MCLT) beyond the potential- 4993 expiration-time already acknowledged by the other server or the 4994 lease-expiration-time or potential-expiration-time received from the 4995 partner server. 4997 However, since the server cannot communicate with its partner in this 4998 state, the acknowledged-potential-expiration time will not be updated 4999 in any new bindings. 5001 9.9.3. Transitions out of RESOLUTION-INTERRUPTED state 5003 If an external command is received by a server in RESOLUTION- 5004 INTERRUPTED state informing it that its partner is down, it will 5005 transition immediately into PARTNER-DOWN state. 5007 If communications is restored with the other server, then the server 5008 in RESOLUTION-INTERRUPTED state will transition into POTENTIAL- 5009 CONFLICT state. 5011 9.10. RECOVER-DONE state 5013 This state exists to allow an interlocked transition for one server 5014 from RECOVER state and another server from PARTNER-DOWN or 5015 COMMUNICATIONS-INTERRUPTED state into NORMAL state. 5017 9.10.1.
Operation in RECOVER-DONE state 5019 A server in RECOVER-DONE state MUST respond only to 5020 DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING DHCP messages. 5022 9.10.2. Transitions out of RECOVER-DONE state 5024 When a server in RECOVER-DONE state determines that its partner 5025 server has entered NORMAL state, then it will transition into NORMAL 5026 state as well. 5028 9.11. PAUSED state 5030 This state exists to allow one server to inform another that it will 5031 be out of service for what is predicted to be a relatively short 5032 time, and to allow the other server to transition to COMMUNICATIONS- 5033 INTERRUPTED state immediately and to begin servicing all DHCP clients 5034 with no interruption in service to new DHCP clients. 5036 A server which is aware that it is shutting down temporarily SHOULD 5037 send a STATE message with the server-state option containing PAUSED 5038 state and close the TCP connection. 5040 While a server may or may not transition internally into PAUSED 5041 state, the 'previous' state determined when it is restarted MUST be 5042 the state the server was in prior to receiving the command to shut- 5043 down and restart and which precedes its entry into the PAUSED state. 5044 See section 9.3.2 concerning the use of the previous state upon 5045 server restart. 5047 9.11.1. Upon entry to PAUSED state 5049 When entering PAUSED state, the server MUST store the previous state 5050 in stable storage, and use that state as the previous state when it 5051 is restarted. 5053 9.11.2. Transitions out of PAUSED state 5055 A server transitions out of PAUSED state by being restarted. At that 5056 time, the previous state MUST be the state the server was in prior to 5057 entering the PAUSED state. 5059 9.12. 
SHUTDOWN state 5061 This state exists to allow one server to inform another that it will 5062 be out of service for what is predicted to be a relatively long time, 5063 and to allow the other server to transition immediately to PARTNER- 5064 DOWN state, and take over completely for the server going down. 5066 A server which is aware that it is shutting down SHOULD send a STATE 5067 message with the server-state field containing SHUTDOWN. 5069 While a server may or may not transition internally into SHUTDOWN 5070 state, the 'previous' state determined when it is restarted MUST be 5071 the state active prior to the command to shutdown. See section 9.3.2 5072 concerning the use of the previous state upon server restart. 5074 9.12.1. Upon entry to SHUTDOWN state 5076 When entering SHUTDOWN state, the server MUST record the previous 5077 state in stable storage for use when the server is restarted. It 5078 also MUST record the current time as the last time operational. 5080 A server which is aware that it is shutting down SHOULD send a STATE 5081 message with the server-state field containing SHUTDOWN. 5083 9.12.2. Operation in SHUTDOWN state 5085 A server in SHUTDOWN state MUST NOT respond to any DHCP client input. 5087 If a server receives any message indicating that the partner has 5088 moved to PARTNER-DOWN state while it is in SHUTDOWN state then it 5089 MUST record RECOVER state as the previous state to be used when it is 5090 restarted. 5092 A server SHOULD wait for a few seconds after informing the partner of 5093 entry into SHUTDOWN state (if communications are okay) to determine 5094 if it will enter PARTNER-DOWN state. 5096 9.12.3. Transitions out of SHUTDOWN state 5098 A server transitions out of SHUTDOWN state by being restarted. 5100 10. Safe Period 5102 Due to the restrictions imposed on each server while in 5103 COMMUNICATIONS-INTERRUPTED state, long-term operation in this state 5104 is not feasible for either server. 
One reason that these states 5105 exist at all is to allow the servers to easily survive transient 5106 network communications failures of a few minutes to a few days 5107 (although the actual time periods will depend a great deal on the 5108 DHCP activity of the network in terms of arrival and departure of 5109 DHCP clients on the network). 5111 Eventually, when the servers are unable to communicate, they will 5112 have to move into a state where they no longer can re-integrate 5113 without some possibility of a duplicate IP address allocation. There 5114 are two ways that they can move into this state (known as PARTNER- 5115 DOWN). 5117 First, they can be informed by external command that, indeed, the 5118 partner server is down. In this case, there is no difficulty in mov- 5119 ing into the PARTNER-DOWN state since it is an accurate reflection of 5120 reality and the protocol has been designed to operate correctly (even 5121 during reintegration) if, when in PARTNER-DOWN state, the partner is, 5122 indeed, down. 5124 The more difficult scenario is when the servers are running unat- 5125 tended for extended periods, and in this case an option is provided 5126 to configure something called a "safe-period" into each server. This 5127 OPTIONAL safe-period is the period after which either the primary or 5128 secondary server will automatically transition to PARTNER-DOWN from 5129 COMMUNICATIONS-INTERRUPTED state. If this transition is completed 5130 and the partner is not down, then the possibility of duplicate IP 5131 address allocations will exist. 5133 The goal of the "safe-period" is to allow network operations staff 5134 some time to react to a server moving into COMMUNICATIONS-INTERRUPTED 5135 state.
During the safe-period the only requirement is that the net- 5136 work operations staff determine if both servers are still running -- 5137 and if they are, to either fix the network communications failure 5138 between them, or to take one of the servers down before the expira- 5139 tion of the safe-period. 5141 The length of the safe-period is installation dependent, and depends 5142 in large part on the number of unallocated IP addresses within the 5143 subnet address pool and the expected frequency of arrival of previ- 5144 ously unknown DHCP clients requiring IP addresses. Many environments 5145 should be able to support safe-periods of several days. 5147 During this safe period, either server will allow renewals from any 5148 existing client. The only limitation concerns the need for IP 5149 addresses for the DHCP server to hand out to new DHCP clients and the 5150 need to re-allocate IP addresses to different DHCP clients. 5152 The number of "extra" IP addresses required is equal to the expected 5153 total number of new DHCP clients encountered during the safe period. 5154 This is dependent only on the arrival rate of new DHCP clients, not 5155 the total number of outstanding leases on IP addresses. 5157 In the unlikely event that a relatively short safe period of an hour 5158 is all that can be used (given a dearth of IP addresses or a very 5159 high arrival rate of new DHCP clients), even that can provide sub- 5160 stantial benefits in allowing the DHCP subsystem to ride through 5161 minor problems that could occur and be fixed within that hour. In 5162 these cases, no possibility of duplicate IP address allocation 5163 exists, and re-integration after the failure is solved will be 5164 automatic and require no operator intervention. 5166 11. Security 5168 The Failover protocol communicates DHCP lease activity and this data 5169 is generally easily discovered via other means, such as by pinging 5170 addresses and doing DNS lookups. 
Therefore, the need to encrypt the 5171 data over the wire is likely not great (though some sites may feel 5172 differently). 5174 However, it is very desirable to assure the integrity of failover 5175 partners and to thus ensure proper operation of the servers. For 5176 example, denial of service attacks are possible by the communication 5177 of invalid state information to one or both servers. 5179 Therefore, the Failover protocol MUST be capable of being secured by 5180 using a simple shared secret message digest which covers each mes- 5181 sage. This provides authentication of the servers, but does not pro- 5182 vide encryption of the data exchange. 5184 The Failover protocol MAY also be secured by using TLS [TLS] (Tran- 5185 sport Layer Security) if encryption of the data exchange is desired. 5186 The use of the shared secret or TLS will not protect against TCP or 5187 IP layer attacks (such as someone sending fake TCP RST segments). 5188 IPsec SHOULD be used to protect against most (if not all) of these 5189 kinds of attacks. 5191 11.1. Simple shared secret 5193 Messages between the failover partners are authenticated through the 5194 use of a shared secret, which is never sent over the network and must 5195 be known by each server. How each server is told about this shared 5196 secret and secures its storage of the shared secret is outside the 5197 scope of this document. If a server is configured with a shared 5198 secret for a partner, it MUST send the message-digest option in ALL 5199 messages to that partner and it MUST treat any messages received from 5200 that partner without a message-digest option as failing authentica- 5201 tion. 5203 If a server is not configured with a shared secret for a partner, it 5204 MUST NOT send the message-digest option in any message to that 5205 partner and it MUST treat any messages received from that partner 5206 with a message-digest option as failing authentication. 
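The two rules above (a digest is required when a shared secret is configured, and forbidden when it is not) can be sketched as a single check applied before any digest verification. This is a minimal illustration only; the function name and its parameters are hypothetical and do not appear in this draft.

```python
from typing import Optional


def passes_digest_policy(secret_configured: bool,
                         digest_option: Optional[bytes]) -> bool:
    """Check presence of the message-digest option against local config.

    Hypothetical helper: `digest_option` is the raw option data taken
    from a received message, or None if the option was absent.
    """
    has_digest = digest_option is not None
    # Secret configured    -> option must be present.
    # No secret configured -> option must be absent.
    return secret_configured == has_digest
```

A message for which this check returns False is treated as failing authentication, without the digest itself ever being examined.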
5208 The shared secret is used to calculate a 16 octet message-digest 5209 which is sent in every failover message as the message-digest option. 5210 See section 6.2.25. The message-digest contains a one-way 16 octet 5211 MD5 [MD5] hash calculated over a stream of octets consisting of the 5212 entire message concatenated with the shared secret. 5214 For calculation, the message includes the message-digest option with 5215 the message-digest data zeroed (16-octets of zero). Once the calcula- 5216 tion is complete, these 16 octets of zero are replaced by the 16- 5217 octet MD5 hash and the message is sent. 5219 For verification, the 16-octet message-digest is saved and replaced 5220 with 16-octets of zero, and the hash is recalculated as above. The resulting MD5 5221 hash is compared to the received hash and if they match, the message 5222 is considered authenticated. 5224 A failover partner that fails to authenticate a received message or 5225 receives a message without a message-digest option when configured 5226 with a shared secret MUST close the connection immediately and take 5227 steps to notify operators. 5229 This use of the shared secret is very similar to that used for RADIUS 5230 Accounting [RADIUS]. 5232 11.2. TLS 5234 TLS, Transport Layer Security, as specified in [TLS] MAY be used. 5235 The use of TLS would be similar to the way it is used with SMTP 5236 [SMTPTLS] and IMAP/POP3/ACAP [IMAPTLS]. 5238 To request the use of TLS, the server that successfully opened a con- 5239 nection to its peer MUST send the TLS option as part of the CONNECT 5240 message. The server receiving the TLS option MUST respond with a 5241 TLS-reply option indicating its acceptance or rejection of the TLS- 5242 request in the CONNECT message. 5244 If the CONNECTACK message contained a TLS-reply of 1, then both 5245 servers begin TLS negotiation.
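As an illustration of the message-digest procedure of section 11.1, the following sketch computes and verifies the 16-octet MD5 digest over the message (with the digest field zeroed) concatenated with the shared secret. It is not a definitive implementation. The function names and the `digest_offset` parameter are assumptions made for illustration; a real implementation would locate the message-digest option data by parsing the message options (see section 6.2.25). The constant-time comparison is likewise an addition not called for by this draft.

```python
import hashlib
import hmac

DIGEST_LEN = 16  # size of an MD5 hash in octets


def compute_digest(message: bytes, digest_offset: int, secret: bytes) -> bytes:
    """MD5 over the message with the digest data zeroed, then the secret."""
    zeroed = (message[:digest_offset]
              + bytes(DIGEST_LEN)
              + message[digest_offset + DIGEST_LEN:])
    return hashlib.md5(zeroed + secret).digest()


def sign(message: bytes, digest_offset: int, secret: bytes) -> bytes:
    """Return the message with the 16 digest octets filled in."""
    digest = compute_digest(message, digest_offset, secret)
    return (message[:digest_offset]
            + digest
            + message[digest_offset + DIGEST_LEN:])


def verify(message: bytes, digest_offset: int, secret: bytes) -> bool:
    """Save the received digest, recompute over zeros, and compare."""
    received = message[digest_offset:digest_offset + DIGEST_LEN]
    expected = compute_digest(message, digest_offset, secret)
    return hmac.compare_digest(received, expected)
```

A receiver for which `verify` returns False would close the connection and notify operators, as required in section 11.1.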
5247 Upon completion of this negotiation, the server which originally sent 5248 the CONNECT message MUST resend its CONNECT message without any TLS- 5249 request, and must wait for a corresponding CONNECTACK. 5251 Implementation of the TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA [TLS] cipher 5252 suite is REQUIRED in Failover servers supporting TLS. This is impor- 5253 tant as it assures that any two compliant implementations can be con- 5254 figured to interoperate. 5256 12. Acknowledgments 5258 Ralph Droms started it all, by sketching out an initial interserver 5259 draft that embodied ideas from several past IETF meetings. In that 5260 draft, he acknowledged contributions by Jeff Mogul, Greg Minshall, 5261 Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group. 5263 Kim Kinnear and Bob Cole each extended that draft, separately and 5264 then together, until they created an interserver draft that supported 5265 any number of servers. The complexity of that approach was just too 5266 great, and that draft wasn't greeted with enthusiasm by many, includ- 5267 ing its authors. 5269 It did however lead to a much simpler approach embodied in the first 5270 Failover draft by Greg Rabil, Mike Dooley, Arun Kapur and Ralph 5271 Droms. This draft posited only two servers -- a primary and a secon- 5272 dary. 5274 Kim Kinnear then wrote the Safe Failover draft to layer on top of the 5275 Failover Draft and increase its robustness in the face of certain 5276 rare network failures. 5278 At the spring 1998 IETF meeting in LA, the DHC working group said 5279 that they wanted a merged Failover and Safe Failover draft. Steve 5280 Gonczi and Bernie Volz stepped up and produced the raw material for 5281 such a merged draft, along with a new message format designed around 5282 DHCP options and other extensions and clarifications. Kim Kinnear 5283 edited their work into draft format and made other changes in time 5284 for the Summer Chicago IETF meeting.
5286 During the summer and fall of 1998, two groups worked on separate 5287 implementations of the UDP failover draft. Bernie Volz and Steve 5288 Gonczi constituted one group, and Kim Kinnear, Mark Stapp and Paul 5289 Fox made up the other. These two groups worked together to produce 5290 considerable changes and simplifications of the protocol during that 5291 period, and Steve Gonczi and Kim Kinnear edited those changes into 5292 the -03 draft in time for submission to the December 1998 Orlando IETF 5293 meeting. 5295 In February of 1999 Kim Kinnear and Mark Stapp hosted a meeting of 5296 people interested in the failover draft. During that meeting a gen- 5297 eral agreement was reached to recast the failover protocol to use TCP 5298 instead of UDP. In addition, the group brainstormed a work- 5299 able load-balancing technique. Kim Kinnear rewrote the entire draft 5300 to include the changes made at that meeting as well as to restructure 5301 the draft along guidelines suggested by Thomas Narten. The result 5302 was the -04 draft, submitted prior to the Oslo IETF meeting. 5304 The initial idea for a hash-based load balancing approach was offered 5305 by Ted Lemon, and the determination of an algorithm and its integra- 5306 tion into the draft was done by Steve Gonczi. The security section 5307 was spearheaded by Bernie Volz. Both contributed considerably to the 5308 ideas and text in the rest of the draft with several reviews. 5310 In early October of 1999, three conference calls were held to discuss 5311 the -04 draft. The current draft (-05) includes changes as a result 5312 of those calls, perhaps the largest of which was to move the load- 5313 balancing approach into a separate draft. Thanks to all of the many 5314 people who participated in the conference calls. This current draft 5315 was changed because of contributions by: Ted Lemon, David Erdmann, 5316 Richard Jones, Rob Stevens, Thomas Narten, Diana Lane, and Andre Kos- 5317 tur.
5319 These most recent changes have been widely circulated among the other 5320 authors, but that does not preclude any of them from expressing 5321 disagreement with what is contained in this draft at any future time. 5323 Many people have reviewed the various earlier drafts that went into 5324 this result. At American Internet, ideas were contributed by Brad 5325 Parker. At Cisco Systems Paul Fox and Ellen Garvey contributed to 5326 the design of the protocol. 5328 Glenn Waters of Nortel Networks contributed ideas and enthusiasm to 5329 make a Failover protocol that was both "safe" and "lazy". 5331 13. References 5333 [RFC 2131] Droms, R., "Dynamic Host Configuration Protocol", RFC 5334 2131, March 1997. 5336 [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate 5337 Requirement Levels", RFC 2119, March 1997. 5339 [RFC 2132] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor 5340 Extensions", RFC 2132, March 1997. 5342 [TLS] Dierks, T., "The TLS Protocol, Version 1.0", RFC 2246, January 5343 1999. 5345 [SMTPTLS] Hoffman, P., "SMTP Service Extension for Secure SMTP over 5346 TLS", RFC 2487, January 1999. 5348 [IMAPTLS] Newman, C., "Using TLS with IMAP, POP3, and ACAP", RFC 5349 2595, June 1999. 5351 [NAMESPACE] Carney, M., "draft-ietf-dhc-option_review_and_namespace- 5352 00.txt", June 1999. 5354 [DDNS] Rekhter, Y., Stapp, M., "draft-ietf-dhc-dhcp-dns-11.txt", 5355 October, 1999. 5357 [MD5] Rivest, R., and Dusse, S., "The MD5 Message-Digest Algorithm", 5358 RFC 1321, MIT Laboratory for Computer Science, RSA Data Security 5359 Inc., April 1992. 5361 [RADIUS] Rigney, C., "RADIUS Accounting", RFC 2139, Livingston Enter- 5362 prises, April 1997. 5364 [LOADB] Volz, B., Gonczi, S., Lemon, T., Stevens, R., "draft-ietf- 5365 dhc-loadb-00.txt", October, 1999. 5367 [RFC1035] Mockapetris, P., "Domain Names - Implementation and Specif- 5368 ication", RFC 1035, November, 1987. 5370 [AGENTINFO] Patrick, M., "draft-ietf-dhc-agent-options-07.txt", 5371 August, 1999.
[USERCLASS] Stump, G., Droms, R., "draft-ietf-dhc- 5372 userclass-04.txt", October, 1999. 5374 [RFC2136] P. Vixie, S. Thomson, Y. Rekhter, J. Bound, "Dynamic 5375 Updates in the Domain Name System (DNS UPDATE)", RFC2136, April 5376 1997 5378 14. Author's information 5380 Ralph Droms 5381 323 Dana Engineering 5382 Bucknell University 5383 Lewisburg, PA 17837 5385 Phone: (717) 524-1145 5386 EMail: droms@bucknell.edu 5388 Greg Rabil, Mike Dooley, Arun Kapur 5389 Lucent Technologies 5390 10 Valley Stream Parkway, Suite 240 5391 Malvern, PA 19355 5393 Phone: (800) 208-2747 5395 EMail: grabil@lucent.com 5396 mdooley@lucent.com 5397 akapur@lucent.com 5399 Kim Kinnear 5400 Mark Stapp 5401 Cisco Systems 5402 250 Apollo Drive 5403 Chelmsford, MA 01824 5405 Phone: (978) 244-8000 5407 EMail: kkinnear@cisco.com 5408 mjs@cisco.com 5410 Bernie Volz 5411 Steve Gonczi 5412 Process Software Corporation 5413 959 Concord St. 5414 Framingham, MA 01701 5416 Phone: (508) 879-6994 5418 EMail: volz@process.com 5419 gonczi@process.com 5421 15. Full Copyright Statement 5423 Copyright (C) The Internet Society (1999). All Rights Reserved. 5425 This document and translations of it may be copied and furnished to oth- 5426 ers, and derivative works that comment on or otherwise explain it or 5427 assist in its implementation may be prepared, copied, published and dis- 5428 tributed, in whole or in part, without restriction of any kind, provided 5429 that the above copyright notice and this paragraph are included on all 5430 such copies and derivative works. However, this document itself may not 5431 be modified in any way, such as by removing the copyright notice or 5432 references to the Internet Society or other Internet organizations, 5433 except as needed for the purpose of developing Internet standards in 5434 which case the procedures for copyrights defined in the Internet Stan- 5435 dards process must be followed, or as required to translate it into 5436 languages other than English. 
5438 The limited permissions granted above are perpetual and will not be 5439 revoked by the Internet Society or its successors or assigns. 5441 This document and the information contained herein is provided on an "AS 5442 IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK 5443 FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT 5444 LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT 5445 INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FIT- 5446 NESS FOR A PARTICULAR PURPOSE. 5448 Open Issues 5450 These issues need to be resolved: 5452 1. Need to figure out how to get 16 bit options without referenc- 5453 ing the [NAMESPACE] draft, since it doesn't really define them 5454 anymore. 5456 2. We need to deal with the option space, and the procedures for 5457 managing it. Probably IANA. 5459 3. Figure out a better way to identify vendors. How about an 5460 SNMP Enterprise MIB value? 5462 4. Need to tie reject-reasons to text of draft, remove obsolete 5463 reject-reasons. 5465 5. Using tables, compress description of sending BNDUPD message 5466 to save duplicated words, enhance description of differences.