Network Working Group                                        Ralph Droms
INTERNET DRAFT                                               Kim Kinnear
                                                              Mark Stapp
                                                           Cisco Systems

                                                             Bernie Volz
                                                                 IPWorks

                                                            Steve Gonczi
                                                         Network Engines

                                                              Greg Rabil
                                                             Mike Dooley
                                                              Arun Kapur
                                                     Lucent Technologies

                                                               July 2000
                                                    Expires January 2001

                         DHCP Failover Protocol

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2000). All Rights Reserved.

Abstract

   DHCP [RFC 2131] allows for multiple servers to be operating on a
   single network. Some sites are interested in running multiple
   servers in such a way so as to provide redundancy in case of server
   failure. In order for this to work reliably, the cooperating primary
   and secondary servers must maintain a consistent database of the
   lease information. This implies that servers will need to coordinate
   any and all lease activity so that this information is synchronized
   in case of failover.

   This document defines a protocol to provide such synchronization
   between two servers. One server is designated the "primary" server,
   the other is the "secondary" server.
   This document also describes a way to integrate the failover
   protocol with the DHCP load balancing approach.

   This document is a substantial reorganization as well as a technical
   and editorial revision of draft-ietf-dhc-failover-05.txt.

Table of Contents

   1. Introduction
   2. Terminology
   2.1. Requirements terminology
   2.2. DHCP and failover terminology
   3. Background and External Requirements
   3.1. Key aspects of the DHCP protocol
   3.2. BOOTP relay agent implementation
   3.3. What does it mean if a server can't communicate with its partner?
   3.4. Challenging scenarios for a Failover protocol
   3.5. Using TCP to detect partner server failure
   4. Design Goals
   4.1. Design goals for this protocol
   4.2. Limitations of this protocol
   5. Protocol Overview
   5.1. Messages and States
   5.2. Fundamental guarantees
   5.3. Load balancing
   5.4. IP address allocations between servers
   5.5. Operating in NORMAL state
   5.6. Operating in COMMUNICATIONS-INTERRUPTED state
   5.7. Operating in PARTNER-DOWN state
   5.8. Operating in RECOVER state
   5.9. Operating in STARTUP state
   5.10. Time synchronization between servers
   5.11. IP address binding-status
   5.12. DNS dynamic update considerations
   5.13. Reservations and failover
   5.14. Dynamic BOOTP and failover
   5.15. Guidelines for selecting MCLT
   5.16. What is sent in response to an UPDREQ or UPDREQALL message?
   6. Common Message Format
   6.1. Message header format
   6.2. Common option format
   6.3. Batching multiple binding update transactions in one BNDUPD mes-
   7. Protocol Messages
   7.1. BNDUPD message [3]
   7.2. BNDACK message [4]
   7.3. UPDREQ message [9]
   7.4. UPDREQALL message [7]
   7.5. UPDDONE message [8]
   7.6. POOLREQ message [1]
   7.7. POOLRESP message [2]
   7.8. CONNECT message [5]
   7.9. CONNECTACK message [6]
   7.10. STATE message [10]
   7.11. CONTACT message [11]
   7.12. DISCONNECT message [12]
   8. Connection Management
   8.1. Connection granularity
   8.2. Creating the TCP connection
   8.3. Using the TCP connection for determining communications status
   8.4. Using the TCP connection for binding data
   8.5. Using the TCP connection for control messages
   8.6. Losing the TCP connection
   9. Failover Endpoint States
   9.1. Server Initialization
   9.2. Server State Transitions
   9.3. STARTUP state
   9.4. PARTNER-DOWN state
   9.5. RECOVER state
   9.6. NORMAL state
   9.7. COMMUNICATIONS-INTERRUPTED State
   9.8. POTENTIAL-CONFLICT state
   9.9. RESOLUTION-INTERRUPTED state
   9.10. CONFLICT-DONE state
   9.12. PAUSED state
   9.13. SHUTDOWN state
   10. Safe Period
   11. Security
   11.1. Simple shared secret
   11.2. TLS
   12. Failover Options
   12.1. addresses-transferred
   12.2. assigned-IP-address
   12.3. binding-status
   12.4. client-identifier
   12.5. client-hardware-address
   12.6. client-last-transaction-time
   12.7. client-reply-options
   12.8. client-request-options
   12.9. DDNS
   12.10. delayed-service-parameter
   12.11. hash-bucket-assignment
   12.12. IP-flags
   12.13. lease-expiration-time
   12.14. max-unacked-bndupd
   12.15. MCLT
   12.16. message
   12.17. message-digest
   12.18. potential-expiration-time
   12.19. receive-timer
   12.20. protocol-version
   12.21. reject-reason
   12.22. sending-server-IP-address
   12.23. server-flags
   12.24. server-state
   12.25. start-time-of-state
   12.26. TLS-reply
   12.27. TLS-request
   12.28. vendor-class-identifier
   12.29. vendor-specific-options
   13. IANA Considerations
   14. Acknowledgments
   15. References
   16. Author's information
   17. Full Copyright Statement

1. Introduction

   DHCP [RFC 2131] allows for multiple servers to be operating on a
   single network. Some sites are interested in running multiple
   servers in such a way so as to provide redundancy in case of server
   failure since the DHCP subsystem is in many cases a critical part of
   the network infrastructure.

   This document defines a protocol to provide synchronization between
   two servers in order that each can take over for the other should
   either one fail or become unreachable.

   One server is designated the "primary" server, the other is the
   "secondary" server, and most DHCP client requests are sent to each
   server (see Section 3.1.1 for details).

   In order to provide a high-availability DHCP service, these
   cooperating primary and secondary servers must maintain a consistent
   database of lease information. This implies that servers will need
   to coordinate all lease activity so that this information is
   synchronized in case failover is required. The protocol messages and
   processing techniques required to maintain a consistent database are
   specified in the protocol described here.

   The failover protocol also contains a way to integrate the DHCP
   load-balancing algorithm described in [LOADB] with the failover
   protocol.

2. Terminology

   This section discusses both the generic requirements terminology
   common to many IETF protocol specifications and the specialized
   terminology specific to DHCP and the failover protocol.

2.1. Requirements terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC 2119].

2.2. DHCP and failover terminology

   This document uses the following terms:

   o "available IP address"

      An IP address is "available" if it may be allocated by a
      specific DHCP server. An IP address is considered (for the
      purposes of this document) to be available to a single server
      for allocation unless otherwise noted. An IP address available
      for allocation on a primary server has state FREE, and an IP
      address available for allocation on a secondary server has
      state BACKUP.

   o "binding"

      A binding is a collection of configuration parameters, including
      at least an IP address, associated with or "bound to" a DHCP
      client. Bindings are managed by DHCP servers.

   o "binding database"

      The collection of bindings managed by a primary and secondary.

   o "binding update transaction"

      A binding update transaction refers to the set of information
      (contained in options) necessary to perform a binding update
      for a single IP address. It comprises the assigned-IP-address
      option and the binding-status option, along with other options
      as appropriate.

   o "binding-status"

      The binding-status is the status of an IP address with respect
      to its association with a client. There are specific
      binding-status values defined for use by the failover protocol,
      e.g., ACTIVE, FREE, RELEASED, ABANDONED, etc. These are designed
      to map more or less directly onto the binding-status values used
      internally in most DHCP server implementations. The term
      binding-status refers to the concept also sometimes known as
      "lease state" or "IP address state", but in this document the
      term "state" is reserved for the failover state of a failover
      endpoint, and binding-status is always used to refer to the
      state associated with an IP address or lease.

   o "DHCP client" or "client"

      A DHCP client is an Internet host using DHCP to obtain
      configuration parameters such as a network address. The term
      "client" used within this document always means a DHCP client,
      and never one of the two failover servers.

   o "DHCP server" or "server"

      A DHCP server is an Internet host that returns configuration
      parameters to DHCP clients.

   o "DDNS"

      An abbreviation for "Dynamic DNS", which refers to the
      capability to update a DNS server's name (actually resource
      record) database using an on-the-wire protocol defined in
      [RFC 2136].

   o "DNS"

      An abbreviation for "Domain Name System", a scheme where a
      central name repository is used to map names to IP addresses and
      IP addresses to names.

   o "failover endpoint"

      The failover protocol allows for there to be a unique failover
      endpoint per partner per role (where role is primary or
      secondary). This failover endpoint can take actions and hold
      unique states. There are thus a maximum of two failover
      endpoints per server per partner (one for each partner as a
      primary and one for that same partner as a secondary.)

   o "FQDN"

      An FQDN is a "fully qualified domain name". A fully qualified
      domain name generally is a host name with at least one zone
      name, for example "www.dhcp.org" is a fully qualified domain
      name.

   o "lazy update"

      Lazy update refers to the requirement placed on a server
      implementing a failover protocol to update its failover partner
      whenever the binding database changes. A failover protocol which
      didn't support lazy update would require the failover partner
      update to be complete before a DHCP server could respond to a
      DHCP client request with a DHCPACK.
      A failover protocol which does support lazy update places no
      such restriction on the update of the failover partner server,
      and so a server can allocate an IP address or extend a lease on
      an IP address and then update its failover partner as time
      permits. A failover protocol which supports lazy update not only
      removes the requirement to update the failover partner prior to
      responding to a DHCP client with a DHCPACK, but also allows
      gathering up batches of updates from one failover server to its
      partner.

   o "MCLT"

      The MCLT refers to maximum client lead time. This time is
      configured on the primary server and transmitted from the
      primary to the secondary server in the CONNECT message. It is
      the maximum amount of time that one server can extend a lease
      for a client's binding beyond the time known by the partner
      server. See section 5.2.1 for details.

   o "partner"

      A "partner", for the purposes of this document, refers to a
      failover server, typically the other failover server. In many
      (if not most) cases, the failover protocol is symmetric with
      respect to the primary or secondary nature of the servers, and
      so it is often appropriate to discuss "updating the partner
      server", since it could be a primary server updating a secondary
      server or a secondary server updating a primary server.

   o "Primary server" or "Primary"

      A DHCP server configured to provide primary service to a set of
      DHCP clients for a particular set of subnet address pools.

   o "RR"

      "RR" is an abbreviation for "resource record". All records in
      the DNS are resource records.
      The resource records of most relevance to this document are the
      "A" resource record, which maps a DNS name to a particular IP
      address, the "PTR" resource record, which allows a "reverse
      map", from the IP address back to a DNS name, and the "KEY"
      resource record, which is used in ways defined in [DDNS] to tag
      a DNS name with the identity of the DHCP client with which it is
      associated.

   o "Secondary server" or "Secondary"

      A DHCP server configured to act as backup to a primary server
      for a particular set of subnet address pools.

   o "stable storage"

      Every DHCP server is assumed to have some form of what is called
      "stable storage". Stable storage is used to hold information
      concerning IP address bindings (among other things) so that this
      information is not lost in the event of a server failure which
      requires restart of the server.

   o "state"

      In this document, the term "state" refers exclusively to the
      state of a failover endpoint, for example: NORMAL,
      COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN. It is not used to
      refer to any attributes of an IP address or a binding of an IP
      address. See "binding-status".

   o "subnet address pool"

      A subnet address pool is the set of IP addresses which is
      associated with a particular network number and subnet mask. In
      the simple case, there is a single network number and subnet
      mask and a set of IP addresses. In the more complex case
      (sometimes called "secondary subnets", sometimes "superscopes"),
      several (apparently unrelated) network number and subnet mask
      combinations with their associated IP addresses may all be
      configured together into one subnet address pool.

3. Background and External Requirements

   This section highlights key aspects of the DHCP protocol on which
   the failover protocol depends.
   It also discusses the requirements that the failover protocol places
   on other aspects of the network infrastructure, and some general
   issues surrounding server failure detection. Some failure scenarios
   that provide particular challenges to a failover protocol are
   discussed. Finally, the challenges inherent in using a TCP
   connection as a means to detect failure of a partner server are
   elaborated.

3.1. Key aspects of the DHCP protocol

   The failover protocol is designed to augment the DHCP protocol as
   described in RFC 2131 [RFC 2131]. There are several key aspects of
   the DHCP protocol which are required by the failover protocol in
   order to successfully meet its design goals.

3.1.1. Broadcast behavior

   There are two aspects of the broadcast behavior of the DHCP protocol
   which are key to making the failover protocol operate successfully.
   The first is simply that the DHCP protocol requires a DHCP client to
   broadcast all DHCPDISCOVER and DHCPREQUEST/INIT-REBOOT messages.
   Because of this requirement, a DHCP client that was communicating
   with one server will automatically be able to communicate with
   another server if one is available.

   The second aspect of broadcast behavior is similar to the first, but
   involves the distinction between a DHCPREQUEST/RENEW and a
   DHCPREQUEST/REBINDING. A DHCPREQUEST/RENEW is the message that a
   DHCP client uses to extend its lease. It is unicast to the DHCP
   server from which it acquired the lease. However, the DHCP protocol
   (in a farsighted move) was explicitly designed so that in the event
   that a DHCP client cannot contact the server from which it received
   a lease on an IP address using a DHCPREQUEST/RENEW, the client is
   required to broadcast its renewal using a DHCPREQUEST/REBINDING to
   any available DHCP server.
   Since all DHCP clients were required to implement this algorithm,
   the failover protocol can have a different server from the one that
   initially granted a lease be the server to renew a lease. Thus, one
   server can take over for another with no interruption in the service
   as experienced by the DHCP client or its associated applications
   software.

3.1.2. Client responsibility

   In the DHCP protocol the DHCP clients are entrusted with a
   considerable responsibility. In particular, after they are granted a
   lease on an IP address, they are enjoined to only use that IP
   address while their lease is valid. Every DHCP client is expected to
   stop using an IP address if the expiration time on the lease has
   passed and if it cannot get an extension on the lease for that IP
   address from some DHCP server. Thus, the correct behavior of every
   DHCP client in this regard is required to ensure the integrity of
   the DHCP service. On the other hand, incorrect behavior by a client
   in this area will tend to adversely affect at most one other DHCP
   client.

   Furthermore, any DHCP client which sends in a DHCPREQUEST/RENEW or
   DHCPREQUEST/REBINDING to a DHCP server (either unicast for a RENEW
   or broadcast for a REBINDING) MUST still have time to run on the
   lease for that IP address. The DHCP server sends the DHCPACK back
   unicast to the IP address from which the RENEW or REBINDING
   originated.

   Given the existing responsibility placed on the client to only use
   an IP address when the lease is valid, and to only send in a RENEW
   or REBINDING if the lease is valid, the failover protocol relies on
   DHCP clients to perform responsibly and will, in the absence of
   conflicting information, believe a DHCP client that is attempting to
   RENEW or REBIND a lease on an IP address is the legitimate owner of
   that IP address.
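
   The client behavior on which this section relies can be sketched
   roughly as follows. This is an illustration only and is not part of
   the failover protocol: the Lease record and the function name are
   hypothetical, and the sketch assumes the default timer values of
   RFC 2131 (T1 at 0.5 and T2 at 0.875 of the lease duration), which a
   server may override.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Hypothetical client-side view of a lease (times in seconds)."""
    granted_at: float
    duration: float

    @property
    def t1(self) -> float:
        # Assumed RFC 2131 default renewal time: 0.5 * lease duration.
        return self.granted_at + 0.5 * self.duration

    @property
    def t2(self) -> float:
        # Assumed RFC 2131 default rebinding time: 0.875 * lease duration.
        return self.granted_at + 0.875 * self.duration

    @property
    def expires_at(self) -> float:
        return self.granted_at + self.duration


def next_client_action(lease: Lease, now: float) -> str:
    """Return what a well-behaved client does at time `now`."""
    if now >= lease.expires_at:
        # No time left to run on the lease: stop using the address.
        return "STOP-USING-ADDRESS"
    if now >= lease.t2:
        # Broadcast, so any available server (e.g. the partner) may answer.
        return "REBINDING (broadcast)"
    if now >= lease.t1:
        # Unicast to the server from which the lease was acquired.
        return "RENEW (unicast)"
    return "BOUND (no message)"
```

   Note that a RENEW or REBINDING is only ever produced while the lease
   still has time to run, which is the property the failover protocol
   depends on when it believes a renewing client.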

   If clients do not follow these rules, it is possible for an address
   to be in use by more than one client. For a single server, this
   happens because the server has leased the expired address to another
   client and the original client is also attempting to use the
   address. The server would NAK the renewal request. This is made
   slightly worse in the failover protocol if the two servers are
   unable to communicate with each other and one server leases an
   available address to a new client while the other server receives a
   renewal from a different client. In this case, both servers lease
   the same address to different clients for the MCLT time.

   One troublesome issue is that of the DHCP client responsibility when
   sending in DHCPREQUEST/INIT-REBOOT requests. While the original DHCP
   RFC was written to require a DHCP client to have time left to run on
   the lease for an IP address if the client is sending an INIT-REBOOT
   request, it was sufficiently unclear that some client vendors didn't
   realize this until recently. Since the INIT-REBOOT request was sent
   with the IP address in the dhcp-requested-address option and not in
   the ciaddr (for perfectly good reasons), the similarity to the RENEW
   and REBINDING case was lost on many people.

   At present, the failover protocol does not assume that a client
   sending in an INIT-REBOOT request necessarily has a valid lease on
   the IP address appearing in the dhcp-requested-address option in the
   INIT-REBOOT request.

   The implications of this are as follows: Assume that there is a DHCP
   client that gets a lease from one server while that server is unable
   to communicate with its failover partner. Then, assume that after
   that client reboots it is able only to communicate with the other
   failover server.
   If the failover servers have not been able to communicate with each
   other during this process, then the DHCP client will get a new IP
   address instead of being able to continue to use its existing IP
   address. This will affect no applications on the DHCP client, since
   it is rebooting. However, it will use up an additional IP address in
   this marginal case.

3.1.3. Stable storage update before DHCPACK

   The DHCP protocol allocates resources, and in order to operate
   correctly it requires that a DHCP server update some form of stable
   storage prior to sending a DHCPACK to a DHCP client in order to
   grant that client a lease on an IP address.

   One of the goals of the failover protocol is that it not add
   significant additional time to this already time consuming
   requirement to update stable storage prior to a DHCPACK. In
   particular, adding a requirement to communicate with another server
   prior to sending a DHCPACK would greatly simplify the failover
   protocol, but it would unacceptably limit the potential scalability
   of any DHCP server which employed the failover protocol.

3.2. BOOTP relay agent implementation

   Many DHCP clients are not resident on the same network segment as a
   DHCP server. In order to support this form of network architecture,
   most contemporary routers implement something known as a BOOTP Relay
   Agent. This capability inside of a router listens for all broadcasts
   at the DHCP port, port 67, and will relay any broadcasts that it
   receives on to a DHCP server. The IP address of the DHCP server must
   have been previously configured into the router. As part of the
   relay process, the relay agent will place the address of the
   interface on which it received the broadcast into the giaddr field
   of the DHCP packet.
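
   As a rough illustration of the relay behavior just described, the
   sketch below fills in giaddr and produces one copy of the packet per
   configured server. It is a simplified model, not a relay
   implementation: DhcpPacket and relay_broadcast are hypothetical
   names, the packet carries only the fields the sketch needs, and the
   rule that giaddr is set only when it is still zero is taken from the
   relay agent rules of RFC 1542 rather than from this document. A
   relay serving a failover pair would be configured with both server
   addresses.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

DHCP_SERVER_PORT = 67  # the DHCP port on which the relay listens


@dataclass(frozen=True)
class DhcpPacket:
    """Hypothetical packet model; a real DHCP packet has many more fields."""
    xid: int
    giaddr: str = "0.0.0.0"


def relay_broadcast(
    packet: DhcpPacket,
    receiving_interface_addr: str,
    server_addrs: List[str],
) -> List[Tuple[Tuple[str, int], DhcpPacket]]:
    """Return the (destination, packet) pairs a relay agent would send."""
    if packet.giaddr == "0.0.0.0":
        # First relay on the path: record the interface that heard the
        # broadcast (assumption from RFC 1542: set only if still zero).
        packet = replace(packet, giaddr=receiving_interface_addr)
    # One unicast copy per configured DHCP server.
    return [((server, DHCP_SERVER_PORT), packet) for server in server_addrs]
```

   With two server addresses configured, both failover servers receive
   every relayed client broadcast, each copy carrying the same giaddr.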

   Since the failover protocol requires two DHCP servers to receive any
   broadcast DHCP messages, in order to work with DHCP clients which are
   not local to the DHCP server, the BOOTP relay agent on the router
   closest to the DHCP client must be configured to point at more than
   one DHCP server.

   Most BOOTP relay agent implementations allow this duplication of
   packets.

   If this is not possible, an administrator might be able to configure
   the relay agent with a subnet broadcast address, but in this case
   the primary and secondary DHCP servers in a failover pair must both
   reside on the same subnet.

3.3. What does it mean if a server can't communicate with its partner?

   In any protocol designed to allow one server to take over some
   responsibilities from a partner server in the event of "failure" of
   that partner server, there is an inherent difficulty in determining
   when that partner server has failed.

   In fact, it is fundamentally impossible for one server to
   distinguish a network communications failure from the outright
   failure of the server with which it is trying to communicate. In the
   case where each server is handing out resources (in this case IP
   addresses) to a client community, mistaking an inability to
   communicate with a partner server for failure of that partner server
   could easily cause both servers to be handing out the same IP
   addresses to different clients.

   One way that this is sometimes handled is for there to be more than
   two servers. In the case of an odd number of servers, the servers
   that can still communicate with a majority of other servers will
   consider themselves operational, and any server which can't
   communicate with a majority of other servers must immediately cease
   operations.
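
   The majority technique can be sketched as below. The sketch is one
   common reading of the rule and is an assumption of this example: a
   server counts itself together with the peers it can reach and
   continues only if that group is more than half of all servers. The
   function name is hypothetical, and the failover protocol itself does
   not use this technique.

```python
def may_continue_operating(total_servers: int, reachable_peers: int) -> bool:
    """Majority rule: this server plus the peers it can still reach
    must form more than half of all servers."""
    if total_servers < 1 or not 0 <= reachable_peers < total_servers:
        raise ValueError("reachable_peers must be in [0, total_servers)")
    # Integer arithmetic avoids rounding: group > total / 2.
    return 2 * (reachable_peers + 1) > total_servers
```

   With only two servers, may_continue_operating(2, 0) is False for
   both of them, so a loss of communication between the pair would stop
   both servers under this rule.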

   While this technique works in some domains, having the only server
   with which a DHCP client can communicate voluntarily shut itself down
   seems like something worth avoiding.

   The failover protocol will operate correctly while the two servers
   are unable to communicate with each other, whether they are both
   running or not. At some point there may be resource contention, and
   if one of the servers is actually down, then the operator can inform
   the operational server and the operational server will be able to use
   all of the failed server's resources.

   The protocol also allows detection of an orderly shutdown of a
   participating server.

3.4. Challenging scenarios for a Failover protocol

   There exist two failure scenarios which provide particular challenges
   to the correctness guarantees of a failover protocol.

3.4.1. Primary Server crash before "lazy" update:

   In the case where the primary server sends a DHCPACK to a client for
   a newly allocated IP address and then crashes prior to sending the
   corresponding update to the secondary server, the secondary server
   will have no record of the IP address allocation. When the secondary
   server takes over, it may well try to allocate that IP address to a
   different client. In the case where the first client to receive the
   IP address is not on the net at the time (yet while there was still
   time to run on its lease), an ICMP echo (i.e., ping) will not prevent
   the secondary server from allocating that IP address to a different
   client.

   The failover protocol deals with this situation by having the primary
   and secondary servers allocate addresses for new clients from
   disjoint address pools. See section 5.5 for details.

   A more likely (in that DHCPRENEWs are presumably more common than
   DHCPDISCOVERs) and more subtle version of this problem is where the
   primary server crashes after extending a client's lease time, and
   before updating the secondary with a new time using a lazy update.
   After the secondary takes over, if the client is not connected to the
   network the secondary will believe the client's lease has expired
   when, in fact, it has not. In this case as well, the IP address
   might be reallocated to a different client while the first client is
   still using it.

   This scenario is handled by the failover protocol through control of
   the lease time and the use of the maximum client lead time (MCLT).
   See section 5.2.1 for details.

3.4.2. Network partition where DHCP servers can't communicate but each
       can talk to clients:

   Several conditions are required for this situation to occur. First,
   due to a network failure, the primary and secondary servers cannot
   communicate. As well, some of the DHCP clients must be able to
   communicate with the primary server, and some of the clients must now
   only be able to communicate with the secondary server. When this
   condition occurs, both primary and secondary servers could attempt to
   allocate IP addresses for new clients from the same pool of available
   addresses. At some point, then, two clients will end up being
   allocated the same IP address. This will cause problems when the
   network failure that created this situation is corrected.

   The failover protocol deals with this situation by having the primary
   and secondary servers allocate addresses for new clients from
   disjoint address pools. See section 5.5 for details.

3.5. Using TCP to detect partner server failure

   There are several characteristics of TCP that are important to the
   functioning of the failover protocol, which uses one TCP connection
   both for bulk data transfer and to assess communications integrity
   with the other server. Reliable and ordered message delivery are
   chief among these important characteristics.

   It would be nice to use the capabilities built in to TCP to allow it
   to determine if communications integrity exists to the failover
   partner but this strategy contains some problems which require
   analysis. There exist three fundamental cases for an open TCP
   connection that must be examined.

   1. When no data is being sent then no messages are traveling
      across the TCP connection.

   2. When data is queued to be sent, and the receiver has not
      blocked the sending of additional data, then messages are
      flowing across the TCP connection containing the application's
      data.

   3. When data is queued to be sent, and the receiver has blocked
      the transmission of additional data, then persist messages are
      flowing between the sender and the receiver to ensure that the
      sender doesn't miss the receiver opening the window for
      further transmissions.

   The first case can be turned into the second case by sending
   application-level keep-alive messages periodically when there is no
   other data queued to be sent. Note that TCP keep-alive messages might
   be used as well, but they present additional problems.

   Thus, we can ensure that the TCP connection has messages flowing
   periodically across the connection fairly easily. The question
   remains as to what TCP will do if the other end of the connection
   fails to respond (either because of network partition or because the
   receiving server crashes). TCP will attempt to retransmit a message
   with an exponential backoff, and will eventually time out that
   retransmission.
   However, the length of that timeout cannot, in
   general, be set on a per-connection basis, and is frequently as long
   as nine minutes, though in some cases it may be as short as two
   minutes. On some systems it can be set system-wide, while on other
   systems it cannot be changed at all.

   A value for this timeout that would be appropriate for the failover
   protocol, say less than 1 minute, could have unpleasant side-effects
   on other applications running on the same server, assuming that it
   could be changed at all on the host operating system.

   Nine minutes is a long time for the DHCP service to be unavailable to
   any new clients that were being served by the server which has
   crashed, when there is another server running that could respond to
   them as soon as it determines that its partner is not operational.

   The conclusion drawn from this analysis is that TCP provides very
   useful support for the failover protocol in the areas of reliable and
   ordered message delivery, but cannot by itself be relied upon to
   detect partner server failure in a fashion acceptable to the needs of
   the failover protocol. Additional failover protocol capabilities
   have been created to support timely detection of partner server
   failure. See section 8.3 for details on this mechanism.

4. Design Goals

   This section lists the design goals and the limitations of the
   failover protocol.

4.1. Design goals for this protocol

   The following is a list of goals that are met by this protocol. They
   are listed in priority order.

   1. Implementations of this protocol must work with existing DHCP
      client implementations based on the DHCP protocol [1].

   2. Implementations of the protocol must work with existing BOOTP
      relay agent implementations.

   3. The protocol must provide failover redundancy between servers
      that are not located on the same subnet.

   4. Provide for continued service to DHCP clients through an
      automated mechanism in the event of failure of the primary
      server.

   5. Avoid binding an IP address to a client while that binding is
      currently valid for another client. In other words, do not
      allocate the same IP address to two clients.

   6. Minimize any need for manual administrative intervention.

   7. Introduce no additional delays in server response time as a
      result of the network communications required to implement the
      failover protocol, i.e., don't require communications with the
      partner between the receipt of a DHCPREQUEST and the
      corresponding DHCPACK.

   8. Share IP address ranges between primary and secondary servers;
      i.e., impose no requirement that the pool of available
      addresses be manually or permanently divided between servers.

   9. Continue to meet the goals and objectives of this protocol in
      the event of server failure or network partition.

   10. Provide graceful reintegration of full protocol service after
       server failure or network partition.

   11. Allow for one computer to act as a secondary server for
       multiple primary servers. The protocol must allow failover
       primary and secondary configuration choices to be made at a
       granularity smaller than "all of the subnets served by a single
       server", though individual implementations may choose not to
       allow such flexibility.

   12. Ensure that an existing client can keep its existing IP
       address binding if it can communicate with either the primary
       or secondary DHCP server implementing this protocol - not just
       the server that originally offered it the binding.

   13. Ensure that a new client can get an IP address from some
       server. Ensure that in the face of partition, where servers
       continue to run but cannot communicate with each other, the
       above goals and requirements may be met.
       In addition, when
       the partition condition is removed, allow graceful automatic
       re-integration without requiring human intervention.

   14. If either primary or secondary server loses all of the
       information that it has stored in stable storage, ensure that it
       be able to refresh its stable storage from the other server.

   15. Support load balancing between the primary and secondary
       servers, and allow configuration of the percentage of the
       client population served by each with a moderately fine
       granularity.

4.2. Limitations of this protocol

   The following are explicit limitations of this protocol.

   1. This protocol provides only one level of redundancy through a
      single secondary server for each primary server.

   2. A subset of the address pool is reserved for secondary server
      use. In order to handle the failure case where both servers
      are able to communicate with DHCP clients, but unable to
      communicate with each other, a subset of the IP address pool
      must be set aside as a private address pool for the secondary
      server. The secondary can use these to service newly arrived
      DHCP clients during such a period. The required size of this
      private pool is based only on the arrival rate of new DHCP
      clients and the length of expected downtime, and is not
      influenced in any way by the total number of DHCP clients
      supported by the server pair.

      The failover protocol can be used in a mode where both the
      primary and secondary servers can share the load between them
      when both are operating. In this load balancing mode, the
      addresses allocated by the primary server to the secondary
      server are not unused, but are used instead to service the
      portion of the client base to which the secondary server is
      required to respond. See section 5.3 for more information on
      load balancing.

   3. The primary and secondary servers do not respond to client
      requests at all while recovering from a failure that could
      have resulted in duplicate IP assignments (that is, when
      synchronizing in POTENTIAL-CONFLICT state).

5. Protocol Overview

   This section will discuss the failover protocol at a relatively high
   level. In the event that a description in this section conflicts (or
   appears to conflict due to the overview nature of this section) with
   information in later sections of this draft, the information in the
   later sections should be considered authoritative.

5.1. Messages and States

   This protocol is centered around the message exchange used by one
   server to update the other server with binding database changes
   resulting from DHCP client activity:

   o Communication of binding database changes

     The binding update (BNDUPD) message is used to send the binding
     database changes to the partner server, and the partner server
     responds with a binding acknowledgement (BNDACK) message when it
     has successfully committed those changes to its own stable
     storage.

   All of the other messages involve ancillary issues:

   o Management of available IP addresses

     The pool request (POOLREQ) is used by the secondary server to
     request an allocation of IP addresses from the primary server.
     The pool response (POOLRESP) is used by the primary server to
     inform the secondary server how many IP addresses were allocated
     to the secondary server as the result of the pool request.

   o Synchronization of the binding databases between the servers
     after they've been out of communications

     The update request (UPDREQ) message is used by one server to
     request that its partner send it all binding database
     information that it has not already seen.
     The update request all
     (UPDREQALL) message is used by one server to request that all
     binding database information be sent in order to recover from a
     total loss of its binding database by the requesting server.
     The update done (UPDDONE) message is used by the responding
     server to indicate that all requested updates have been sent by
     the responding server and acknowledged by the requesting server.

   o Connection establishment

     The connect (CONNECT) message is used by the primary server to
     establish a high level connection with the other server, and to
     transmit several important configuration data items between the
     servers. The connect acknowledgement message (CONNECTACK) is
     used by the secondary server to respond to a CONNECT message
     from the primary server. The disconnect (DISCONNECT) message is
     used by either server when closing a connection.

   o Server synchronization

     The state change (STATE) message is used by either server to
     inform the other server of a change of failover state.

   o Connection integrity management

     The contact (CONTACT) message is used by either server to ensure
     that the other server continues to see the connection as
     operational. It MUST be transmitted periodically over every
     established connection if other message traffic is not flowing,
     and it MAY be sent at any time.

5.1.1. Failover endpoints

   The proper operation of the failover protocol requires more than the
   transmission of messages between one server and the other. Each
   endpoint might seem to be a single DHCP server, but in fact there are
   many situations where additional flexibility in configuration is
   useful.

   For instance, there might be several servers which are each primary
   for a distinct set of address pools, and one server which is
   secondary for all of those address pools.
   The situation with the
   primaries is straightforward, but the secondary will need to maintain
   a separate failover state, partner state, and communications up/down
   status for each of the separate primary servers for which it is
   acting as a secondary.

   The failover protocol calls for there to be a unique failover
   endpoint per partner per role (where role is primary or secondary).
   This failover endpoint can take actions and hold unique states.
   There is thus a maximum of two failover endpoints per partner (one
   for the partner as a primary and one for that same partner as a
   secondary).

   Thus, in the case where there are two primary servers A and B each
   backed up by a single common secondary server C, there is one
   failover endpoint on each of A and B, and two different failover
   endpoints on C. The two different failover endpoints on C each have
   unique states and independent TCP connections.

   This document frequently describes the behavior of the protocol in
   terms of primary and secondary servers, not primary and secondary
   failover endpoints. However, it is important to remember that every
   'server' described in this document is in reality a failover endpoint
   that resides in a particular process, and that many failover
   endpoints may reside in the same process.

   It is not the case that there is a unique failover endpoint for each
   subnet address pool that participates in a failover relationship. On
   one server, there is one failover endpoint per partner per role,
   regardless of how many subnet address pools are managed by that
   combination of partner and role. Conversely, on a particular server,
   any given subnet address pool will be associated with exactly one
   failover endpoint.
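
   The endpoint rules above (one endpoint per partner per role, and
   exactly one endpoint per address pool) could be modeled roughly as
   follows. This is only an illustrative sketch and not part of the
   protocol: the class names, the string identifiers for partners and
   pools, and the reading of "role" as the role played by the partner
   are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


@dataclass
class FailoverEndpoint:
    partner: str                    # hypothetical partner identifier
    partner_role: Role              # role the partner plays (assumption)
    state: Optional[str] = None     # per-endpoint failover state
    comms_up: bool = False          # per-endpoint communications status
    pools: Set[str] = field(default_factory=set)


class EndpointRegistry:
    """Keeps one endpoint per (partner, role) and one endpoint per pool."""

    def __init__(self) -> None:
        self._endpoints: Dict[Tuple[str, Role], FailoverEndpoint] = {}
        self._by_pool: Dict[str, FailoverEndpoint] = {}

    def endpoint(self, partner: str, partner_role: Role) -> FailoverEndpoint:
        # Create the endpoint on first use, then always return the same one.
        key = (partner, partner_role)
        if key not in self._endpoints:
            self._endpoints[key] = FailoverEndpoint(partner, partner_role)
        return self._endpoints[key]

    def attach_pool(self, pool: str, partner: str,
                    partner_role: Role) -> FailoverEndpoint:
        # A pool may be associated with exactly one endpoint.
        if pool in self._by_pool:
            raise ValueError("pool %r already has an endpoint" % pool)
        ep = self.endpoint(partner, partner_role)
        ep.pools.add(pool)
        self._by_pool[pool] = ep
        return ep

    def endpoint_for_pool(self, pool: str) -> FailoverEndpoint:
        return self._by_pool[pool]

    def count(self) -> int:
        return len(self._endpoints)


# Server C from the example: secondary for primaries A and B.
server_c = EndpointRegistry()
server_c.attach_pool("pool-a1", "A", Role.PRIMARY)
server_c.attach_pool("pool-a2", "A", Role.PRIMARY)
server_c.attach_pool("pool-b1", "B", Role.PRIMARY)
assert server_c.count() == 2  # two endpoints, three pools
```

   Keying the table on the (partner, role) pair rather than on the pool
   keeps the failover state and connection status in one place however
   many pools share the relationship.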

   When a connection is received from the partner, the unique failover
   endpoint to which the message is directed is determined solely by the
   IP address of the partner and the port to which the connection is
   directed by the partner. See section 8.2.

5.2. Fundamental guarantees

   There are several fundamental restrictions this protocol places on
   what one server can do in the absence of knowledge of the other
   server. Operating within these restrictions allows certain guarantees
   to be made to the partner server, and these are key to the correct
   operation of the protocol.

5.2.1. Control of lease time

   The key problem with lazy update is that when a server fails after
   updating a client with a particular lease time and before updating
   its partner, the partner will believe that a lease has expired even
   though the client still retains a valid lease on that IP address.

   In order to handle this problem, a period of time known as the
   "Maximum Client Lead Time" (MCLT) is defined and must be known to
   both the primary and secondary servers. Proper use of this time
   interval places an upper bound on the difference allowed between the
   lease time provided to a DHCP client by a server and the lease time
   known by that server's partner. However, the MCLT is typically much
   less than the lease time that a server has been configured to offer a
   client, and so some strategy must exist to allow a server to offer
   the configured lease time to a client. During a lazy update the
   updating server typically updates its partner with a potential
   expiration time which is longer than the lease time previously given
   to the client and which is longer than the lease time that the server
   has been configured to give a client.
   This allows that server to
   give a longer lease time to the client the next time the client
   renews its lease, since the time that it will give to the client will
   not extend more than the MCLT beyond the potential expiration time
   acknowledged by its partner.

   The PARTNER-DOWN state exists so that a server can be sure that its
   partner is, indeed, down. Correct operation while in that state
   requires (generally) that the server wait the MCLT after anything
   that happened prior to its transition into PARTNER-DOWN state (or,
   more accurately, when the other server went down if that is known).
   Thus, the server MUST wait the MCLT after the partner server went
   down before allocating any of the partner's addresses which were
   available for allocation. In the event the partner was not in
   communication prior to going down, it might have allocated one or
   more of its FREE addresses to a DHCP client and been unable to inform
   the server entering PARTNER-DOWN prior to going down itself. By
   waiting the MCLT after the time the partner went down, the server in
   PARTNER-DOWN state ensures that any clients which have a lease on one
   of the partner's FREE addresses will either time out or contact the
   server in PARTNER-DOWN by the time that period ends.

   In addition, once a server has transitioned to PARTNER-DOWN state, it
   MUST NOT reallocate an IP address from one client to another client
   until an additional MCLT interval after the lease by the original
   client expires. (More precisely, until the maximum client lead time
   after what it believes to be the lease expiration time of the
   client.)

   Some optimizations exist for this restriction, in that it only
   applies to leases that were issued BEFORE entering PARTNER-DOWN. Once
   a server has entered PARTNER-DOWN and it leases out an address, it
   need not wait this time as long as it has not communicated with the
   partner since the lease was given out.
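
   The PARTNER-DOWN waiting rules above can be restated as two small
   calculations. The following is a minimal sketch under stated
   assumptions, not a normative algorithm: the function and parameter
   names are hypothetical, all times are seconds on a common clock, and
   the second function assumes the server has had no contact with its
   partner since entering PARTNER-DOWN.

```python
def partner_free_address_usable_at(partner_down_at: float,
                                   mclt: float) -> float:
    """Earliest time the surviving server may allocate an address that
    was available for allocation by the partner when it went down."""
    return partner_down_at + mclt


def reallocation_allowed_at(believed_lease_expiry: float,
                            lease_granted_at: float,
                            entered_partner_down_at: float,
                            mclt: float) -> float:
    """Earliest time an address may be moved from one client to another
    while in PARTNER-DOWN state."""
    if lease_granted_at < entered_partner_down_at:
        # Issued before PARTNER-DOWN: the partner may have extended the
        # lease by up to the MCLT without our knowledge, so wait the
        # additional interval past the believed expiration.
        return believed_lease_expiry + mclt
    # Issued after entering PARTNER-DOWN (and no contact with the
    # partner since): the additional wait does not apply.
    return believed_lease_expiry
```

   For example, with an MCLT of 3600 seconds, a lease granted before the
   transition and believed to expire at time 5000 could not be given to
   a different client before time 8600.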

   The fundamental relationship on which much of the correctness of this
   protocol depends is that the lease expiration time known to a DHCP
   client MUST NOT be more than the maximum client lead time greater
   than the potential expiration time known to a server's partner.

   The remainder of this section makes the above fundamental
   relationship more explicit.

   This protocol requires a DHCP server to deal with several different
   lease intervals and places specific restrictions on their
   relationships. The purpose of these restrictions is to allow the
   other server in the pair to be able to make certain assumptions in
   the absence of an ability to communicate between servers.

   The different lease times are:

   o desired lease interval

     The desired lease interval is the lease interval that a DHCP
     server would like to give to a DHCP client in the absence of any
     restrictions imposed by the Failover protocol. Its
     determination is outside of the scope of this protocol. Typically
     this is the result of external configuration of a DHCP server.

   o actual lease interval

     The actual lease interval is the lease interval that a DHCP
     server gives out to a DHCP client in the dhcp-lease-time option
     of a DHCPACK packet. It may be shorter than the desired client
     lease interval (as explained below).

   o potential lease interval

     The potential lease interval is the lease expiration interval
     the local server tells its partner in the
     potential-expiration-time option of a BNDUPD message.

   o acknowledged potential lease interval

     The acknowledged potential lease interval is the potential lease
     interval the partner server has most recently acknowledged in
     the potential-expiration-time option of a BNDACK message.

   The key restriction (and guarantee) that any server makes with
   respect to lease intervals is that the actual client lease interval
   never exceeds the acknowledged potential lease interval (if any) by
   more than a fixed amount. This fixed amount is called the "Maximum
   Client Lead Time" (MCLT).

   The MCLT MAY be configurable on the primary server, but for correct
   server operation it MUST be the same and known to both the primary
   and secondary servers. The secondary server determines the MCLT from
   the MCLT option sent from the primary server to the secondary server
   in the CONNECT message.

   A server MUST record in its stable storage both the actual lease
   interval and the most recently acknowledged potential lease interval
   for each IP address binding. It is assumed that the desired client
   lease interval can be determined through techniques outside of the
   scope of this protocol. See section 7.1.5 for more details
   concerning the times that the server MUST record in its stable
   storage and the way that they interact with the lease time that may
   be offered to a DHCP client.

   Again, the fundamental relationship among these times which MUST be
   maintained is:

      actual lease interval <
         ( acknowledged potential lease interval + MCLT )

   Figure 5.2.1-1 illustrates an initial lease to a client using the
   rules discussed in the example which follows it. Note that this is
   only one example -- as long as the fundamental relationship is
   preserved, the actual times used could be quite different.

            DHCP                  Primary              Secondary
   time     Client                Server                Server

              | (time in intervals)  | (absolute time)     |
              |                      |                     |
              | >-DHCPDISCOVER->     |                     |
              | <---DHCPOFFER-<      |                     |
              |                      |                     |
              | >-DHCPREQUEST->      |                     |
              |    (selecting)       |                     |
              |                      |                     |
   t          | <--------DHCPACK-<   |                     |
              | lease-time=MCLT      |                     |
              |                      | >-BNDUPD-->         |
              |                      | lease-expiration=t+MCLT
              |                      | potential-expiration=t+(MCLT/2)+X
              |                      |                     |
              |                      | <-BNDACK-<          |
              |                      | potential-expiration=t+(MCLT/2)+X
             ...                    ...                   ...
              |                      |                     |
   t+MCLT/2   | >-DHCPREQUEST->      |                     |
              |    (renew)           |                     |
              |                      |                     |
   t1         | <--------DHCPACK-<   |                     |
              | lease-time=X         |                     |
              |                      | >-BNDUPD-->         |
              |                      | lease-expiration=t1+X
              |                      | potential-expiration=t1+(X/2)+X
              |                      |                     |
              |                      | <-BNDACK-<          |
              |                      | potential-expiration=t1+(X/2)+X
             ...                    ...                   ...

            Figure 5.2.1-1: Lazy Update Message Traffic
                    X = Desired Lease Interval
            Assumes renewal interval = lease interval / 2

   DISCUSSION:

      This protocol mandates only that the above fundamental
      relationship concerning lease intervals is preserved.

      In the interests of clarity, however, let's examine a specific
      example. The MCLT in this case is 1 hour. The desired lease
      interval is 3 days, and its renewal time is half the lease
      interval.

      The rules for this example are:

      o What to tell the client:

        Take the remainder of the acknowledged potential lease interval.
        If this is a new lease, then this value will be zero. If this
        remainder plus the MCLT is greater than the desired lease
        interval, give the client the desired lease interval; otherwise
        give the client the remainder plus the MCLT.

      o What to tell the failover partner server:

        Take the renewal interval (typically half of the actual client
        lease interval), add to it the desired lease interval, and add
        the sum to the current time to yield the value that goes into
        the potential-expiration-time option.

        Also tell the failover partner the actual lease interval by
        adding it to the current time to yield the value that goes into
        the lease-expiration option.

      In operation this might work as follows:

      When a server makes an offer for a new lease on an IP address to a
      DHCP client, it determines the desired lease interval (in this
      case, 3 days). It then examines the acknowledged potential lease
      interval (which in this case is zero) and determines the remainder
      of the time left to run, which is also zero. To this it adds the
      MCLT. Since the actual lease interval cannot be allowed to exceed
      the remainder of the current acknowledged potential lease interval
      plus the MCLT, the offer made to the client is for the remainder
      of the current acknowledged potential lease interval (i.e., zero)
      plus the MCLT. Thus, the actual lease interval is 1 hour.

      Once the server has sent the DHCPACK to the DHCP client, it
      will update the secondary server with the lease information.
      However, the desired potential lease interval will be composed of
      one half of the current actual lease interval added to the desired
      lease interval. Thus, the secondary server is updated with a
      BNDUPD with a lease interval of 3 days + 1/2 hour specified in the
      potential-expiration-time option.

      When the primary server receives an ACK to its update of the
      secondary server's (partner's) potential lease interval, it
      records that as the acknowledged potential lease interval. A
      server MUST NOT send a BNDACK in response to a BNDUPD message
      until it is sure that the information in the BNDUPD message
      resides in its stable storage. Thus, the primary server in this
      case can be sure that the secondary server has recorded the
      potential lease interval in its stable storage when the primary
      server receives a BNDACK message from the secondary server.
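
      The two rules of this example, applied to the initial lease
      described above, can be sketched as follows. The function names
      are hypothetical and times are in seconds; this is one way to
      satisfy the fundamental relationship, not the only one, and the
      same two functions would apply unchanged to a later renewal.

```python
HOUR = 3600
DAY = 24 * HOUR


def client_lease_interval(desired: int, ack_potential_remaining: int,
                          mclt: int) -> int:
    """What to tell the client: the desired interval, capped at the
    remainder of the acknowledged potential lease interval plus MCLT."""
    return min(desired, ack_potential_remaining + mclt)


def partner_update(now: int, actual: int, desired: int):
    """What to tell the partner: (lease-expiration, potential-expiration)
    as absolute times. Assumes renewal at half the actual interval."""
    renewal = actual // 2
    return now + actual, now + renewal + desired


# Initial lease from the example: MCLT = 1 hour, desired = 3 days,
# and no acknowledged potential lease interval yet (remainder = 0).
t = 0
actual = client_lease_interval(3 * DAY, 0, 1 * HOUR)
lease_exp, potential_exp = partner_update(t, actual, 3 * DAY)

assert actual == 1 * HOUR                        # client gets the MCLT
assert lease_exp - t == 1 * HOUR                 # lease-expiration
assert potential_exp - t == 3 * DAY + HOUR // 2  # 3 days + 1/2 hour
```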

      When the DHCP client attempts to renew at T1 (approximately one
      half hour from the start of the lease), the primary server
      again determines the desired lease interval, which is still 3
      days. It then compares this with the remaining acknowledged
      potential lease interval (3 days + 1/2 hour) and adjusts for the
      time passed since the secondary was last updated (1/2 hour). Thus
      the time remaining of the acknowledged potential lease interval is
      3 days. Adding the MCLT to this yields 3 days plus 1 hour, which
      is more than the desired lease interval of 3 days. So the client
      is renewed for the desired lease interval -- 3 days.

      When the primary DHCP server updates the secondary DHCP server
      after the DHCP client's renewal ACK is complete, it will calculate
      the desired potential lease interval as the T1 fraction of the
      actual client lease interval (1/2 of 3 days this time = 1.5 days).
      To this it will add the desired client lease interval of 3 days,
      yielding a total desired partner server lease interval of 4.5
      days. In this way, the primary attempts to have the secondary
      always "lead" the client in its understanding of the client's
      lease interval so as to be able to always offer the client the
      desired client lease interval.

      Once the initial actual client lease interval of the MCLT is past,
      the protocol operates effectively like the DHCP protocol does
      today in its behavior concerning lease intervals. However, the
      guarantee that the actual client lease interval will never exceed
      the remaining acknowledged partner server lease interval by more
      than the MCLT allows full recovery from a variety of failures.

5.2.2. Controlled re-allocation of IP addresses

   When in PARTNER-DOWN state there is a waiting period after which an
   IP address can be re-allocated to another client.
   For IP addresses which
   are available when the server enters PARTNER-DOWN state, the period
   is the MCLT from entry into PARTNER-DOWN state. For IP addresses
   which are not available when the server enters PARTNER-DOWN state,
   the period is the MCLT after the lease becomes available. See
   section 9.4.2 for more details.

   In any other state, a server cannot reallocate an address from one
   client to another without first notifying its partner (through a
   BNDUPD message) and receiving acknowledgement (through a BNDACK
   message) that its partner is aware that the first client is not
   using the address.

   This could be modeled in the following way. Though this specific
   implementation is in no way required, it may serve to better
   illustrate the concept.

   An "available" IP address on a server may be allocated to any client.
   An IP address which was leased to a client and which expired or was
   released by that client would take on a new state, EXPIRED or
   RELEASED respectively. The partner server would then be notified
   that this IP address was EXPIRED or RELEASED through a BNDUPD. When
   the sending server received the BNDACK for that IP address showing it
   was FREE, it would move the IP address from EXPIRED or RELEASED to
   FREE, and it would be available for allocation by the primary server
   to any clients.

   A server MAY reallocate an IP address in the EXPIRED or RELEASED
   state to the same client with no restrictions provided it has not
   sent a BNDUPD message to its partner. This situation would exist if
   the lease expired or was released after the transition into
   PARTNER-DOWN state, for instance.

5.3. Load balancing

   In order to implement load balancing between a primary and secondary
   server pair, each server must respond to DHCPDISCOVER requests from
   some clients and not from other clients.
In order to do this suc- 1216 cessfully, each server must be able to determine immediately upon 1217 receipt of a DHCP client request whether it is to service this 1218 request or to ignore it in order to allow the other server to service 1219 the request. 1221 In addition, it should be possible to configure the percentage of 1222 clients which will be serviced by either the primary or secondary 1223 server. This configuration should be more or less continuous, from 1224 all clients serviced by the primary through an even split with half 1225 serviced by each, to all clients serviced by the secondary. 1227 The technique chosen to support these goals is described in [LOADB]. 1229 A bitmap-style Hash Bucket Assignment (as described in [LOADB]) is 1230 used to determine which DHCP clients can be processed. There are two 1231 potential HBA's in a failover server -- a server HBA and a failover 1232 HBA. The way that a server acquires a server HBA is outside of the 1233 scope of the failover protocol, but both servers in a failover pair 1234 MUST have the same server HBA. The failover HBA (which specifies the 1235 clients that the secondary is supposed to process) is sent by the 1236 primary server to the secondary server whenever a connection is esta- 1237 blished, using the hash-bucket-assignment option defined in section 1238 12.11. 1240 When using the server HBA (if any) and the failover HBA (if any), to 1241 decide whether to process a DHCP request, the server HBA always 1242 applies in every failover state, and the failover HBA (which MUST be 1243 a subset of the server HBA) is used by the secondary server to decide 1244 which packets to process when in NORMAL state. 1246 5.4. IP address allocations between servers 1248 The failover protocol allows a DHCP server which implements it to 1249 operate correctly in spite of the uncertainty over whether its 1250 partner has failed or whether the communications link to its partner 1251 has failed. 
This is made possible in part by the existence of 1252 separate address pools on each server for allocation to newly arrived 1253 DHCP clients. 1255 Thus, each server has its own pool of available IP addresses. Note 1256 that an IP address is not "owned" by a particular server throughout 1257 its entire lifetime. Only an IP address which is available is 1258 "owned" by a particular server -- once it has been leased to a DHCP 1259 client, it is not owned by either failover partner. When it finally 1260 becomes available again, it will be owned initially by the primary 1261 server, and it may or may not be allocated to the secondary server by 1262 the primary server. 1264 So, the flow of IP address ownership is as follows: initially an IP 1265 address is owned by the primary server. It may be allocated to the 1266 secondary server if it is available, and then it is owned by the 1267 secondary server. Either server can allocate available IP addresses 1268 which it owns to DHCP clients, in which case it ceases to own them. 1269 When the DHCP client releases the address or the lease on it expires, 1270 it will again become available and will be owned by the primary. 1272 An IP address will not become owned by the server which allocated it 1273 initially when it is released or the lease expires because, in gen- 1274 eral, that server will have had to replenish its pool of available 1275 addresses well in advance of any likely lease expirations. Thus, 1276 having a particular IP address cycle back to the secondary is as likely 1277 to put the secondary further out of balance with respect to the primary as 1278 it is to enhance the balance of available addresses between them. 1280 These address pools are used when in COMMUNICATIONS-INTERRUPTED state 1281 and while waiting for the MCLT expiration in PARTNER-DOWN state. In 1282 addition, when using load balancing, these pools are used when in 1283 NORMAL state as well.
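The ownership flow described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the class and method names (Address, give_to_secondary, lease, free) are hypothetical, and the BNDUPD/BNDACK exchange that the protocol requires before an address becomes available again is deliberately left out.

```python
PRIMARY, SECONDARY = "primary", "secondary"


class Address:
    """Hypothetical model of which failover partner owns an IP address."""

    def __init__(self, ip):
        self.ip = ip
        self.owner = PRIMARY  # an available address starts out owned by the primary
        self.client = None    # no DHCP client holds it yet

    def give_to_secondary(self):
        # Only an available address owned by the primary can be handed over.
        if self.client is not None or self.owner != PRIMARY:
            raise ValueError("address is not available on the primary")
        self.owner = SECONDARY

    def lease(self, server, client):
        # A server may lease only an available address that it owns.
        if self.client is not None or self.owner != server:
            raise ValueError("server does not own this address")
        self.owner = None     # a leased address is owned by neither partner
        self.client = client

    def free(self):
        # Release or expiry: ownership returns to the primary, never directly
        # to the server that granted the lease. (Partner notification omitted.)
        self.client = None
        self.owner = PRIMARY


addr = Address("192.0.2.10")
addr.give_to_secondary()
addr.lease(SECONDARY, "client-1")
addr.free()
print(addr.owner)
```

Returning every freed address to the primary keeps the balancing decision in one place, which matches the reasoning given above for not cycling addresses straight back to the secondary.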
1285 The allocation and maintenance of these address pools is an area of 1286 some sensitivity, since the goal is to maintain a more or less con- 1287 stant ratio of available addresses between the two servers. 1289 The initial allocation when the servers first integrate is triggered 1290 by the POOLREQ message from the secondary to the primary. This is 1291 followed by the POOLRESP message where the primary tells the secon- 1292 dary how many IP addresses it allocated to the secondary. Then, the 1293 primary sends the allocated IP addresses to the secondary. The 1294 POOLREQ/POOLRESP message is a trigger to the primary to perform a 1295 scan of its database and to ensure that the secondary has enough IP 1296 addresses (based on some configured ratio). 1298 The actual IP addresses are sent to the secondary using the BNDUPD 1299 message with a state of BACKUP, which indicates the IP address is now 1300 available for allocation by the secondary. 1302 The POOLREQ/POOLRESP message exchange initiated by the secondary is 1303 valid at any time, and the primary server SHOULD, whenever it 1304 receives the POOLREQ message, scan its database of address pools and 1305 determine if the secondary needs more IP addresses from any of the IP 1306 address pools. 1308 However, in order to support a reasonably dynamic balance of the IP 1309 addresses between the failover partners, the primary server needs to 1310 do additional work to ensure that the secondary server has as many IP 1311 addresses as it needs (but that it doesn't have *more* than it needs 1312 either). 1314 The primary server SHOULD examine the balance of available addresses 1315 between the primary and secondary for a particular address pool when- 1316 ever the number of available addresses for either the primary or 1317 secondary changes.
The primary server SHOULD adjust the available 1318 address balance as required to ensure the configured address balance, 1319 excepting that the primary server SHOULD apply some threshold 1320 mechanism to such a balance adjustment in order to minimize the over- 1321 head of maintaining this balance. 1323 An example of a threshold approach is: do not attempt to re-balance 1324 the available pools on the primary and secondary until the out of 1325 balance value exceeds a configured value. 1327 The primary server can, at any time, send an available IP address to 1328 the secondary using a BNDUPD with the state BACKUP. The primary 1329 server can attempt to take an available IP address away from the 1330 secondary by sending a BNDUPD with the state FREE. If the secondary 1331 accepts the BNDUPD, then the IP address is now available to the primary and not 1332 available to the secondary. Of course, the secondary MUST reject 1333 that BNDUPD if it has already used that IP address for a DHCP client. 1335 Whenever the primary server examines the possible available IP 1336 addresses which it could send to the secondary server, the primary 1337 server SHOULD take into account whether load balancing is in use, and 1338 if it is, the primary server SHOULD attempt to send to the secondary 1339 any IP addresses whose most recent client would be processed by the 1340 secondary under the current load balancing regime in use. Likewise, 1341 when removing available IP addresses from the secondary server when 1342 load balancing is in use, the primary server SHOULD first remove 1343 those IP addresses whose most recent client would be processed by the 1344 primary server under the current load balancing regime in use. 1346 5.5. Operating in NORMAL state 1348 When in NORMAL state, each server services DHCPDISCOVER's and all 1349 other DHCP requests other than DHCPREQUEST/RENEWAL or 1350 DHCPREQUEST/REBINDING from the client set defined by the load balanc- 1351 ing algorithm [LOADB].
Each server services DHCPREQUEST/RENEWAL or 1352 DHCPREQUEST/REBINDING requests from any client. 1354 In general, whenever the binding database is changed in stable 1355 storage (other than a change resulting from receiving a BNDUPD from 1356 the failover partner), then a BNDUPD message is sent with the con- 1357 tents of that change to the partner server. The partner server then 1358 writes the information about that binding in its bindings database in 1359 stable storage and replies with a BNDACK message. 1361 The binding database in a DHCP server would normally be changed as a 1362 result of DHCP protocol activity with a DHCP client (e.g., granting 1363 a lease to a DHCP client through the familiar 1364 DISCOVER/OFFER/REQUEST/ACK cycle or extending a lease due to a 1365 renewal from a DHCP client) or possibly (on some servers) because a 1366 lease has expired or undergone another state change that must be 1367 recorded in the DHCP binding database. These are the state changes 1368 that would be communicated to the partner server using a BNDUPD mes- 1369 sage. Of course, receipt of a BNDUPD message itself will normally 1370 cause an update of the binding database for all of the IP addresses 1371 contained in the BNDUPD, and a binding database change such as this 1372 MUST NOT trigger a corresponding BNDUPD message to the partner. 1374 5.6. Operating in COMMUNICATIONS-INTERRUPTED state 1376 When operating in COMMUNICATIONS-INTERRUPTED state, each server is 1377 operating independently, but does not assume that its partner is not 1378 operating. The partner server might be operating and simply unable 1379 to communicate with this server, or might not be operating. 1381 Each server responds to the full range of DHCP client messages that 1382 it receives, but in such a way that graceful reintegration is always 1383 possible when its partner comes back into contact with it. 1385 5.7.
Operating in PARTNER-DOWN state 1387 When operating in PARTNER-DOWN state, a server assumes that its 1388 partner is not currently operating, but does make allowances for the 1389 possibility that that server was operating in the past, though possi- 1390 bly out of communications with this server. It responds to all DHCP 1391 client requests in PARTNER-DOWN state. 1393 5.8. Operating in RECOVER state 1395 A server operating in RECOVER state assumes that it is reintegrating 1396 with a server that has been operating in PARTNER-DOWN state, and that 1397 it needs to update its bindings database before it services DHCP 1398 client requests. 1400 A server may also operate in RECOVER state in order to fully recover 1401 its bindings database from its partner server. 1403 5.9. Operating in STARTUP state 1405 A server operating in STARTUP state assumes that failover is opera- 1406 tional, and it spends a short time whenever it comes up attempting to 1407 contact the partner. During this time (generally a few seconds), the 1408 server is unresponsive to DHCP client requests. This period exists 1409 in order to give a server a chance to determine that its partner has 1410 changed state since it was last in communications, and to react to 1411 that changed state (if any) prior to responding to DHCP client 1412 requests. 1414 The period of time a server remains in STARTUP state SHOULD be long 1415 enough to ensure that it will connect to the other server if that 1416 server is available for connections. 1418 5.10. Time synchronization between servers 1420 The failover protocol is designed to operate between two servers 1421 which have time values which differ by an arbitrarily large amount. 1422 A particular implementation MAY choose to only support servers whose 1423 time values differ by an arbitrarily small amount. 
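One way to picture the time handling in this section: every message carries the sender's time value, the receiver stamps the message on arrival, and the difference between the two is a "delta time" sample. The sketch below is only an illustration, since the refinement algorithm is left to implementations. The class name DeltaTimeTracker, the exponential smoothing, and both parameters (alpha, drastic_threshold) are assumptions, not part of the protocol.

```python
class DeltaTimeTracker:
    """Hypothetical tracker for the time offset between failover partners."""

    def __init__(self, alpha=0.125, drastic_threshold=60.0):
        self.alpha = alpha                          # smoothing weight (assumed)
        self.drastic_threshold = drastic_threshold  # seconds (assumed)
        self.delta = None                           # no baseline until first message

    def observe(self, partner_time, receipt_time):
        """Fold one message's time values into the running delta time."""
        sample = partner_time - receipt_time
        if self.delta is None:
            # First message on the connection establishes the baseline.
            self.delta = sample
            return "baseline"
        if abs(sample - self.delta) > self.drastic_threshold:
            # Drastic change: reset the delta; the caller should also
            # signal a network administrator.
            self.delta = sample
            return "drastic"
        # Minor change: smooth it so transient network delays do not
        # make the delta vary wildly.
        self.delta += self.alpha * (sample - self.delta)
        return "smoothed"


tracker = DeltaTimeTracker(alpha=0.5, drastic_threshold=60.0)
print(tracker.observe(1000.0, 990.0), tracker.delta)
print(tracker.observe(1012.0, 1000.0), tracker.delta)
```

An exponential moving average is just one convenient smoothing choice; any filter that damps short-lived delays while following slow clock drift would serve the same purpose.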
1425 In any event, whether large or only small differences in time values 1426 are supported, every message that is received MUST be tagged with a 1427 time value as soon as possible after receipt. This time value is 1428 used along with the time value that is sent in every message between 1429 the failover partners to develop a delta time between the servers. 1430 This delta time is used during the connection process to establish a 1431 baseline delta time between the servers, and upon receipt of each 1432 message, the delta time for that message is used to refine the delta 1433 time for the server pair. 1435 While the algorithm for this refinement of delta time is not speci- 1436 fied as part of this protocol, a server SHOULD allow the delta time 1437 value for a pair of failover servers to be periodically updated to 1438 account for time drift. In addition, the delta time value between 1439 servers SHOULD be smoothed in some fashion, so that transient network 1440 delays will not cause it to vary wildly. 1442 A server SHOULD recognize a drastic change in the delta time value as 1443 an event to be signaled to a network administrator, as well as reset- 1444 ting the time delta between the failover partners. 1446 The specific definitions of a minor or drastic change in delta time 1447 as well as the algorithm used to smooth minor changes into the run- 1448 ning delta time are implementation issues and are not further 1449 addressed in this document. 1451 5.11. IP address binding-status 1453 In most DHCP servers an IP address can take on several different 1454 binding-status values, sometimes also called states. While probably no two 1455 DHCP servers have exactly the same possible binding-status 1456 values, the DHCP RFC enforces some commonality among the general 1457 semantics of the binding-status values used by various DHCP server 1458 implementations.
1460 In order to transmit binding database updates between one server and 1461 another using the failover protocol, some common denominator 1462 binding-status values must be defined. It is not expected that these 1463 binding-status values correspond with any actual implementation of 1464 the DHCP protocol in a DHCP server, but rather that the binding- 1465 status values defined in this document should be a common denominator 1466 of those in use by many DHCP server implementations. It is a goal of 1467 this protocol that any DHCP server can map the various IP address 1468 binding-status values that it uses internally into these failover IP 1469 address binding-status values on transmission of binding database 1470 updates to its partner, and likewise that it can map any failover IP 1471 address binding-status values it received in a binding update into 1472 its internal IP address binding-status values. 1474 The IP address binding-status values defined for the failover 1475 protocol are listed below. Unless otherwise noted below, there MAY 1476 be client information associated with each of these binding-status 1477 values. 1481 o ACTIVE -- Lease is assigned to a client. Client identification 1482 MUST appear. 1484 o EXPIRED -- indicates that a client's binding on an IP address 1485 has expired. When the partner server ACK's the BNDUPD of an 1486 EXPIRED IP address, the server sets its internal state to FREE. 1487 It is then available for allocation to any client of the primary 1488 server. It may be allocated to the same client on the server 1489 where the lease expired if a BNDUPD containing the EXPIRED state 1490 has not yet been sent to the partner (e.g., in the event that 1491 the servers are not in communication). Client identification 1492 SHOULD appear. 1494 o RELEASED -- indicates that a DHCP client sent in a DHCPRELEASE 1495 message.
When the partner server ACK's the BNDUPD of a 1496 RELEASED IP address, the server sets its internal state to FREE, 1497 and it is available for allocation by the primary server to any 1498 DHCP client. It may be allocated to the same client if a BNDUPD 1499 has not yet been sent to the partner. Client identification 1500 SHOULD appear. 1502 o FREE -- is used when a DHCP server needs to communicate that an 1503 IP address is unused by any DHCP client, but it was not just 1504 released, expired, or reset by a network administrator. When 1505 the partner server ACK's the BNDUPD of a FREE IP address, the 1506 server sets its internal state such that it is available for 1507 allocation by the primary DHCP server to any DHCP client. (Note 1508 that in PARTNER-DOWN state, after waiting the MCLT, the IP 1509 address MAY be allocated to a DHCP client by the secondary 1510 server.) 1512 Note that when an IP address that was allocated by the secondary 1513 reverts to the FREE state, it must (like any other IP address) 1514 be assigned to the secondary through the POOLREQ/BNDUPD process 1515 before the secondary can reallocate it. 1517 Client identification MAY appear. 1519 o ABANDONED -- indicates that an IP address is considered unusable 1520 by the DHCP subsystem. An IP address for which a valid PING 1521 response was received SHOULD be set to ABANDONED. An IP address 1522 for which a DHCPDECLINE was received should be set to ABANDONED. 1524 Client identification MUST NOT appear. 1526 o RESET -- indicates that this IP address was made available by 1527 operator command. This is a distinct state so that the reason 1528 that the IP address became FREE can be determined. Client iden- 1529 tification MAY appear. 1531 o BACKUP -- indicates that this IP address can be allocated by the 1532 secondary server to a DHCP client at any time.
When the MCLT has 1533 passed after its time of entry into PARTNER-DOWN state, the IP 1534 address may be allocated by the primary to any DHCP client. 1535 Client identification MAY appear. 1537 These binding-status values are communicated from one failover 1538 partner to another using the binding-status option; see section 12.3 1539 for details of this option. Unless otherwise noted above there MAY 1540 be client information associated with each of these binding-status 1541 values. 1543 An IP address will move between these binding-status values using the 1544 following state transition diagram: 1546 DHCP client DECLINE or 1547 server detected problem 1548 from any state 1549 +----------+ V +---------+ 1550 External >---->| RESET | | |ABANDONED| 1551 command | | +-->| | 1552 +----------+ +---------+ 1553 | 1554 Comm w/Partner(1) 1555 V 1556 +---------+ Comm(1) +----------+ Comm(1) +---------+ 1557 | EXPIRED |--------->| FREE |<----------| RELEASED| 1558 | | w/Partner | | w/Partner | | 1559 +---------+ +----------+ +---------+ 1560 ^ ^ | | +-----------+ ^ 1561 | | | | | | 1562 | Exp. grace IP | IP addr alloc. IP addr | 1563 | period ends address to sec.(2) reserved | 1564 | | leased V | | 1565 | | by | +----------+ | | 1566 | | primary | BACKUP |<---+ | 1567 | wait for | | | | 1568 | grace period | +----------+ | 1569 | | | | | 1570 | | | IP addr leased by | 1571 | Expired grace | secondary | 1572 | period exists V V | 1573 | | +----------+ | 1574 | | Lease on | ACTIVE | DHCPRELEASE | 1575 +-----+-IP addr---| |------------------+ 1576 expires +----------+ 1578 Figure 5.10-1: Transitions between binding-status values. 1580 (1) This transition MAY also occur if the server is in 1581 PARTNER-DOWN state and the MCLT has passed since the entry 1582 in the RELEASED, EXPIRED, or RESET states. 1584 (2) This transition MAY occur if the server is the secondary 1585 and the MCLT has passed since its entry into PARTNER-DOWN state.
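The transitions in the figure can also be written down as a small lookup table. The following is a sketch of one reading of the diagram, not a required implementation: the names BindingStatus, TRANSITIONS, and is_allowed are hypothetical, the grace-period detour on lease expiry is folded into the single ACTIVE-to-EXPIRED edge, and treating RESET as reachable from any state is an assumption, since the figure shows only an external command as its source.

```python
from enum import Enum


class BindingStatus(Enum):
    FREE = "FREE"
    ACTIVE = "ACTIVE"
    EXPIRED = "EXPIRED"
    RELEASED = "RELEASED"
    ABANDONED = "ABANDONED"
    RESET = "RESET"
    BACKUP = "BACKUP"


S = BindingStatus

# (from, to) -> the event that drives the transition, as read from the figure.
TRANSITIONS = {
    (S.RESET, S.FREE): "communication with partner (1)",
    (S.EXPIRED, S.FREE): "communication with partner (1)",
    (S.RELEASED, S.FREE): "communication with partner (1)",
    (S.FREE, S.ACTIVE): "IP address leased by primary",
    (S.FREE, S.BACKUP): "IP address allocated to secondary (2) or reserved",
    (S.BACKUP, S.ACTIVE): "IP address leased by secondary",
    (S.ACTIVE, S.EXPIRED): "lease expires (after any grace period)",
    (S.ACTIVE, S.RELEASED): "DHCPRELEASE",
}


def is_allowed(src, dst):
    """Return True if the figure permits moving from src to dst."""
    if dst is S.ABANDONED:
        return True  # DHCPDECLINE or server-detected problem, from any state
    if dst is S.RESET:
        return True  # external (operator) command; source state is assumed
    return (src, dst) in TRANSITIONS


print(is_allowed(S.ACTIVE, S.RELEASED))
print(is_allowed(S.ACTIVE, S.FREE))
```

A server with different internal states would keep its own machine and use a table like this only to validate the binding-status values it sends to or receives from its partner.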
1587 Again, note that a DHCP server implementing the failover protocol 1588 does not have to implement either this state machine or use these 1589 particular binding-status values in its normal operation of allocat- 1590 ing IP addresses to DHCP clients. It only needs to map its internal 1591 binding-status values onto these "standard" binding-status values, 1592 and map these "standard" binding-status values back into its internal 1593 binding-status values. For example, a server which implements a 1594 grace period for an IP address binding SHOULD simply wait to update 1595 its partner server until the grace period on that binding has run 1596 out. 1598 The process of setting an IP address to FREE deserves some detailed 1599 discussion. When an IP address is moved to the EXPIRED, RELEASED, or 1600 RESET binding-status on a server, it will send a BNDUPD with the 1601 binding-status of EXPIRED, RELEASED, or RESET to its partner. If its 1602 partner agrees that this is acceptable (see sections 7.1.2 and 7.1.3 con- 1603 cerning why a server might not accept a BNDUPD) it will return a 1604 BNDACK with no reject-reason, signifying that it accepted the update. 1605 As part of the BNDUPD processing, the server returning the BNDACK 1606 will set the binding-status of the IP address to FREE, and upon 1607 receipt of the BNDACK the server which sent the BNDUPD will set the 1608 binding-status of the IP address to FREE. Thus, the EXPIRED, 1609 RELEASED, or RESET binding-status is something of a transitory state. 1610 This process is encoded in the transition diagram above by "Comm 1611 w/Partner". 1613 5.12. DNS dynamic update considerations 1615 DHCP servers (and clients) can use DNS Dynamic Updates as described 1616 in [RFC 2136] to maintain DNS name-mappings as they maintain DHCP 1617 leases. Many different administrative models for DHCP-DNS integra- 1618 tion are possible.
Descriptions of several of these models, and 1619 guidelines that DHCP servers and clients should follow in carrying 1620 them out, are laid out in [DDNS]. The nature of the DHCP failover 1621 protocol introduces some issues concerning dynamic DNS updates that 1622 are not part of non-failover DHCP environments. This section 1623 describes these issues, and defines the information which failover 1624 partners should exchange and the protocol which they should follow in 1625 order to ensure consistent behavior. The presence of this section 1626 should not be interpreted as requiring that implementations of the 1627 DHCP failover protocol must also support DDNS updates. The purpose 1628 of this discussion is to clarify the areas where the DHCP failover 1629 and DHCP-DDNS protocols intersect for the benefit of implementations 1630 which support both protocols, not to introduce a new requirement into 1631 the DHCP failover protocol. Thus, a DHCP server which implements the 1632 failover protocol MAY also support dynamic DNS updates, but if it 1633 does support dynamic DNS updates it SHOULD utilize the techniques 1634 described here in order to correctly distribute them between the 1635 failover partners. 1637 From the standpoint of the failover protocol, there is no reason why 1638 a server which is utilizing the DDNS protocol to update a DNS server 1639 should not be a partner with a server which is not utilizing the DDNS 1640 protocol to update a DNS server. However, a server which is not able 1641 to support DDNS or is not configured to support DDNS SHOULD output a 1642 warning message when it receives BNDUPD messages which indicate that 1643 its failover partner is configured to support the DDNS protocol to 1644 update a DNS server. An implementation MAY consider this an error 1645 and refuse to operate, or it MAY choose to operate anyway, having 1646 warned the user of the problem in some way. 1648 5.12.1. 
Relationship between failover and dynamic DNS update 1650 The failover protocol describes the conditions under which each fail- 1651 over server may renew a lease to its current DHCP client, and 1652 describes the conditions under which it may grant a lease to a new 1653 DHCP client. An analogous set of conditions determines when a fail- 1654 over server should initiate a DDNS update, and when it should attempt 1655 to remove records from the DNS. The failover protocol's conditions 1656 are based on the desired external behavior: avoiding duplicate 1657 address assignments; allowing clients to continue using leases which 1658 they obtained from one failover partner even if they can only commun- 1659 icate with the other partner; allowing the backup DHCP server to 1660 grant new leases even if it is unable to communicate with the primary 1661 server. The desired external DDNS behavior for DHCP failover servers 1662 is: 1664 1. Allow timely DDNS updates from the server which grants a 1665 client a lease. Recognize that there is often a DDNS update 1666 lifecycle which parallels the DHCP lease lifecycle. This is 1667 likely to include the addition of records when the lease is 1668 granted, and the removal of DNS records when the lease is sub- 1669 sequently made available for allocation to a different client. 1671 2. Communicate enough information between the two failover 1672 servers to allow one to complete the DDNS update 'lifecycle' 1673 even if the other server originally granted the lease. 1675 3. Avoid redundant or overlapping DDNS updates, where both fail- 1676 over servers are attempting to perform DDNS updates for the 1677 same lease-client binding. Avoid situations where one partner 1678 is attempting to add RRs related to a lease binding while the 1679 other partner is attempting to remove RRs related to the same 1680 lease binding. 1682 5.12.2. 
Use of the DDNS option 1684 In order for either server to be able to complete a DDNS update, or 1685 to remove DNS records which were added by its partner, both servers 1686 need to know the FQDN associated with the lease-client binding. The 1687 FQDN associated with the client's A RR and PTR RR SHOULD be communi- 1688 cated from the server which adds records into the DNS to its partner. 1689 The initiating server SHOULD use the DDNS option in the BNDUPD mes- 1690 sages to inform the partner server of the status of any DDNS updates 1691 associated with a lease binding. Failover servers MAY choose not to 1692 include the DDNS option in BNDUPD messages if there has been no 1693 change in the status of any DDNS update related to the lease binding. 1694 The partner server receiving BNDUPD messages containing the DDNS 1695 option SHOULD compare the status flags and the FQDN contained in the 1696 option data with the current DDNS information it has associated with 1697 the lease binding, and update its notion of the DDNS status accord- 1698 ingly. 1700 The initiating server MAY send a BNDUPD to its partner before the 1701 DDNS update has been successfully completed. If it does so, it SHOULD 1702 leave the 'C' bit in the Flags field clear, to indicate to the 1703 partner that the DDNS update may not be complete. When the DDNS 1704 update has been successfully acknowledged by the DNS server, the ini- 1705 tiating DHCP server SHOULD include the DDNS option in its next BNDUPD 1706 message about the binding, so that the partner server will be able to 1707 record the final status of the DDNS update. The initiating server 1708 SHOULD set the 'C' bit in the DDNS option if the DDNS update was suc- 1709 cessfully accepted by the DNS server. 1711 Some implementations will choose to send a BNDUPD without waiting for 1712 the DDNS update to complete, and then will send a second BNDUPD once 1713 the DDNS update is complete. 
Other implementations will delay sending 1714 the partner a BNDUPD until the DDNS update has been acknowledged by 1715 the DNS server, or until some time-limit has elapsed, in order to 1716 avoid sending a second BNDUPD. 1718 The Domain Name field in the DDNS option contains the FQDN that will 1719 be associated with the A RR (if the server is performing an A RR 1720 update for the client) and the PTR RR. This FQDN may be composed in 1721 any of several ways, depending on server configuration and the infor- 1722 mation provided by the client in its DHCP messages. The client may 1723 supply a hostname which it would like the server to use in forming 1724 the FQDN, or it may supply the entire FQDN. The server may be config- 1725 ured to attempt to use the information the client supplies, it may be 1726 configured with an FQDN to use for the client, or it may be config- 1727 ured to synthesize an FQDN. The responsive server SHOULD include the 1728 FQDN that it will be using in DDNS updates it initiates when it sends 1729 the DDNS option. 1731 Since the responsive server may not have completed the DDNS update at 1732 the time it sends the first BNDUPD about the lease binding, there may 1733 be cases where the FQDN in later BNDUPD messages does not match the 1734 FQDN included in earlier messages. For example, the responsive 1735 server may be configured to handle situations where two or more DHCP 1736 client FQDNs are identical by modifying the most-specific label in 1737 the FQDNs of some of the clients in an attempt to generate unique 1738 FQDNs for them (a process sometimes called "disambiguation"). Alter- 1739 natively, at sites which use some or all of the information which 1740 clients supply to form the FQDN, it's possible that a client's confi- 1741 guration may be changed so that it begins to supply new data. 
The 1742 responsive server may react by removing the DNS records which it ori- 1743 ginally added for the client, and replacing them with records that 1744 refer to the client's new FQDN. In such cases, the responsive server 1745 SHOULD include the actual FQDN that was used in subsequent DDNS 1746 options. The responsive server SHOULD include relevant client-option 1747 data in the client-request-options option in its BNDUPD messages. 1748 This information may be necessary in order to allow the non- 1749 responsive partner to detect client configuration changes that change 1750 the hostname or FQDN data which the client includes in its DHCP 1751 requests. 1753 5.12.3. Adding RRs to the DNS 1755 A failover server which is going to perform DDNS updates SHOULD ini- 1756 tiate the DDNS update when it grants a new lease to a client. The 1757 non-responsive partner SHOULD NOT initiate a DDNS update when it 1758 receives the BNDUPD after the lease has been granted. The failover 1759 protocol ensures that only one of the partners will grant a lease to 1760 any individual client, so it follows that this requirement will 1761 prevent both partners from initiating updates simultaneously. The 1762 server initiating the update SHOULD follow the protocol in [DDNS]. 1763 The server may be configured to perform an A RR update on behalf of 1764 its clients, or not. Ordinarily, a failover server will not initiate 1765 DDNS updates when it renews leases. In two cases, however, a failover 1766 server MAY initiate a DDNS update when it renews a lease to its 1767 existing client: 1769 1. When the lease was granted before the server was configured to 1770 perform DDNS updates, the server MAY be configured to perform 1771 updates when it next renews existing leases. Since both 1772 servers are responsive to renewals in NORMAL state, it is not 1773 enough to simply require the non-responsive server to avoid a 1774 DNS update in this case. 
The server which would be responsive 1775 to a DHCPDISCOVER from this client (even though the current 1776 request is a DHCPREQUEST/RENEWAL) is the server which should 1777 initiate the DDNS update. 1779 2. If a server is in PARTNER-DOWN state, it can conclude that its 1780 partner is no longer attempting to perform an update for the 1781 existing client. If the remaining server has not recorded that 1782 an update for the binding has been successfully completed, the 1783 server MAY initiate a DDNS update. It MAY initiate this 1784 update immediately upon entry to PARTNER-DOWN state, it may 1785 perform this in the background, or it MAY initiate this update 1786 upon next hearing from the DHCP client. 1788 5.12.4. Deleting RRs from the DNS 1790 The failover server which makes an IP address FREE SHOULD initiate 1791 any DDNS deletes, if it has recorded that DNS records were added on 1792 behalf of the client. 1794 A server not in PARTNER-DOWN state "makes an IP address FREE" when it 1795 initiates a BNDUPD with a binding-status of FREE, EXPIRED, or 1796 RELEASED. Its partner confirms this status by acking that BNDUPD, 1797 and upon receipt of the ACK the server has "made the IP address 1798 FREE". Conversely, a server in PARTNER-DOWN state "makes an IP 1799 address FREE" when it sets the binding-status to FREE, since in 1800 PARTNER-DOWN state no communication is required with the partner. 1802 It is at this point that it should initiate the DDNS operations to 1803 delete RRs from the DNS. Its partner SHOULD NOT initiate DDNS 1804 deletes for DNS records related to the lease binding as part of send- 1805 ing the BNDACK message. The partner MAY have issued BNDUPD messages 1806 with a binding-status of FREE, EXPIRED, or RELEASED previously, but 1807 the other server will have NAKed these BNDUPD messages. 1809 The failover protocol ensures that only one of the two partner 1810 servers will be able to make a lease FREE.
The server making the
lease FREE may be doing so while it is in NORMAL communication with
its partner, or it may be in PARTNER-DOWN state. If a server is in
PARTNER-DOWN state, it may be performing DDNS deletes for RRs which
its partner added originally. This allows a single remaining partner
server to assume responsibility for all of the DDNS activity which
the two servers were undertaking.

Another implication of this approach is that no DDNS RR deletes will
be performed while either server is in COMMUNICATIONS-INTERRUPTED
state, since no IP addresses are moved into the FREE state during
that period.

5.13. Reservations and failover

Some DHCP servers support a capability to offer specific
pre-configured IP addresses to DHCP clients. These are real DHCP
clients that carry out the entire DHCP protocol, but these servers
always offer the client a specific pre-configured IP address, and
they offer that IP address to no other client. Such a capability has
several names, but it is sometimes called a "reservation", in that
the IP address is reserved for a particular DHCP client.

In a situation where there are two DHCP servers serving the same
subnet without using failover, the two DHCP servers need to have
disjoint IP address pools, but identical reservations for the DHCP
clients.

In a failover context, both servers need to be configured with the
proper reservations in an identical manner, but if we stop there
problems can occur around the edge conditions where reservations are
made for an IP address that has already been leased to a different
client. Different servers handle this conflict in different ways,
but the goal of the failover protocol is to allow correct operation
with any server's approach to the normal processing of the DHCP
protocol.

The general solution with regard to reservations is as follows.
Whenever a reserved IP address becomes FREE (i.e., when first
configured or whenever a client frees it or it expires or is reset),
the primary server MUST show that IP address as FREE (and thus
available for its own allocation) and it MUST send it to the
secondary server with the R bit set in the IP-flags option and the
binding-status BACKUP.

Note that this implies that a reserved IP address goes through the
normal state changes from FREE to ACTIVE (and possibly back to FREE).
The failover protocol supports this approach to reservations, i.e.,
where the IP address undergoes the normal state changes of any IP
address, but it can only be offered to the client for which it is
reserved. Other approaches to the support of reservations exist in
some DHCP server implementations (e.g., where the IP address is
apparently leased to a particular client forever, without any
expiration). The goal is for the failover protocol to support any of
the usual approaches to reservations, both those that allow an IP
address to go through different states when reserved, and those that
don't.

From the above, it follows that a reservation solely on the secondary
will not necessarily allow the secondary to offer that address to the
client for which it is reserved. The reservation must also appear on
the primary for the secondary to be able to offer the IP address to
the client for which it is reserved.

When the reservation on an IP address is cancelled, if the IP address
is currently FREE and the server is the primary, or BACKUP and the
server is the secondary, the server MUST send a BNDUPD to the other
server with the binding-status FREE.

5.14. Dynamic BOOTP and failover

Some DHCP servers support a capability to offer IP addresses to BOOTP
clients without having a particular address previously allocated for
those clients.
This capability is often called something like
"dynamic BOOTP". It is discussed briefly in RFC 1534 [RFC 1534].

This capability has a negative interaction with the fundamental
elements of the failover protocol, in that an address handed out to a
BOOTP device has no term (or effectively no term, in that usually
they are considered leases for "forever"). There is no opportunity
to hand out a lease which is only the MCLT long when first hearing
from a BOOTP device, because they may only interact once with the
DHCP server and they have no notion of a lease expiration time. Thus
the entire concept of the MCLT and waiting the MCLT after entering
PARTNER-DOWN state is defeated when dealing with BOOTP devices.

With some restrictions, however, dynamic BOOTP devices can be
supported in a server on a subnet where failover is supported. The
only restriction (and it is not small) is that on any portion of the
subnet (in any address pool) where dynamic BOOTP devices can be
allocated IP addresses, a DHCP server MUST NOT ever use any of the IP
addresses which were previously available for allocation by its
failover partner. Thus, the addresses allocated by the primary to the
secondary for allocation that might have been allocated to BOOTP
devices MUST NOT ever be used by the primary server even if it is in
PARTNER-DOWN state and has waited the MCLT after entering that state.
Conversely, addresses available for allocation by the primary MUST
NOT be used by the secondary even if it is in PARTNER-DOWN state. The
reason for this is that one of those IP addresses could have been
allocated by the secondary server to a BOOTP device, and the primary
server would have no way of ever knowing that happened.

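
The restriction above can be pictured as a small allocation check.
This is only a sketch under stated assumptions: the function and
parameter names are hypothetical and do not appear in this
specification, and every other allocation rule a real server applies
is omitted.

```python
def may_allocate(previously_partners, pool_allows_dynamic_bootp,
                 in_partner_down, waited_mclt):
    """Illustrative check only; all names here are hypothetical.

    previously_partners: the address was available for allocation
        by the failover partner.
    pool_allows_dynamic_bootp: dynamic BOOTP devices can be given
        addresses from this address pool.
    """
    if not previously_partners:
        # Not one of the partner's addresses, so this rule does not apply.
        return True
    if pool_allows_dynamic_bootp:
        # MUST NOT ever be used, even in PARTNER-DOWN after the MCLT.
        return False
    # Without dynamic BOOTP, the partner's addresses become usable
    # once the server is in PARTNER-DOWN and has waited the MCLT.
    return in_partner_down and waited_mclt
```

For example, may_allocate(True, True, True, True) is False, because
the partner may have handed the address to a BOOTP device without
this server ever learning of it.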
Whenever a server sends a BNDUPD message to its partner, if the
client associated with the IP address is a BOOTP client, then the
server MUST set the B bit in the IP-flags option.

5.15. Guidelines for selecting MCLT

There is no one correct value for the MCLT. There is an explicit
tradeoff between various factors in selecting an MCLT value.

5.15.1. Short MCLT

A short MCLT value will mean that after entering PARTNER-DOWN state,
a server will only have to wait a short time before it can start
allocating its partner's IP addresses to DHCP clients. Furthermore,
it will only have to wait a short time after the expiration of a
lease on an IP address before it can reallocate that IP address to
another DHCP client.

However, the downside of a short MCLT value is that the initial lease
interval that will be offered to every new DHCP client will be short,
which will cause increased traffic as those clients will need to send
their first renewal after half of a short MCLT. In addition,
the lease extensions that a server in COMMUNICATIONS-INTERRUPTED
state can give will be only the MCLT after the server has been in
COMMUNICATIONS-INTERRUPTED for around the desired client lease
period. If a server stays in COMMUNICATIONS-INTERRUPTED for that
long, then the leases it hands out will be short and that will
increase the load on that server, possibly causing difficulty.

5.15.2. Long MCLT

A long MCLT value will mean that the initial lease period will be
longer and the time that a server in COMMUNICATIONS-INTERRUPTED state
will be able to extend leases (after it has been in
COMMUNICATIONS-INTERRUPTED state for around the desired client lease
period) will be longer.

However, a server entering PARTNER-DOWN state will have to wait the
longer MCLT before being able to allocate its partner's IP addresses
to new DHCP clients.
This may mean that additional IP addresses are
required in order to cover this time period. Further, the server in
PARTNER-DOWN will have to wait the longer MCLT from every lease
expiration before it can reallocate an IP address to a different DHCP
client.

5.16. What is sent in response to an UPDREQ or UPDREQALL message?

In section 7.3, the UPDREQ message is defined, and it says that the
receiving server sends to the requesting server "all of the binding
database information that it has not already seen". In section
7.4.2, the UPDREQALL message is defined, and it says that the
receiving server sends to the requesting server "all binding database
information".

Both of these statements need further elaboration.

First, for the UPDREQ message, the information to be sent in BNDUPD
messages concerns "all of the binding database information it has not
already seen". Since every BNDUPD is acked by the receiving server,
the sending server need only keep track of which IP addresses have
binding database changes not yet seen by the partner, and when they
are finally acked by the partner it can record that. Thus, at any
time, it knows which IP addresses have unacked binding database
information. This is less simple when, across reconfigurations of
the servers, an IP address can change the failover partner to which
it is associated. In that case, it is important to reset the
indication that the partner has seen this binding information.

Second, in the event that a failover server's binding database
information is restored from a backup, it will be partially out of
date. In this case, its partner's indication of which binding
database information the restored server has seen will also be out of
date.

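
The bookkeeping described in the two points above can be sketched as
a set of addresses with unacked changes. This is a minimal
illustration, not a prescribed data structure: the class and method
names are hypothetical, and a real server would also need to match
each BNDACK to the specific BNDUPD it acknowledges.

```python
class PartnerSyncState:
    """Tracks which IP addresses have changes the partner has not acked.

    Hypothetical helper for illustration; not part of the protocol.
    """

    def __init__(self):
        self.unacked = set()

    def binding_changed(self, ip):
        # A binding database change the partner has not yet seen.
        self.unacked.add(ip)

    def bndack_received(self, ip):
        # The partner acked the BNDUPD for this address.
        self.unacked.discard(ip)

    def addresses_for_updreq(self):
        # What an UPDREQ asks for: everything not yet seen by the partner.
        return sorted(self.unacked)

    def reset(self, all_known_ips):
        # Forget what the partner has seen, for example after a
        # reconfiguration changes which partner an address belongs to.
        self.unacked = set(all_known_ips)
```

After reset(), every known address is treated as unseen, so the next
UPDREQ is answered with all of them.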
The solution to this problem is for a server which is connecting with
its partner to check the partner's last communicated time, and if it
is very much ahead of its own last communicated time, to go into
RECOVER state and transmit an UPDREQALL to allow it to refresh its
state. See section 9.3.2, step 5. If the partner's last communicated
time is very much behind its own record of when it last communicated
with the partner, then it SHOULD invalidate its information on
which binding database information the partner server knows, so that
it will send all of its relevant binding database information to the
partner.

Third, in the event that a server receives an UPDREQALL message, what
constitutes "all binding database information"? At first glance this
would seem to be information on every configured IP address in the
server. While this would be technically correct, it may impose a
serious and unacceptable performance penalty on servers which have
millions of configured IP addresses. What can be done to lessen the
data that must be sent for an UPDREQALL?

When sending "all binding database information", if the sending
server sends only information concerning IP addresses which have been
at some time associated with clients, it will send enough information
to satisfy the needs of the failover protocol. It need not send
information on any IP addresses that have never been used, since
presumably they will be initialized as available to the primary
server (i.e., FREE) on any server employing failover.

6. Common Message Format

This section discusses the message format that all failover messages
have in common, including the message header format as well as the
common option format. See section 12 for the definitions of the
specific options used in the failover protocol.

6.1. Message header format

The options contained in the payload data section of the failover
message all use a two byte option number and two byte length format.

All failover protocol messages are sent over the TCP connection
between failover endpoints and encoded using a message format
specific to the failover protocol.

There exists a common message format for all failover messages, which
utilizes the options in a way similar to the DHCP protocol. For each
message type, some options are required and some are optional. In
addition, when a message is received any options that are not
understood by the receiving server MUST be ignored.

All of the fields in the fixed portion of the message MUST be filled
with correct data in every message sent.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       message length (2)      |  msg type (1) |payload off (1)|
   +---------------+---------------+---------------+---------------+
   |                           time (4)                            |
   +---------------------------------------------------------------+
   |                            xid (4)                            |
   +---------------------------------------------------------------+
   |         0 or more additional header bytes (variable)          |
   +---------------------------------------------------------------+
   |                    payload data (variable)                    |
   |                                                               |
   |                formatted as DHCP-style options                |
   |       using a two byte option code and two byte length        |
   |                 See section 6.2 for details.                  |
   +---------------------------------------------------------------+

message length - 2 bytes, network byte order

This is the length of the message. It includes the two byte message
length itself. The maximum length is 2048 bytes. The minimum length
is 12.

msg type - 1 byte

The message type field is used to distinguish between messages.

The following message types are defined:

   Value   Message Type
   -----   ------------
     0     reserved     not used
     1     POOLREQ      request allocation of addresses
     2     POOLRESP     respond with allocation count
     3     BNDUPD       update partner with binding info
     4     BNDACK       acknowledge receipt of binding update
     5     CONNECT      establish connection with the secondary
     6     CONNECTACK   respond to attempt to establish connection
                        with partner
     7     UPDREQALL    request full transfer of binding info
     8     UPDDONE      ack send and ack of req'd binding info
     9     UPDREQ       req transfer of un-acked binding info
    10     STATE        inform partner of current state or state change
    11     CONTACT      probe communications integrity with partner
    12     DISCONNECT   close a connection

New message types should be defined in one of two ranges, 0-127 or
128-255. The range of 0-127 is used for messages that MUST be
supported by every server, and if a server receives a message in the
range of 0-127 that it doesn't understand, it MUST close the TCP
connection. The range of 128-255 is used for messages which MAY be
supported but are not required, and if a server receives a message in
this range that it does not understand it SHOULD ignore the message.

payload offset - 1 byte

The byte offset of the Payload Data, from the beginning of the
failover message header. The value for the current protocol version
(version 1) is 8.

time - 4 bytes, network byte order

The absolute time in GMT when the message was transmitted,
represented as seconds elapsed since Jan 1, 1970 (i.e., similar to
the ANSI C time_t time value representation). While the ANSI C
time_t value is signed, the value used in this specification is
unsigned.

A server SHOULD set this time as close to the actual transmission of
the message as possible.

xid - 4 bytes, network byte order

This is the transaction id of the failover message.
The sender of a
failover protocol message is responsible for setting this number, and
the receiver of the message copies the number over into any response
message, treating it as opaque data. The sender MUST ensure that
every message sent from a particular failover endpoint over the
associated TCP connection has a unique transaction id.

For failover messages that have no corresponding response message,
the XID value is meaningless, but MUST be supplied. The XID value is
used solely by the receiver of a response message to determine the
corresponding request message.

Request messages whose XID is used in the corresponding response
messages are: POOLREQ, BNDUPD, CONNECT, UPDREQALL, and UPDREQ. The
corresponding response messages are POOLRESP, BNDACK, CONNECTACK,
UPDDONE, and UPDDONE, respectively.

As requests/responses don't survive connection reestablishment, XIDs
only need to be unique during a specific connection.

payload data - variable length

The options are placed after the header, after skipping payload
offset bytes from the beginning of the message. The payload data
options are not preceded by a "cookie" value.

The payload data is formatted as DHCP style options using two byte
option codes and two byte option lengths. The option codes are in a
namespace which is unique to the failover protocol.

The maximum length of the payload data in octets is 2048 less the
size of the header, i.e., the maximum message length is 2048 octets.

6.2. Common option format

The options contained in the payload data section of the failover
message all use a two byte option number and two byte length format.

The option numbers are drawn from an option number space unique to
the failover protocol.
All of the message types share a common
option number space and common options definitions, though not all
options are required or meaningful for every message.

In contrast to the options which appear in DHCP client and server
messages, the options in failover messages are ordered. That is, for
some messages the order in which the options appear in the payload
data area is significant. The messages for which option ordering is
significant explicitly describe the ordering requirements. If no
ordering requirements are mentioned, then the order is not
significant for that message.

All options which refer to time use an absolute time in GMT. Time
synchronization has already been achieved between the source and the
target server using the CONNECT message and is updated and refined
using the time in every packet.

The time value is an unsigned 32 bit integer in network byte order
giving the number of seconds since 00:00 UTC, 1st January 1970. This
can be converted to an NTP timestamp by adding decimal 2208988800.
This time format will not wrap until the year 2106. Until sometime
in 2038, it is equal to the ANSI C time_t value (which is a signed 32
bit value and will overflow into a negative number in 2038).

Options should appear once only in each message (except for BNDUPD
and BNDACK messages where batching is used; see section 6.3 for
details). An option that appears twice is not concatenated, but
treated as an error.

Specific option values are described in section 12.

See section 13 for how to define additional options.

6.3. Batching multiple binding update transactions in one BNDUPD
     message

Implementations of this protocol MAY send multiple binding update
transactions in one BNDUPD message, where a binding update
transaction is defined as the set of options which are associated
with the update of a single IP address. All implementations of this
protocol MUST be prepared to receive BNDUPD messages which contain
multiple binding update transactions and respond correctly to them,
including replying with a BNDACK message which contains status for
the multiple binding update transactions contained in the BNDUPD
message.

In the discussion of sending and receiving BNDUPD messages in section
7.1 and BNDACK messages in section 7.2, each BNDUPD message and
BNDACK message is assumed to contain a single binding update
transaction in order to reduce the complexity of the discussions in
section 7.

Multiple binding update transactions MAY be batched together in one
BNDUPD protocol message with the data sets for the individual
transactions delimited by the assigned-IP-address option, which MUST
appear first in the option set for each transaction. Ordering of
options between the assigned-IP-address options is not significant.
This is illustrated in the following schematic representation:

   Non-IP Address/Non-client specific options first
   assigned-IP-address option for the first IP address
      Options pertaining to first address, including at least the
      binding-status option and others as required.
   assigned-IP-address option for the second IP address
      Options pertaining to second address, including at least the
      binding-status option and others as required.
   ...
   Trailing options (message digest).

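
The option format of section 6.2 and the batching layout above can be
sketched as follows. This is an illustration only: the numeric option
codes are placeholders chosen for the example (the real values are
defined in section 12), the helper names are hypothetical, and the
one-byte binding-status values are stand-ins.

```python
import struct

# Placeholder option codes for illustration; see section 12 for real values.
OPT_ASSIGNED_IP = 1001
OPT_BINDING_STATUS = 1002


def encode_option(code, value):
    """Encode one option: two byte code, two byte length, then the value."""
    return struct.pack("!HH", code, len(value)) + value


def decode_options(payload):
    """Yield (code, value) pairs from a payload of concatenated options."""
    offset = 0
    while offset < len(payload):
        if offset + 4 > len(payload):
            raise ValueError("truncated option header")
        code, length = struct.unpack_from("!HH", payload, offset)
        offset += 4
        if offset + length > len(payload):
            raise ValueError("truncated option value")
        yield code, payload[offset:offset + length]
        offset += length


def split_transactions(options):
    """Return (leading_options, transactions).

    Each transaction starts at an assigned-IP-address option and runs
    to the next one. Trailing options such as a message digest are not
    separated out in this sketch; they stay with the last transaction.
    """
    leading, transactions = [], []
    for code, value in options:
        if code == OPT_ASSIGNED_IP:
            transactions.append([(code, value)])
        elif transactions:
            transactions[-1].append((code, value))
        else:
            leading.append((code, value))
    return leading, transactions
```

A payload built from two assigned-IP-address options, each followed
by a binding-status option, splits into two transactions of two
options each, with no leading options.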
There MUST be a one-to-one correspondence between BNDUPD and BNDACK
messages, and every BNDACK message MUST contain status for all of the
binding update transactions in the corresponding BNDUPD message.

The BNDACK message corresponding to a BNDUPD message MUST contain
assigned-IP-address options for all of the binding update
transactions in the BNDUPD message. Thus, every BNDACK message
contains exactly the same assigned-IP-address options as does its
corresponding BNDUPD message. The order of the assigned-IP-address
options MAY, however, be different. Here is a schematic
representation of a BNDACK:

   Non-IP Address/Non-client specific options first
   assigned-IP-address option for the first IP address
      If rejected, reject-reason option and message option.
   assigned-IP-address option for the second IP address
      If rejected, reject-reason option and message option.
   ...
   Trailing options (message digest).

In case the server chooses to reject some or all of the IP address
binding information in a BNDUPD message in a BNDACK reply, the BNDACK
message MUST contain a reject-reason option following every
assigned-IP-address option in order to indicate that the binding
update transaction for that IP address was not accepted and why. As
with a BNDACK message containing a single binding update transaction,
an assigned-IP-address option without any associated reject-reason
option indicates a successful binding update transaction.

7. Protocol Messages

This section contains the detailed definition of the protocol
messages, including the information to include when sending the
message, as well as the actions to take upon receiving the message.
The message type for each message appears as [n] in the heading for
the message (see section 6.1).

7.1. BNDUPD message [3]

The binding update (BNDUPD) message is used to send the binding
database changes (known as binding update transactions) to the
partner server, and the partner server responds with a binding
acknowledgement (BNDACK) message when it has successfully committed
those changes to its own stable storage.

The rest of the failover protocol exists to determine whether the
partner server is able to communicate or not, and to enable the
partners to exchange BNDUPD/BNDACK messages in order to keep their
binding databases in stable storage synchronized.

The rest of this section is written as though every BNDUPD message
contains only a single binding update transaction in order to reduce
the complexity of the discussion. See section 6.3 for information on
how to create and process BNDUPD and BNDACK messages which contain
multiple binding update transactions. Note that while a server MAY
generate BNDUPD messages with multiple binding update transactions,
every server MUST be able to process a BNDUPD message which contains
multiple binding update transactions and generate the corresponding
BNDACK messages with status for multiple binding update transactions.

The following table summarizes the various options for the BNDUPD
message.

                                        binding-status
                                                                BACKUP
                                                                RESET
                                                                ABANDONED
   Option                     ACTIVE   EXPIRED     RELEASED     FREE
   ------                     ------   -------     --------     ----
   assigned-IP-address (3)    MUST     MUST        MUST         MUST
   IP-flags                   MUST(4)  MUST(4)     MUST(4)      MUST(4)
   binding-status             MUST     MUST        MUST         MUST
   client-identifier          MAY      MAY         MAY          MAY(2)
   client-hardware-address    MUST     MUST        MUST         MAY(2)
   lease-expiration-time      MUST     MUST NOT    MUST NOT     MUST NOT
   potential-expiration-time  MUST     MUST NOT    MUST NOT     MUST NOT
   start-time-of-state        SHOULD   SHOULD      SHOULD       SHOULD
   client-last-trans.-time    MUST     SHOULD      MUST         MAY
   DDNS(1)                    SHOULD   SHOULD      SHOULD       SHOULD
   client-request-options     SHOULD   SHOULD NOT  SHOULD       SHOULD NOT
   client-reply-options       SHOULD   SHOULD NOT  SHOULD NOT   SHOULD NOT

   (1) MUST if server is performing dynamic DNS for this IP address,
       else MUST NOT.
   (2) MUST NOT if binding-status is ABANDONED.
   (3) assigned-IP-address MUST be the first option for an IP address
   (4) IP-flags option MUST appear if any flags are non-zero, else it
       MAY appear.

          Table 7.1-1: Options used in a BNDUPD message

7.1.1. Sending the BNDUPD message

A BNDUPD message SHOULD be generated whenever any binding changes. A
change might be in the binding-status, the lease-expiration-time, or
even just the last-transaction-time. In general, any time a DHCP
server writes its stable storage, a BNDUPD message SHOULD be
generated. This will often be the result of the processing of a DHCP
client request, but it might also be the result of a successful
dynamic DNS update operation. Stable storage updates due to BNDUPD
or BNDACK messages SHOULD NOT result in additional BNDUPD messages.

BNDUPD (and BNDACK) messages refer to the binding-status of the IP
address, and this protocol defines a series of binding-statuses,
discussed in more detail below.
Some servers may not support all of
these binding-statuses, and so in those cases they will not be sent.
Upon receipt of a BNDUPD message which contains an unsupported
binding-status, a reasonable interpretation should be made (see
section 5.10).

All BNDUPD messages MUST contain the IP address of the binding update
transaction in the assigned-IP-address option.

All binding update transactions MUST contain an IP-flags option if
the value of any of the flags would be non-zero. The IP-flags option
MAY be omitted if all of the flags that it contains are zero. The
IP-flags option contains a flag which indicates if the IP address is
currently reserved on the server sending the BNDUPD message. It also
contains a flag which indicates that the lease is associated with a
client that used the BOOTP protocol (as opposed to the DHCP protocol)
to interact with the DHCP server.

All binding update transactions contain a binding-status option, and
it will have one of the values found in section 5.11. Client
information consists of client-hardware-address and possibly a
client-identifier, and is explained in more detail later in this
section. The following table indicates whether client information
should or should not appear with each binding-status in a binding
update transaction:

   binding-status    includes client information
   ------------------------------------------------
   ACTIVE            MUST
   EXPIRED           SHOULD
   RELEASED          SHOULD
   FREE              MAY
   ABANDONED         MUST NOT
   RESET             MAY
   BACKUP            MAY

   Table 7.1.1-1: Client information required by various
                  binding-status values.

The ACTIVE binding-status requires some options to indicate the
length of the binding:

o  lease-expiration-time

   The lease-expiration-time option MUST appear, and be set to the
   expiration time most recently ACKed to the DHCP client.
   Note
   that the time ACKed to a DHCP client is a lease duration in
   seconds, while the lease-expiration-time option in a BNDUPD
   message is an absolute time value.

o  potential-expiration-time

   The potential-expiration-time option MUST appear, and be set to
   a value beyond that of the lease-expiration-time. This is the
   value that is ACKed by the BNDACK message. A server sending a
   BNDUPD message MUST be able to recover the
   potential-expiration-time sent in every BNDUPD, not just those
   that receive a corresponding BNDACK, in order to be able to
   protect against possible duplicate allocation of IP addresses
   after transitioning to PARTNER-DOWN state. See section 5.2.1 for
   details as to why the potential-expiration-time exists and
   guidelines for how to decide on the value.

The following option information applies to all BNDUPD messages,
regardless of the value of the binding-status, unless otherwise
noted.

o  Identifying the client

   For many of the binding-status values a client MUST appear while
   for others a client MAY appear, and for some a client MUST NOT
   appear.

   A client is identified in a BNDUPD message by at least one and
   possibly two options. The client-hardware-address option MUST
   appear any time that a client appears in a BNDUPD message, and
   contains the hardware type and chaddr information from the DHCP
   request packet. A failover client-identifier option MUST appear
   any time that a client appears in a BNDUPD message if and only if
   that client used a DHCP client-identifier option when
   communicating with the DHCP server. See sections 12.5 and 12.4
   for details of how to construct these two options from a DHCP
   request packet.

o  start-time-of-state

   The start-time-of-state SHOULD appear.
   It is set to the time at
   which this IP address first took on the state that corresponds to
   the current value of binding-status.

o  last-transaction-time

   The last-transaction-time value SHOULD appear. This is the time at
   which this DHCP server last received a packet from the DHCP client
   referenced by the client-identifier or client-hardware-address
   that was associated with the IP address referenced by the
   assigned-IP-address.

o  DDNS

   If the DHCP server is performing dynamic DNS operations on behalf
   of the DHCP client represented by the client-identifier or
   client-hardware-address, then it should include a DDNS option
   containing the domain name and status of any dynamic DNS
   operations enabled.

o  client-request-options

   If the BNDUPD was triggered by a request from a DHCP client
   (typically those with binding-status of ACTIVE and RELEASED), then
   the server SHOULD include options of interest to a failover
   partner from the client's request packet in the
   client-request-options for transmission to its partner (see
   section 12.8).

   A server sending a BNDUPD SHOULD remember the "interesting"
   options or the information that would appear in an "interesting"
   option for transmission at a time when the BNDUPD is not closely
   associated with a DHCP client request.

   A server SHOULD send the following "interesting" options. It MAY
   send any DHCP client options. As new options are defined, the RFC
   defining these options SHOULD state that they are "interesting to
   failover servers" if they should be sent as part of a BNDUPD.

      option   option
      number   name
      -----------------------------------------
        12     host-name
        81     client-FQDN [DDNS]
        82     relay-agent-information [AGENTINFO]
       TBD     user-class [USERCLASS]
        60     vendor-class-identifier

      Table 7.1.1-2: Options which SHOULD be sent in
      the client-request-options option in a BNDUPD message.

o  client-reply-options

   If the BNDUPD was triggered by a request from a DHCP client
   (typically those with binding-status of ACTIVE and RELEASED), then
   the server SHOULD include options of interest to a failover
   partner from the server's DHCP reply packet in the
   client-reply-options for transmission to its partner (see section
   12.7).

   A server sending a BNDUPD SHOULD remember the "interesting"
   options or the information that would appear in an "interesting"
   option for transmission at a time when the BNDUPD is not closely
   associated with a DHCP client request.

   A server SHOULD send the following "interesting" options. It MAY
   send any DHCP client options. As new options are defined, the RFC
   defining these options SHOULD include information that they are
   "interesting to failover servers" if they should be sent as part
   of a BNDUPD.

      option   option
      number   name
      -----------------------------------------
        58     renewal-time
        59     rebinding-time

      Table 7.1.1-3: Options which SHOULD be sent in
      the client-reply-options option in a BNDUPD message.

The BNDUPD message SHOULD be sent as soon as possible from the time
that the DHCP client received a response and the lease bindings
database is written on stable storage.

7.1.2. Receiving the BNDUPD message

When a server receives a BNDUPD message, it needs to decide how to
process the binding update transaction it contains and whether that
transaction represents a conflict of any sort.
   The conflict resolution process MUST be used on the receipt of every
   BNDUPD message, not just those that are received while in
   POTENTIAL-CONFLICT state, in order to increase the robustness of the
   protocol.

   There are three sorts of conflicts:

   o Two clients, one IP address conflict

      This is the duplicate IP address allocation conflict. There are
      two different clients each allocated the same address. See
      section 7.1.3 for how to resolve this conflict.

   o Two IP addresses, one client conflict

      This conflict exists when a client on one server is associated
      with one IP address, and on the other server with a different
      IP address in the same or a related subnet. This does not refer
      to the case where a single client has addresses in multiple
      different subnets or administrative domains, but rather the case
      where on the same subnet the client has a lease on one IP
      address in one server and on a different IP address on the other
      server.

      This conflict may or may not be a problem for a given DHCP
      server implementation. In the event that a DHCP server requires
      that a DHCP client have only one outstanding lease for an IP
      address on one subnet, this conflict should be resolved by
      accepting the update which has the latest
      client-last-transaction-time.

   o binding-status conflict

      This is the normal conflict, where one server is updating the
      other with newer information. See section 7.1.3 for details of
      how to resolve these conflicts.

7.1.3. Deciding whether to accept the binding update transaction in a
       BNDUPD message

   When analyzing a BNDUPD message from a partner server, if there is
   insufficient information in the BNDUPD to process it, then reject the
   BNDUPD with reject-reason 3: "Missing binding information".

   If the IP address in the BNDUPD is not an IP address associated with
   the failover endpoint which received the BNDUPD message, then reject
   it with reject-reason 1: "Illegal IP address (not part of any address
   pool)".

   IP addresses undergo binding status changes for several reasons,
   including receipt and processing of DHCP client requests,
   administrative inputs and receipt of BNDUPD messages. Every DHCP
   server needs to respond to DHCP client requests and administrative
   inputs with changes to its internal record of the binding-status of
   an IP address, and this response is not in the scope of the failover
   protocol. However, the receipt of BNDUPD messages implies at least a
   possible change of the binding-status for an IP address, and must be
   discussed here. See section 7.1.2 for general actions to take upon
   receipt of a BNDUPD message.

   When receiving a BNDUPD message, it is important to note that it may
   not be current, in that the server receiving the BNDUPD message may
   have had a more recent interaction with the DHCP client than its
   partner who sent the BNDUPD message. In this case, the receiving
   server MUST reject the BNDUPD message. The reject reason SHOULD be
   15: "Outdated binding information". In addition, it is worth noting
   that two (and possibly three) binding-status values are the direct
   result of interaction with a DHCP client, ACTIVE and RELEASED (and
   possibly ABANDONED). All other binding-status values are either the
   result of the expiration of a time period or interaction with an
   external agency (e.g., a network administrator).

   Every BNDUPD message SHOULD contain a client-last-transaction-time
   option, which MUST, if it appears, be the time that the server last
   interacted with the DHCP client. It MUST NOT be, for instance, the
   time that the lease on an IP address expired.
   If there has been no interaction with the DHCP client in question (or
   there is no DHCP client presently associated with this IP address),
   then there will be no client-last-transaction-time option in the
   BNDUPD message.

   The list in Figure 7.1.3-1 is indexed by the binding-status that a
   server receives in a BNDUPD message. In many cases, the
   binding-status of an IP address within the receiving server's data
   storage will have an effect on the checks performed prior to
   accepting the new binding-status in a BNDUPD message.

   In Figure 7.1.3-1, to "accept" a BNDUPD means to update the server's
   bindings database with the information contained in the BNDUPD and,
   once that update is complete, send a BNDACK message corresponding to
   the BNDUPD message. To "reject" a BNDUPD means to respond to the
   BNDUPD with a BNDACK with a reject-reason option included.

   When interpreting the rules in the following list, if a BNDUPD
   doesn't have a client-last-transaction-time value, then it MUST NOT
   be considered later than the client-last-transaction-time in the
   receiving server's binding. If the BNDUPD contains a
   client-last-transaction-time value and the receiving server's binding
   does not, then the client-last-transaction-time value in the BNDUPD
   MUST be considered later than the server's.

   The second rule concerns clients and IP addresses. If the clients in
   a BNDUPD message and in the receiving server's binding differ, and
   the binding-status is ACTIVE both in the receiving server's binding
   and in the BNDUPD, then the receiving server accepts the BNDUPD if it
   is a secondary server, and otherwise rejects it with a reject reason
   of 2: "Fatal conflict exists: address in use by other client".
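
   The two rules above can be sketched in a few lines of Python. This
   is only an illustration under stated assumptions: the function
   names, the use of None for an absent client-last-transaction-time,
   and the "accept" / "reject" / "no-decision" result strings are
   hypothetical and are not defined by this protocol.

```python
from typing import Optional, Tuple

# Reject reason 2: "Fatal conflict exists: address in use by other client"
REJECT_FATAL_CONFLICT = 2


def update_time_is_later(update_cltt: Optional[int],
                         local_cltt: Optional[int]) -> bool:
    """Compare client-last-transaction-time values (None means absent)."""
    if update_cltt is None:
        # A BNDUPD without the value MUST NOT be considered later.
        return False
    if local_cltt is None:
        # The BNDUPD has a value and the local binding does not:
        # the BNDUPD value MUST be considered later.
        return True
    return update_cltt > local_cltt


def client_mismatch_rule(update_client: bytes, local_client: bytes,
                         update_status: str, local_status: str,
                         receiver_is_secondary: bool
                         ) -> Tuple[str, Optional[int]]:
    """Apply the rule for differing clients when both sides are ACTIVE.

    Returns ("accept", None), ("reject", reason), or
    ("no-decision", None) when this rule does not apply.
    """
    both_active = update_status == "ACTIVE" and local_status == "ACTIVE"
    if update_client != local_client and both_active:
        if receiver_is_secondary:
            return ("accept", None)
        return ("reject", REJECT_FATAL_CONFLICT)
    return ("no-decision", None)
```

   A caller would typically apply client_mismatch_rule first and fall
   through to the remaining checks only on "no-decision".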

                        binding-status in received BNDUPD
   binding-status
   in receiving                                    FREE       RESET
   server         ACTIVE     EXPIRED    RELEASED   BACKUP     ABANDONED

   ACTIVE         accept     time(2)    time(1)    time(2)    accept
   EXPIRED        time(1)    accept     accept     accept     accept
   RELEASED       time(1)    time(1)    accept     accept     accept
   FREE/BACKUP    accept     accept     accept     accept     accept
   RESET          time(3)    accept     accept     accept     accept
   ABANDONED      reject(4)  reject(4)  reject(4)  reject(4)  accept

   time(1): If the client-last-transaction-time in the BNDUPD
   is later than the client-last-transaction-time in the
   receiving server's binding, accept it, else reject it.

   time(2): If the current time is later than the receiving
   server's lease-expiration-time, accept it, else reject it.

   time(3): If the client-last-transaction-time in the BNDUPD
   is later than the start-time-of-state in the receiving server's
   binding, accept it, else reject it.

   (1,2,3): If rejecting, use reject reason 15: "Outdated binding
   information".

   (4): Use reject reason 16: "Less critical binding information".

               Figure 7.1.3-1: Accepting BNDUPD messages

   If the IP address in the BNDUPD message has the R flag set in the
   IP-flags option, indicating it is a reserved IP address, and if the
   binding-status in the BNDUPD is BACKUP, then if the receiving server
   does not show the IP address as reserved, the receiving server SHOULD
   reject the BNDUPD using reject reason 19: "IP not reserved on this
   server".

7.1.4. Accepting the BNDUPD message

   When accepting a BNDUPD message, the information contained in the
   client-request-options and client-reply-options SHOULD be examined
   for any information of interest to this server. For instance, a
   server which wished to detect changes in client-specified host names
   might want to examine and save information from the host-name or
   client-FQDN options.
   Servers which expect to utilize information from the
   relay-agent-information option would want to store this information.

7.1.5. Time values related to the BNDUPD message

   There are four time values that MAY be sent in a BNDUPD message.

   o lease-expiration-time

      The time that the server gave to the client, i.e., the time that
      the server believes that the client's lease will expire.

   o potential-expiration-time

      The time that the server wants to be sure its partner waits
      (added to the MCLT) before assuming that this lease has expired.
      Typically some time beyond the desired client lease time.

   o client-last-transaction-time

      The time that the client last interacted with this server.

   o start-time-of-state

      The time at which the binding first went into the current state.

   As discussed in section 5.2, each server knows what its partner has
   ACKed with regard to potential-expiration-time. In addition, each
   server needs to remember what it has told its partner as the
   potential-expiration-time. Moreover, each server must remember what
   it has ACKed to the *other* server as the most recent
   potential-expiration-time from that server.

   Remember that each server sends a potential-expiration-time and
   receives an ACK for it, and also receives a
   potential-expiration-time and needs to remember what it has ACKed
   for that.

   While they don't have to be named in any particular way, the times
   that a server needs to remember for every IP address in order to
   implement the failover protocol are:

   o lease-expiration-time

      The time that a server gave to the DHCP client. A DHCP server
      needs to remember this time already, just to be a DHCP server.
      A server SHOULD update this time with the lease-expiration-time
      received from a partner in a BNDUPD if the received
      lease-expiration-time is later than the lease-expiration-time
      recorded for this binding.

   o sent-potential-expiration-time

      The latest time sent to the partner for a
      potential-expiration-time.

   o acked-potential-expiration-time

      The latest time that the partner has ACKed for a
      potential-expiration-time. Typically the same as
      sent-potential-expiration-time if there is not a BNDUPD
      outstanding.

   o received-potential-expiration-time

      The latest time that this server has ever received as a
      potential-expiration-time from its partner in a BNDUPD that this
      server ACKed.

   So, a server has to remember two additional times concerning BNDUPD
   messages that it has initiated, and one additional time concerning
   BNDUPD messages that it has received. How are these times used?

   First, let's look at the time that a DHCP server can offer to a DHCP
   client. A server can offer to a DHCP client a time that is no longer
   than the MCLT beyond the max( received-potential-expiration-time,
   acked-potential-expiration-time ). One might think that the server
   should be able to offer only the MCLT beyond the
   acked-potential-expiration-time, and while that is certainly simple
   and easy to understand, it has negative consequences in actual
   operation.

   To illustrate this, in the simple case where the primary updates the
   secondary for a while and then fails, if the secondary can then renew
   the client for only the MCLT beyond the
   acked-potential-expiration-time, then the secondary will only be able
   to renew the client for the MCLT, because the secondary has never
   sent a BNDUPD packet to the primary concerning this IP address and
   client, and so its acked-potential-expiration-time is zero.
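
   The limit just described can be worked through numerically. The
   sketch below is illustrative only: the function names and the sample
   values are hypothetical, times are plain integer seconds, and
   clamping the reference point to the current time is this sketch's
   reading of the statement that a server whose recorded times are zero
   can still renew the client for the MCLT.

```python
def latest_offerable_expiration(now: int, mclt: int,
                                received_pet: int, acked_pet: int) -> int:
    """Latest lease expiration a server may hand to a client.

    The limit is the MCLT beyond max(received-potential-expiration-time,
    acked-potential-expiration-time). Assumption of this sketch: a
    reference point in the past is replaced by the current time.
    """
    reference = max(received_pet, acked_pet, now)
    return reference + mclt


def grant_expiration(now: int, mclt: int, desired_lease: int,
                     received_pet: int, acked_pet: int) -> int:
    """Expiration actually granted: the client's wish, capped by the limit."""
    limit = latest_offerable_expiration(now, mclt, received_pet, acked_pet)
    return min(now + desired_lease, limit)


# Hypothetical scenario: the primary has failed and the secondary has
# never sent a BNDUPD for this address, so acked_pet is 0.
NOW, MCLT, DESIRED = 1_000_000, 3_600, 86_400

# With the received-potential-expiration-time taken into account,
# the client can be renewed for its full desired lease.
full = grant_expiration(NOW, MCLT, DESIRED,
                        received_pet=NOW + DESIRED, acked_pet=0)

# If only the acked time counted (modelled here by received_pet=0),
# the client could be renewed for the MCLT only.
short = grant_expiration(NOW, MCLT, DESIRED, received_pet=0, acked_pet=0)
```

   In this scenario full is NOW + 86400 while short is NOW + 3600, which
   is the difference in renewal interval that the text describes.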

   However, since the secondary is allowed to renew the client with the
   MCLT beyond the max( received-potential-expiration-time,
   acked-potential-expiration-time ), the secondary can usually renew
   the client for the full lease period, at least for the first renew it
   sees from the client, since the received-potential-expiration-time is
   generally longer than the client's desired lease interval. The
   difference in renew times could make a big difference in server load
   on the secondary in this case.

   What are the consequences of allowing a server to offer a DHCP client
   a lease term of the MCLT beyond the max(
   received-potential-expiration-time,
   acked-potential-expiration-time )? The consequences appear whenever a
   server enters PARTNER-DOWN state, and affect how long that server has
   to wait before reallocating expired leases. With this approach, when
   a server goes into PARTNER-DOWN state, it must wait the MCLT beyond
   the max( lease-expiration-time, sent-potential-expiration-time,
   acked-potential-expiration-time,
   received-potential-expiration-time ) for each IP address before it
   can reallocate that IP address to another DHCP client. One might
   normally think that it needed to wait only the MCLT beyond the max(
   lease-expiration-time, received-potential-expiration-time ), i.e.,
   beyond what it has told the client and what it has explicitly ACKed
   to the other server. But with the optimization discussed above, where
   either server can offer the DHCP client a lease term of the MCLT
   beyond the max( received-potential-expiration-time,
   acked-potential-expiration-time ), the additional times
   sent-potential-expiration-time and acked-potential-expiration-time
   must be added into the expression, since the partner could have used
   those times as part of its own lease time calculation.
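
   The PARTNER-DOWN waiting rule can be sketched as follows. The
   BindingTimes record, the function names and the sample values are
   hypothetical; times are plain integer seconds, and treating the
   address as reallocatable only strictly after the computed time is a
   conservative choice made by this sketch.

```python
from dataclasses import dataclass


@dataclass
class BindingTimes:
    """The four per-address times a failover server remembers."""
    lease_expiration: int
    sent_potential_expiration: int
    acked_potential_expiration: int
    received_potential_expiration: int


def earliest_reallocation_time(t: BindingTimes, mclt: int) -> int:
    """The MCLT beyond the latest of all four remembered times."""
    return mclt + max(t.lease_expiration,
                      t.sent_potential_expiration,
                      t.acked_potential_expiration,
                      t.received_potential_expiration)


def narrow_reallocation_time(t: BindingTimes, mclt: int) -> int:
    """The shorter wait that would apply without the optimization."""
    return mclt + max(t.lease_expiration,
                      t.received_potential_expiration)


def may_reallocate(t: BindingTimes, mclt: int, now: int) -> bool:
    """True once a server in PARTNER-DOWN may give the address away."""
    return now > earliest_reallocation_time(t, mclt)


# Hypothetical binding: the sent time is the latest of the four.
times = BindingTimes(lease_expiration=5_000,
                     sent_potential_expiration=9_000,
                     acked_potential_expiration=7_000,
                     received_potential_expiration=6_000)
```

   With an MCLT of 3600 the address above becomes reallocatable after
   time 12600, whereas the narrower expression would have yielded 9600.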

   Thus this optimization may require a longer waiting time when
   entering PARTNER-DOWN state, but will generally allow servers to
   operate considerably more effectively when running in
   COMMUNICATIONS-INTERRUPTED state.

7.2. BNDACK message [4]

   A server sends a binding acknowledgement (BNDACK) message when it has
   processed a BNDUPD message and after it has successfully committed to
   stable storage any binding database changes made as a result of
   processing the BNDUPD message. A BNDACK message is used either to
   accept or to reject a BNDUPD message. A BNDACK message which contains
   a reject-reason option is a rejection of the corresponding BNDUPD
   message.

   In order to reduce the complexity of the discussion, the rest of this
   section is written as though every BNDUPD message contains only a
   single binding update transaction and thus every corresponding BNDACK
   message would also contain reply information about only a single
   binding update transaction. See section 6.3 for information on how
   to create and process BNDUPD and BNDACK messages which contain
   multiple binding update transactions.

   Note that while a server MAY generate BNDUPD messages with multiple
   binding update transactions, every server MUST be able to process a
   BNDUPD message which contains multiple binding update transactions
   and generate the corresponding BNDACK messages with status for
   multiple binding update transactions. If a server does not ever
   create BNDUPD messages which contain multiple binding update
   transactions, then it does not need to be able to process a received
   BNDACK message with multiple binding update transactions. However,
   all servers MUST be able to create BNDACK messages which deal with
   multiple binding update transactions received in a BNDUPD message.

   Every BNDUPD message that is received by a server MUST be responded
   to with a corresponding BNDACK message. The receiving server SHOULD
   respond quickly to every BNDUPD message but it MAY choose to respond
   preferentially to DHCP client requests instead of BNDUPD messages,
   since there is no absolute time period within which a BNDACK must be
   sent in response to a BNDUPD message, while DHCP clients frequently
   have strict time constraints.

   A BNDACK message can only be sent in response to a BNDUPD message
   using the same TCP connection from which the BNDUPD message was
   received, since the XIDs in BNDUPD messages are guaranteed unique
   only during the life of a single TCP connection. When a connection
   to a partner server goes down, a server with unprocessed BNDUPD
   messages MAY simply drop all of those messages, since it can be sure
   that the partner will resend them when they are next in
   communications (albeit with a different XID), or it MAY instead
   choose to process those BNDUPD messages, but it MUST NOT send any
   BNDACK messages in response.

   The following table summarizes the options for the BNDACK message.

      Option                       accept        reject
      ------                       ------        ------
      assigned-IP-address (1)      MUST          MUST
      IP-flags                     SHOULD NOT    SHOULD NOT
      binding-status               SHOULD NOT    SHOULD NOT
      client-identifier            SHOULD NOT    SHOULD NOT
      client-hardware-address      SHOULD NOT    SHOULD NOT
      reject-reason                SHOULD NOT    MUST
      message                      SHOULD NOT    SHOULD
      lease-expiration-time        SHOULD NOT    SHOULD NOT
      potential-expiration-time    SHOULD NOT    SHOULD NOT
      start-time-of-state          SHOULD NOT    SHOULD NOT
      client-last-trans.-time      SHOULD NOT    SHOULD NOT
      DDNS(1)                      SHOULD NOT    SHOULD NOT

      (1) assigned-IP-address MUST be the first option for an IP address

             Table 7.2-1: Options used in a BNDACK message

7.2.1. Sending the BNDACK message

   The BNDACK message MUST contain the same xid as the corresponding
   BNDUPD message.

   The assigned-IP-address option from the BNDUPD message MUST be
   included in the BNDACK message. Any additional options from the
   BNDUPD message SHOULD NOT appear in the BNDACK message. Note that
   any information sent in options (e.g., a later lease-expiration-time)
   in the BNDACK message MUST NOT be assumed to necessarily be recorded
   in the stable storage of the server that receives the BNDACK message
   because there is no corresponding ACK of the BNDACK message. Any
   information that SHOULD be recorded in the partner server's stable
   storage MUST be transmitted in a subsequent BNDUPD.

   If the server is accepting the BNDUPD, the BNDACK message includes
   only the assigned-IP-address option. If the server is rejecting the
   BNDUPD, the additional option reject-reason MUST appear in the BNDACK
   message, and the message option SHOULD appear in this case containing
   a human-readable error message describing in some detail the reason
   for the rejection of the BNDUPD message.

   If the server rejects the BNDUPD message with a BNDACK and a
   reject-reason option, it may be because the server believes that it
   has binding information that the other server should know. A server
   which is rejecting a BNDUPD may initiate a BNDUPD of its own in order
   to update its partner with what it believes is better binding
   information, but it MUST ensure through some means that it will not
   end up in a situation where each server is sending BNDUPD messages as
   fast as possible because they can't agree on which server has better
   binding data. Placing a considerable delay on the initiation of a
   BNDUPD message after sending a BNDACK with a reject-reason would be
   one way to ensure this situation doesn't occur.

7.2.2. Receiving the BNDACK message

   When a server receives a BNDACK message, if it doesn't contain a
   reject-reason option that means that the BNDUPD message was accepted,
   and the server which sent the BNDUPD SHOULD update its stable storage
   with the potential-expiration-time value sent in the BNDUPD message
   and returned in the BNDACK message. Other values sent in the BNDUPD
   message MAY be used as desired.

   If the BNDACK message contains a reject-reason option, that means
   that the BNDUPD was rejected. There SHOULD be a message option in
   the BNDACK giving a text reason for the rejection, and the server
   SHOULD log the message in some way. The server MUST NOT immediately
   try to resend the BNDUPD message as there is no reason to believe the
   partner won't reject it a second time. However, a server MAY choose
   to send another BNDUPD at some future time, for instance when the
   server next processes an update request from its partner.

7.3. UPDREQ message [9]

   The update request (UPDREQ) message is used by one server to request
   that its partner send it all of the binding database information that
   it has not already seen. Since each server is required to keep
   track at all times of the binding information the other server has
   received and ACKed, one server can request transmission of all
   un-ACKed binding database information held by the other server by
   using the UPDREQ message.

   The UPDREQ message is used whenever the sending server cannot proceed
   before it has processed all previously un-ACKed binding update
   information, since the UPDREQ message should yield a corresponding
   UPDDONE message.
   The UPDDONE message is not sent until the server that sent the UPDREQ
   message has responded to all of the BNDUPD messages generated by the
   UPDREQ message with BNDACK messages (they may either be accepted or
   rejected by the BNDACK messages, but they MUST have been responded
   to). Thus, the sender of the UPDREQ message can be sure upon receipt
   of an UPDDONE message that it has received and committed to stable
   storage all outstanding binding database updates.

   See section 9, Failover Endpoint States, for the details of when the
   UPDREQ message is sent.

7.3.1. Sending the UPDREQ message

   The UPDREQ message has no message specific options.

7.3.2. Receiving the UPDREQ message

   A server receiving an UPDREQ message MUST send all binding database
   changes that have not yet been ACKed by the sending server. These
   changes are sent as undistinguished BNDUPD messages.

   However, the server which received and is processing the UPDREQ
   message MUST track the BNDACK messages that correspond to the BNDUPD
   messages triggered by the UPDREQ message and, when they are all
   received, the server MUST send an UPDDONE message.

   The server processing the UPDREQ message and sending BNDUPD messages
   to its partner SHOULD only track the BNDUPD and BNDACK message pairs
   for unACKed binding database changes that were present upon the
   receipt of the UPDREQ message. A server which has received an UPDREQ
   message SHOULD send BNDUPD messages for binding database changes that
   occur after receipt of the UPDREQ message, but it SHOULD NOT include
   those additional BNDUPD messages and their corresponding BNDACK
   messages in the accounting necessary to consider the UPDREQ complete
   and subsequently send the UPDDONE message.
   If some additional binding database changes end up becoming part of
   the set of BNDUPD messages considered as part of the UPDREQ (due to
   whatever algorithm the server uses to scan its bindings database for
   unacked changes) it will probably not cause any difficulty, but a
   server MUST NOT attempt to include all such later BNDUPD messages in
   the accounting for the UPDREQ in order to be able to transmit an
   UPDDONE message.

   When queuing up the BNDUPD messages for transmission to the sender of
   the UPDREQ message, the server processing the UPDREQ message MUST
   honor the value returned in the max-unacked-bndupd option in the
   CONNECT or CONNECTACK message that set up the connection with the
   sending server. It MUST NOT send more BNDUPD messages without
   receiving corresponding BNDACKs than the value returned in
   max-unacked-bndupd.

7.4. UPDREQALL message [7]

   The update request all (UPDREQALL) message is used by one server to
   request that its partner send it all of the binding database
   information. This message is used to allow one server to recover
   from a failure of stable storage and to restore its binding database
   in its entirety from the other server.

   A server which sends an UPDREQALL message cannot proceed until all of
   its binding update information is restored, and it knows that all of
   that information is restored when an UPDDONE message is received.

   See section 9, Protocol state transitions, for the details of when
   the UPDREQALL message is sent.

   The UPDREQALL message has no message specific options.

7.4.1. Sending the UPDREQALL message

   The UPDREQALL is sent.

7.4.2. Receiving the UPDREQALL message

   A server receiving an UPDREQALL message MUST send all binding
   database information to the sending server. These changes are sent
   as undistinguished BNDUPD messages.
   Otherwise the processing is the same as for the UPDREQ message. See
   section 7.3.2 for details.

7.5. UPDDONE message [8]

   The update done (UPDDONE) message is used by a server receiving an
   UPDREQ or UPDREQALL message to signify that it has sent all of the
   BNDUPD messages requested by the UPDREQ or UPDREQALL request and that
   it has received a BNDACK for each of those messages.

   While a BNDACK message MUST have been received for each BNDUPD
   message prior to the transmission of the UPDDONE message, this
   doesn't necessarily mean that all of the BNDUPD messages were
   accepted, only that all of them were responded to with a BNDACK
   message. Thus, a NAK (a BNDACK message containing a reject-reason
   option) could be used to reject a BNDUPD, but for the purposes of the
   UPDDONE message, such a NAK would count as a response to the
   associated BNDUPD message, and would not block the eventual
   transmission of the UPDDONE message.

   The xid in an UPDDONE message MUST be identical to the xid in the
   UPDREQ or UPDREQALL message that initiated the update process.

   The UPDDONE message has no message-specific options.

7.5.1. Sending the UPDDONE message

   The UPDDONE message SHOULD be sent as soon as the last BNDACK message
   corresponding to a BNDUPD message requested by the UPDREQ or
   UPDREQALL is received from the server which sent the UPDREQ or
   UPDREQALL. The XID of the UPDDONE message MUST be the same as the
   XID of the corresponding UPDREQ or UPDREQALL message.

7.5.2. Receiving the UPDDONE message

   A server receiving the UPDDONE message knows that all of the
   information that it requested by sending an UPDREQ or UPDREQALL
   message has now been sent and that it has recorded this information
   in its stable storage. It typically uses the receipt of an UPDDONE
   message to move to a different failover state.
   See sections 9.5.2 and 9.8.3 for details.

7.6. POOLREQ message [1]

   The pool request (POOLREQ) message is used by the secondary server to
   request an allocation of IP addresses from the primary server. It
   MUST be sent by a secondary server to a primary server to request IP
   address allocation by the primary. The IP addresses allocated are
   transmitted using normal BNDUPD messages from the primary to the
   secondary.

   The POOLREQ message SHOULD be sent from the secondary to the primary
   whenever the secondary transitions into NORMAL state. It SHOULD
   periodically be resent in order that any change in the number of
   available IP addresses on the primary be reflected in the pool on the
   secondary. The period may be influenced by the secondary server's
   leasing activity.

   The POOLREQ message has no message specific options.

7.6.1. Sending the POOLREQ message

   The POOLREQ message is sent.

7.6.2. Receiving the POOLREQ message

   When a primary server receives a POOLREQ message it SHOULD examine
   the binding database and determine how many IP addresses the
   secondary server should have, and set these IP addresses to BACKUP
   state. It SHOULD then send BNDUPD messages concerning all of these
   IP addresses to the secondary server.

   Servers frequently have several kinds of IP addresses available on a
   particular network segment. The failover protocol assumes that both
   primary and secondary servers are configured in such a way that each
   knows the type and number of IP addresses on every network segment
   participating in the failover protocol. The primary server is
   responsible for allocating the secondary server the correct
   proportion of available IP addresses of each kind, and the secondary
   server is responsible for being configured in such a way that it can
   tell the kind of every IP address based solely on the IP address
   itself.

   A primary server MUST keep track of how many IP addresses were
   allocated as a result of processing the POOLREQ message, and send
   that number in the POOLRESP message.

   A primary server MAY choose to defer processing a POOLREQ message
   until a more convenient time to process it, but it should not depend
   on the secondary server to resend the POOLREQ message in that case.

   If a secondary server receives a POOLREQ message it SHOULD report an
   error.

7.7. POOLRESP message [2]

   A primary server sends a POOLRESP message to a secondary server after
   the allocation process for available addresses to the secondary
   server is complete. Typically this message will precede some of the
   BNDUPD messages that the primary uses to send the actual allocated IP
   addresses to the secondary.

   The xid in the POOLRESP message MUST be identical to the xid in the
   POOLREQ message for which this POOLRESP is a response.

7.7.1. Sending the POOLRESP message

   The POOLRESP message MUST contain the same xid as the corresponding
   POOLREQ message.

   Exactly one option MUST appear in a POOLRESP message:

   o addresses-transferred

      The number of addresses allocated to the secondary server by the
      primary server as a result of a POOLREQ is contained in the
      addresses-transferred option in a POOLRESP message. Note this
      is the number of addresses that are transferred to the secondary
      in the primary's binding database as a result of the
      corresponding POOLREQ message, and that it may be some time
      before they can all be transmitted to the secondary server
      through the use of BNDUPD messages.

7.7.2. Receiving the POOLRESP message

   When a secondary server receives a POOLRESP message, it SHOULD send
   another POOLREQ message if the value of the addresses-transferred
   option is non-zero.

   Typically, no other action is taken on the reception of a POOLRESP
   message.
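
   The POOLREQ/POOLRESP exchange can be pictured with a small model.
   Everything below is a hypothetical sketch: the protocol does not say
   how the primary decides how many addresses the secondary should
   have, so the fixed secondary_share of one half, the dictionary used
   as a binding database and the function names are all assumptions
   made for illustration.

```python
from typing import Dict


def handle_poolreq(bindings: Dict[str, str],
                   secondary_share: float = 0.5) -> int:
    """Primary side: move FREE addresses to BACKUP for the secondary.

    `bindings` maps an IP address to its binding-status. The return
    value is what would be sent as addresses-transferred in POOLRESP.
    Assumed policy: the secondary should hold `secondary_share` of all
    available (FREE plus BACKUP) addresses.
    """
    free = sorted(ip for ip, status in bindings.items() if status == "FREE")
    backup = [ip for ip, status in bindings.items() if status == "BACKUP"]
    target = int((len(free) + len(backup)) * secondary_share)
    to_move = max(0, target - len(backup))
    for ip in free[:to_move]:
        # A BNDUPD for each of these addresses would be queued here.
        bindings[ip] = "BACKUP"
    return to_move


def secondary_should_resend(addresses_transferred: int) -> bool:
    """Secondary side: send another POOLREQ if the count is non-zero."""
    return addresses_transferred != 0


# Hypothetical pool of ten available addresses, none yet in BACKUP.
pool = {"192.0.2.%d" % n: "FREE" for n in range(1, 11)}
first = handle_poolreq(pool)    # some addresses move to BACKUP
second = handle_poolreq(pool)   # pool is already balanced
```

   Under the assumed policy the first request transfers five addresses
   and the second transfers none, at which point the secondary stops
   asking until its next periodic POOLREQ.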

7.8. CONNECT message [5]

   The CONNECT message is used to establish an application-level
   connection over a newly created TCP connection. It gives the source
   information for the connection, and critical configuration
   information. It MUST be sent only by the primary server. Either
   server can initiate a TCP connection, but the CONNECT message is only
   sent by the primary server.

   The CONNECT message MUST be the first message sent down a newly
   established connection, and it MUST be sent only by the primary
   server.

   The following table summarizes the options that are associated with
   the CONNECT message:

      Option
      ------
      sending-server-IP-address    MUST
      max-unacked-bndupd           MUST
      receive-timer                MUST
      vendor-class-identifier      MUST
      protocol-version             MUST
      TLS-request                  MUST (1)
      MCLT                         MUST
      hash-bucket-assignment       MUST

      (1) MUST NOT if CONNECT is being sent over a TLS connection

             Table 7.8-1: Options used in a CONNECT message

7.8.1. Sending the CONNECT message

   The CONNECT message MUST be the first message sent by the primary
   server after the establishment of a new TCP connection with a
   secondary server participating in the failover protocol.

   The xid of the CONNECT message must be unique.

   The IP address of the primary server MUST be placed in the
   sending-server-IP-address option. This information is placed in an
   option inside of the message in order to allow the identity of the
   sender to be covered by a shared secret.

   The number of BNDUPD messages the primary server can accept without
   blocking the TCP connection MUST be placed in the max-unacked-bndupd
   option. This MUST be a number equal to or greater than 1, SHOULD be
   a number greater than 10, and SHOULD be a number less than 100.

   The length of the receive timer (tReceive, see section 8.3) MUST be
   placed in the receive-timer option.
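
   The constraints on max-unacked-bndupd lend themselves to a simple
   configuration check. The sketch below is illustrative: the function
   name and the choice to raise ValueError for a MUST violation and to
   return warning strings for SHOULD violations are assumptions, not
   part of the protocol.

```python
from typing import List


def check_max_unacked_bndupd(value: int) -> List[str]:
    """Validate a max-unacked-bndupd value before placing it in CONNECT.

    Raises ValueError when the MUST constraint (>= 1) is violated and
    returns a list of warnings for the two SHOULD constraints.
    """
    if value < 1:
        raise ValueError("max-unacked-bndupd MUST be at least 1")
    warnings = []
    if value <= 10:
        warnings.append("max-unacked-bndupd SHOULD be greater than 10")
    if value >= 100:
        warnings.append("max-unacked-bndupd SHOULD be less than 100")
    return warnings
```

   For example, a value of 50 produces no warnings, a value of 5 or of
   100 produces one warning each, and a value of 0 is refused.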
   The MCLT MUST be placed in the MCLT option.

   The hash-bucket-assignment option MUST be included in the CONNECT
   message.  In the event that load balancing is not configured for this
   server, the hash-bucket-assignment option will indicate that.  The
   value of the hash-bucket-assignment option is determined from the
   specific buckets that the primary server has determined that the
   secondary server MUST service as part of the load-balancing
   algorithm.  The way in which the primary server determines this
   information is outside the scope of this protocol definition.  The
   primary server SHOULD be configured with a percentage of clients that
   the secondary server will be instructed to service, and the primary
   server SHOULD use the algorithm in [LOADB] to generate a Hash Bucket
   Assignment which it sends to the secondary server.

   The vendor class identifier MUST be placed in the
   vendor-class-identifier option.

   The protocol-version option MUST be included in every CONNECT
   message.  The current value of the protocol version is 1.

   The TLS-request option MUST be sent and contains the desired TLS
   connection request as well as information concerning whether TLS is
   supported.  If this CONNECT message is being sent over an already
   created TLS connection, the TLS-request MUST NOT appear.

7.8.2.  Receiving the CONNECT message

   When a server receives a TCP connection on the failover port, if it
   is a primary server it should send a CONNECT message, and if it is a
   secondary server it should wait for a CONNECT message before sending
   any messages.  To avoid denial of service attacks, a secondary should
   only wait for a CONNECT message on a new connection for a limited
   amount of time and close the connection if none is received during
   that time.

   When a secondary server receives a CONNECT message it should:

      1.  Record the time at which the message was received.

      2.  Examine the protocol-version option, and decide if this server
          is capable of interoperating with another server running that
          protocol version.  If not, send the CONNECTACK message with
          the reject reason 14: "Protocol version mismatch".  The server
          MUST include its protocol-version in the CONNECTACK message.

      3.  Examine the TLS-request option.  Figure out the TLS-reply
          value based on the capabilities and configuration of this
          server.  If the result for the TLS-reply value is a 1 and the
          connection is accepted, indicating use of TLS, then
          immediately send the CONNECTACK message and go into TLS
          negotiation.  If the TLS-reply value implies rejection of the
          connection, then immediately send the CONNECTACK message with
          the TLS-reply value and the appropriate reject-reason option
          value.  In all other cases, save the TLS-reply option
          information for the eventual CONNECTACK message.

          The possibilities for TLS-request and TLS-reply are:

          CONNECT  CONNECTACK
          TLS      TLS
          request  reply
                            Reject
          t1       t1       Reason  Comments
          --       --       ------  --------
          0        0                no TLS used
          0        1        11      primary won't use TLS,
                                    secondary requires TLS
          1        0                primary desires TLS,
                                    secondary doesn't
          1        1                primary desires TLS,
                                    secondary will use TLS
          2        0        9, 10   primary requires TLS and
                                    secondary won't
          2        1                primary requires TLS and
                                    secondary will use TLS

      4.  Check to see if there is a message-digest option in the
          CONNECT message.  If there was, and the server does not
          support message-digests, then reject the connection with
          reject reason 12: "Message digest not supported" in the
          CONNECTACK.  If the server does support message-digests, then
          check this message for validity based on the message-digest,
          and reject it with reject reason 20: "Message digest failed to
          compare" if the digest indicates the message was altered.
      5.  Determine if the sender (from the sending-server-IP-address
          option) and the implicit role of the sender (i.e., primary)
          represent a server with which the receiver was configured to
          engage in failover activity.  This is performed after any TLS
          or message digest processing so that it occurs after a secure
          connection is created, to ensure that there is no tampering
          with the IP address of the partner.

          If not, then the receiving server should reject the CONNECT
          request by sending a CONNECTACK message with a reject-reason
          value of: 8, invalid failover partner.

          If it is, then the receiving failover endpoint should be
          determined.

      6.  Decide if the time delta between the sending of the message,
          in the time field, and the receipt of the message, recorded in
          step 1 above, is acceptable.  A server MAY require an
          arbitrarily small delta in time values in order to set up a
          failover connection with another server.  See section 5.10 for
          information on time synchronization.

          If the delta between the time values is too great, the server
          should reject the CONNECT request by sending a CONNECTACK
          message with a reject-reason of 4, time mismatch too great.

          If the time mismatch is not considered too great then the
          receiving server MUST record the delta between the servers.
          The receiving server MUST use this delta to correct all of the
          absolute times received from the other server in all
          time-valued options.  Note that servers can participate in
          failover with arbitrarily great time mismatches, as long as
          the mismatch is more or less constant.

      7.  Examine the MCLT option in the CONNECT request and use the
          value of the MCLT as the MCLT for this failover endpoint.

          The secondary server SHOULD be able to operate with any MCLT
          sent by the primary, but if it cannot, then it should send a
          CONNECTACK with a reject-reason of 5, MCLT mismatch.
      8.  The server MUST store the hash-bucket-assignment option for
          use during processing in NORMAL state.  If this hash bucket
          assignment conflicts with the secondary server's configured
          hash bucket assignment for use in other than NORMAL state, the
          secondary server should send a CONNECTACK with a reject reason
          of 19, Hash bucket assignment conflict.

      9.  The receiving server MAY use the vendor-class-identifier to do
          vendor specific processing.

7.9.  CONNECTACK message [6]

   The CONNECTACK message is sent to accept or reject a CONNECT message.
   It is sent by the secondary server which received a CONNECT message.

   Attempting immediately to reconnect after either receiving a
   CONNECTACK with a reject-reason or after sending a CONNECTACK with a
   reject-reason could yield unwanted looping behavior, since the reason
   that the connection was rejected may well not have changed since the
   last attempt.  A simple suggested solution is to wait a minute or two
   after sending or receiving a CONNECTACK message with a reject-reason
   before attempting to reestablish communication.

   The following table summarizes the options associated with the
   CONNECTACK message:

      Option                        accept      reject
      ------                        ------      ------
      sending-server-IP-address     MUST        MUST
      max-unacked-bndupd            MUST        MUST NOT
      receive-timer                 MUST        MUST NOT
      vendor-class-identifier       MUST        MUST NOT
      protocol-version              MUST        MUST
      TLS-reply                     (1)         (2)
      reject-reason                 MUST NOT    MUST
      message                       MUST NOT    SHOULD
      MCLT                          MUST NOT    MUST NOT
      hash-bucket-assignment        MUST NOT    MUST NOT

      (1) MUST NOT if sending CONNECTACK after TLS negotiation, MUST
          if TLS-request in CONNECT, else MUST NOT.
      (2) MUST if TLS-request in CONNECT message, else MUST NOT.

            Table 7.9-1: Options used in a CONNECTACK message

7.9.1.  Sending the CONNECTACK message

   The xid of the CONNECTACK message MUST be that of the corresponding
   CONNECT message.

   The IP address of the sending server MUST be placed in the
   sending-server-IP-address option.  This information is placed in an
   option inside of the message in order to allow the identity of the
   sender to be covered by a shared secret.

   The protocol-version option MUST be included in every CONNECTACK
   message.  The current value of the protocol version is 1.

   If the connection has been rejected, the reject-reason option MUST be
   placed in the CONNECTACK message with an appropriate reason, and a
   message option SHOULD be included with a human-readable error message
   describing the reason for the rejection in some detail.  If the
   reject-reason option appears, then the remaining options listed below
   do not appear.  The sending server should close the connection after
   sending the CONNECTACK if the connection was rejected.

   The results of the TLS negotiation MUST be placed in the TLS-reply
   option.  If this CONNECTACK message is being sent over an already TLS
   secured connection, then there MUST NOT be a TLS-reply option.

   If there was a message-digest option in the CONNECT message, then
   there MUST be a message-digest in the CONNECTACK message and any
   subsequent messages if the CONNECTACK does not contain a
   reject-reason.

   The number of BNDUPD messages the server can accept without blocking
   the TCP connection MUST be placed in the max-unacked-bndupd option.
   This SHOULD be a number greater than 10, and SHOULD be a number less
   than 100.

   The length of the receive timer (tReceive, see section 8.3) MUST be
   placed in the receive-timer option.

   The vendor class identifier MUST be placed in the
   vendor-class-identifier option.
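   The option requirements of Table 7.9-1, including its two TLS-reply
   footnotes, can be expressed as a small checker.  The Python below is
   a sketch under stated assumptions: the function name, its
   parameters, and the representation of a message as a set of option
   names are all hypothetical.  The "message" option is SHOULD on
   reject and is therefore neither required nor forbidden there.

```python
from typing import List, Set

ACCEPT_MUST = {"sending-server-IP-address", "max-unacked-bndupd",
               "receive-timer", "vendor-class-identifier",
               "protocol-version"}
ACCEPT_MUST_NOT = {"reject-reason", "message", "MCLT",
                   "hash-bucket-assignment"}
REJECT_MUST = {"sending-server-IP-address", "protocol-version",
               "reject-reason"}
REJECT_MUST_NOT = {"max-unacked-bndupd", "receive-timer",
                   "vendor-class-identifier", "MCLT",
                   "hash-bucket-assignment"}


def check_connectack(options: Set[str], rejecting: bool,
                     tls_request_in_connect: bool,
                     over_tls: bool = False) -> List[str]:
    """Return a list of Table 7.9-1 violations (empty if none)."""
    required = set(REJECT_MUST if rejecting else ACCEPT_MUST)
    forbidden = set(REJECT_MUST_NOT if rejecting else ACCEPT_MUST_NOT)
    if rejecting:
        # Footnote (2): MUST if TLS-request in CONNECT, else MUST NOT.
        tls_reply_required = tls_request_in_connect
    else:
        # Footnote (1): MUST NOT after TLS negotiation, otherwise
        # MUST if TLS-request in CONNECT, else MUST NOT.
        tls_reply_required = tls_request_in_connect and not over_tls
    (required if tls_reply_required else forbidden).add("TLS-reply")
    problems = ["missing " + n for n in sorted(required - options)]
    problems += ["forbidden " + n for n in sorted(forbidden & options)]
    return problems
```

   A sender could run such a check just before encoding the
   CONNECTACK, and a receiver could use the same table to decide
   whether a received message is well formed.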
   After a connection is created (either by sending a CONNECTACK message
   to the first CONNECT message, or sending a CONNECTACK message to a
   CONNECT message received over a TLS connection), the server MUST send
   a STATE message.

   After a connection is created, the server MUST start two timers for
   the connection: tSend and tReceive.  The tSend timer SHOULD be
   approximately 33 percent of the time in the receive-timer option in
   the corresponding CONNECT message.  The tReceive timer SHOULD be the
   time sent in the receive-timer option in the CONNECTACK message.

   The tReceive timer is reset whenever a message is received from this
   TCP connection.  If it ever expires, the TCP connection is dropped
   and communications with this partner is considered not ok.  The
   reject reason 17: "No traffic within sufficient time" is placed in
   the DISCONNECT message sent prior to dropping the TCP connection.

   The tSend timer is reset whenever a message is sent over this
   connection.  When it expires, a CONTACT message MUST be sent.

7.9.2.  Receiving the CONNECTACK message

   If a CONNECTACK message is received with a different XID from the one
   in the CONNECT that was sent, it SHOULD be ignored.

   When a CONNECTACK message is received, the following actions should
   be taken:

      1.  Record the time the message was received.

      2.  Check to see if the xid on the CONNECTACK matches an
          outstanding CONNECT message on this TCP connection.

      3.  Check to see if there is a reject-reason option in the
          CONNECTACK message.  If not, continue with step 4.  If there
          is a reject-reason option, the server SHOULD report the error
          code.  If a message option appears a server SHOULD display the
          string from the message option in a user visible way.  The
          server MUST close the connection if a reject-reason option
          appears.

      4.  Check the value of the TLS-reply option (if any, which there
          won't be if this CONNECT is taking place utilizing TLS), and
          if it was 1, then skip processing of the rest of the
          CONNECTACK message, and immediately enter into TLS connection
          setup.

          This step occurs prior to steps 5 and 6 in order to allow
          creation of a secure connection (if required) prior to
          processing the protocol version and IP address information.

      5.  Examine the value of the protocol-version option.  If this
          server is able to establish connections with another server
          running this protocol version, then continue, else close the
          connection.

      6.  Decide if the time delta between the sending of the message,
          in the time field, and the receipt of the message, recorded in
          step 1 above, is acceptable.  A server MAY require an
          arbitrarily small delta in time values in order to set up a
          failover connection with another server.

          If the delta between the time values is too great, the server
          should drop the TCP connection.

          If the time mismatch is not considered too great then the
          receiving server MUST record the delta between the servers.
          The receiving server MUST use this delta to correct all of the
          absolute times received from the other server in all
          time-valued options.  Note that the failover protocol is
          constructed so that two servers can be failover partners with
          arbitrarily great time mismatches.

      7.  The receiving server MAY use the vendor-class-identifier to do
          vendor specific processing.

      8.  After accepting a CONNECTACK message, the server MUST send a
          STATE message.

          After receiving a CONNECTACK message, the server MUST start
          two timers for the connection: tSend and tReceive.  The tSend
          timer SHOULD be approximately 20 percent of the time in the
          receive-timer option in the corresponding CONNECTACK message.
          The tReceive timer SHOULD be set to the time sent in the
          receive-timer option in the CONNECT message.

          The tReceive timer is reset whenever a message is received
          from this TCP connection.  If it ever expires, the TCP
          connection is dropped and communications with this partner is
          considered not ok.  The reject reason 17: "No traffic within
          sufficient time" is placed in the DISCONNECT message sent
          prior to dropping the TCP connection.

          The tSend timer is reset whenever a message is sent over this
          connection.  When it expires, a CONTACT message MUST be sent.

7.10.  STATE message [10]

   The state (STATE) message is used to communicate the current failover
   state to the partner server.

   The STATE message MUST be sent after sending a CONNECTACK message
   that didn't contain a reject-reason option, and MUST be sent after
   receiving a CONNECTACK message without a reject-reason option.

   A STATE message MUST be sent whenever the failover endpoint changes
   its failover state and a connection exists to the partner.

   The STATE message requires no response from the failover partner.

   The following table shows the options that MUST appear in a STATE
   message:

      Option
      ------
      sending-state                 MUST
      server-flags                  MUST
      start-time-of-state           MUST

              Table 7.10-1: Options used in a STATE message

7.10.1.  Sending the STATE message

   The current failover state is placed in the server-state option and
   the current state of the STARTUP flag is placed in the server-flags
   option.

   The message is sent with a unique xid.

   A server SHOULD only send the STATE message either when the
   connection is created (i.e., after sending or receiving a CONNECTACK
   message with no reject-reason option), or when there is a change from
   the values sent in a previous STATE message.

7.10.2.  Receiving the STATE message

   Every STATE message SHOULD indicate a change in state or a change in
   the flags.

   When a STATE message is received, any state transitions specified in
   section 9 are taken.

   No response to a STATE message is required.

7.11.  CONTACT message [11]

   The contact (CONTACT) message is sent to verify communications
   integrity with a failover partner.  The CONTACT message is sent when
   no messages have been sent to the failover partner for a specified
   period of time.  This is determined by the tSend timer expiring (see
   section 8.3).

   The CONTACT message has no message specific options.

7.11.1.  Sending the CONTACT message

   The CONTACT message is sent.

7.11.2.  Receiving the CONTACT message

   When a CONTACT message is received, the tReceive timer is reset (as
   it is with any message that is received).

   A server SHOULD use the time in the time field and the time the
   message was received to refine the delta time calculations between
   the servers.

7.12.  DISCONNECT message [12]

   The DISCONNECT is the last message sent over a connection before
   dropping an established connection (note that an established
   connection is one where a CONNECTACK has been sent without a reject
   reason).

   After sending or receiving a DISCONNECT message, a server needs to
   have some mechanism to prevent an error loop.  Simply reconnecting to
   the partner immediately is not the best option, especially after
   several consecutive attempts.

   A simple suggested solution is to wait a minute or two after sending
   or receiving a DISCONNECT before attempting to reestablish
   communication.

   The DISCONNECT message MUST be the last message sent down a
   connection before it is closed.
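   The suggested hold-down after a DISCONNECT can be sketched as
   follows, in Python.  The class name ReconnectGate and the default of
   90 seconds are assumptions chosen to fall within "a minute or two";
   the protocol does not fix a value.  Time is passed in explicitly so
   the logic does not depend on a particular clock.

```python
class ReconnectGate:
    """Delay reconnection after a DISCONNECT (hypothetical helper)."""

    def __init__(self, hold_down_seconds: float = 90.0) -> None:
        self.hold_down_seconds = hold_down_seconds
        self._blocked_until = 0.0

    def note_disconnect(self, now: float) -> None:
        # Called after sending or receiving a DISCONNECT message.
        self._blocked_until = now + self.hold_down_seconds

    def may_reconnect(self, now: float) -> bool:
        # True once the hold-down interval has elapsed.
        return now >= self._blocked_until
```

   The same gate could be used after a CONNECTACK carrying a
   reject-reason, for which section 7.9 suggests the same delay.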
   The following table summarizes the options that are associated with
   the DISCONNECT message:

      Option
      ------
      reject-reason                 MUST
      message                       SHOULD

            Table 7.12-1: Options used in a DISCONNECT message

7.12.1.  Sending the DISCONNECT message

   The DISCONNECT message MUST be the last message sent by a server
   which is dropping a TCP connection.

   The xid of the DISCONNECT message must be unique.

   The reject-reason option MUST appear giving a reason why the
   connection was dropped.  A message option SHOULD appear giving a
   human readable error message with possibly more details.

7.12.2.  Receiving the DISCONNECT message

   When a server receives a DISCONNECT message it should log the message
   if there was one and possibly raise an alarm of some sort if the
   reject reason was one that was sufficiently serious.

8.  Connection Management

   Servers participating in the failover protocol communicate over TCP
   connections.  These TCP connections are used both to transmit
   binding information from one server to another as well as to allow
   each server to determine whether communications is possible with the
   other server.

   Central to the operation of the failover protocol is a notion of
   "communications okay" or "communications failed".  Failover state
   transitions are taken in many cases when the status of communications
   with the partner changes, and the existence or non-existence of a TCP
   connection between failover endpoints is used to determine if
   communications is "okay" or "failed".

   A single TCP connection exists which connects two failover endpoints.

8.1.  Connection granularity

   There exists one TCP connection between each set of failover
   endpoints.  See section 5.1.1 for an explanation of failover
   endpoints.
   There are a maximum of two TCP connections between any two servers
   implementing the failover protocol, one for each of the possible
   failover endpoints between these two servers.  There is a minimum of
   one TCP connection between one server and every other failover server
   with which it implements the failover protocol.

8.2.  Creating the TCP connection

   There are two ports used for initiating TCP connections,
   corresponding to the two roles that a server can fill with respect to
   another server.  Every server implementing the failover protocol MUST
   listen on at least one of these ports.  Port 647 is the port to which
   primary servers will attempt a connection, and port 847 is the port
   to which secondary servers will attempt a connection.  When a
   connection attempt is received on port 647 it is therefore from a
   primary server, and it is attempting to connect to this server to
   become a secondary server for it.  Likewise, when an attempt to
   connect is received on port 847 the connection attempt is from a
   secondary server, and it is attempting to connect to this server to
   be a primary server.  The source port of any TCP connection is
   unimportant.  See the schematic representation below:

      Primary Server
      --------------
      Listens on port 847 for secondary server to connect to it
      Periodically connects on port 647 to contact secondary

      Secondary Server
      ----------------
      Listens on port 647 for primary server to connect to it
      Periodically connects on port 847 to contact primary

   Every server implementing the failover protocol SHOULD attempt to
   connect to all of its partners periodically, where the period is
   implementation dependent and SHOULD be configurable.  In the event
   that a connection has been rejected by a CONNECTACK message with a
   reject-reason option contained in it or a DISCONNECT message, a
   server SHOULD reduce the frequency with which it attempts to connect
   to that server but it SHOULD continue to attempt to connect
   periodically.

   If a connection attempt has been received from another server in a
   particular role (i.e., from a specific failover endpoint) then the
   receiving server MUST NOT initiate a connection attempt to the
   partner server in that same role.

   If both servers happen to attempt to connect simultaneously, the
   secondary server MUST drop its attempt in favor of the primary's
   attempt.  Thus, in the event that a secondary server receives a
   connection attempt to port 647 from a primary server when it has
   already initiated a connection attempt to port 847 on the same
   primary server, it MUST accept the connection to port 647 and it MUST
   drop the connection attempt to port 847.  In the event that a primary
   server receives a connection attempt to port 847 from a secondary
   server when it has already initiated a connection attempt to port 647
   on that same server, it MUST reject the connection attempt to port
   847 and continue to pursue the connection attempt on port 647.

   Once a connection is established, the primary server MUST send a
   CONNECT message across the connection.  A secondary server MUST wait
   for the CONNECT message from a primary server.

   Every CONNECT message includes a TLS-request option, and if the
   CONNECTACK message does not reject the CONNECT message and the
   TLS-reply option says TLS MUST be used, then the servers will
   immediately enter into TLS negotiation.

   Once TLS negotiation is complete, the primary server MUST resend the
   CONNECT message on the newly secured TLS connection and then wait for
   the CONNECTACK message in response.
   The TLS-request and TLS-reply options MUST NOT appear in either this
   second CONNECT or its associated CONNECTACK message as they had in
   the first messages.

   The second message sent over a new connection (either a bare TCP
   connection or a connection utilizing TLS) is a STATE message.  Upon
   the receipt of this message, the receiver can consider communications
   up.

   It is entirely possible that two servers will attempt to make
   connections to each other essentially simultaneously, and in this
   case the secondary server will be waiting for a CONNECT message on
   each connection.  The primary server MUST send a CONNECT message over
   one connection and it MUST close the other connection.

   A secondary server MUST NOT respond to the closing of a TCP
   connection with a blind attempt to reconnect -- there may be another
   TCP connection to the same failover partner already in use.

8.3.  Using the TCP connection for determining communications status

   The TCP connection is used to determine the communications status of
   the other server, i.e., communications-ok, or
   communications-interrupted.

   Three things must happen for a server to consider that communications
   are ok with respect to another server:

      1.  A TCP connection must be established to the other server.

      2.  A CONNECT message must be received and a CONNECTACK message
          sent in response.  The CONNECT message is used to determine
          the identity of the failover endpoint of the other end of the
          TCP connection -- without it, the failover endpoint cannot be
          uniquely determined.  Without knowledge of the failover
          endpoint, the entity with which communications is ok is
          undetermined.

      3.  A STATE message must be received from the other server over
          the connection.  This STATE message initializes important
          information necessary to the operation of the state machine
          that governs the behavior of this failover endpoint.

   There are two ways that a server can determine that communications
   has failed:

      1.  The TCP connection can go down, yielding an error when
          attempting to send or receive a message.  This will happen at
          least as often as the period of the tSend timer.

      2.  The tReceive timer can expire.

   In either of these cases, communications is considered interrupted.

   If the tReceive timer expires, the connection MUST be dropped.  The
   reject reason 17: "No traffic within sufficient time" is placed in
   the DISCONNECT message sent prior to dropping the TCP connection.

   Several difficulties arise when trying to use one TCP connection for
   both bulk data transfer as well as to sense the communications status
   of the other server.  One aspect of the problem stems from the
   different requirements of both uses.  The bulk data transfer is of
   course critically important to the protocol, but the speed with which
   it is processed is not terribly significant.  It might well be
   minutes before a BNDUPD message is processed, and while not optimal,
   such an occasional delay doesn't compromise the correctness of the
   protocol.  However, the speed with which one server detects the other
   server is up (or, more importantly, down) is more highly constrained.
   Generally one server should be able to detect that the other server
   is not communicating within a minute or less.

   These differing time constraints make it difficult to use the same
   TCP connection for data transfer as well as to sense communications
   integrity.  See section 3.5 for additional details on TCP.
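   The interaction of the tSend and tReceive timers with the CONTACT
   and DISCONNECT messages (sections 7.9, 7.11 and this section) can be
   sketched in Python as below.  The class name, the poll() method and
   its string results are hypothetical, and the timer lengths are
   supplied by the caller rather than derived from the receive-timer
   option.  Time is passed in explicitly.

```python
class ConnectionTimers:
    """Track tSend and tReceive for one connection (sketch only)."""

    def __init__(self, t_send: float, t_receive: float,
                 now: float) -> None:
        self.t_send = t_send
        self.t_receive = t_receive
        self.last_sent = now
        self.last_received = now

    def on_send(self, now: float) -> None:
        # tSend is reset whenever any message is sent.
        self.last_sent = now

    def on_receive(self, now: float) -> None:
        # tReceive is reset whenever any message is received.
        self.last_received = now

    def poll(self, now: float) -> str:
        if now - self.last_received >= self.t_receive:
            # tReceive expired: send DISCONNECT with reject reason 17,
            # then drop the TCP connection.
            return "DISCONNECT"
        if now - self.last_sent >= self.t_send:
            # tSend expired: a CONTACT message MUST be sent.
            return "CONTACT"
        return "IDLE"
```

   A caller would invoke poll() periodically, send the indicated
   message, and call on_send() after a CONTACT has been written.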
   The solution to this problem is to require that some message be
   received by each end of the connection within a limited time or that
   the connection will be considered down.  If no messages have been
   sent recently, then a CONTACT message is sent.

   In the case where there is no data queued to be sent, this is not a
   problem, but in the case where there is data queued to be sent to the
   partner, then the CONTACT message will not actually be transmitted
   until the queued data is sent.  Section 3.5 explains why waiting for
   TCP to determine that the connection is down is not acceptable, and
   leads to a requirement that the receiving server never block the
   sending server from sending CONTACT messages.

   In order to meet this requirement, each server tells the other server
   the number of outstanding BNDUPD messages that it will accept.  The
   receiving server is required to always be able to accept that many
   BNDUPD messages off of the connection's input queue even if it cannot
   process them immediately, and to accept all other messages
   immediately.

   Thus, the sending server's TCP is never blocked from sending a
   message except for very short periods, less than a few seconds unless
   the network connection itself has problems.  In this case, if the
   CONTACT messages don't make it to the partner then the partner will
   close the connection.

   DISCUSSION:

      When implementing this capability, one needs to be careful when
      sending any message on the TCP connection as TCP can easily block
      the server if the local TCP send buffers are full.  This can't be
      prevented because if the receiver is not reachable (via the
      network), the sending TCP can't send and thus it will be unable to
      empty the local TCP send buffers.  So, all send operations either
      need to assume they may block for some time or non-blocking sends
      must be used.

8.4.  Using the TCP connection for binding data

   Binding data, in the form of BNDUPD messages and BNDACK messages to
   respond to them, are sent across the TCP connection.

   In order to support timely detection of any failure in the partner
   server, the TCP connection MUST NOT block for more than a very short
   time, on the order of a few seconds.  Therefore, a server that is
   sending BNDUPD messages MUST send only a restricted number before
   receiving BNDACK messages about previous messages sent.

   The number of outstanding BNDUPD messages that each server will
   accept without causing TCP to block transmission of additional data
   (i.e., CONTACT messages) is sent by each server in the CONNECT and
   CONNECTACK messages in the max-unacked-bndupd option.

8.5.  Using the TCP connection for control messages

   The TCP connection is used for control messages: POOLREQ, UPDREQ,
   STATE, CONTACT, UPDREQALL and the corresponding reply messages:
   POOLRESP, UPDDONE.  A server MUST immediately accept all of these
   messages from the TCP connection.  A server MUST immediately accept
   any BNDACK which is received as well.

8.6.  Losing the TCP connection

   When the TCP connection is lost, then communications is not ok with
   the other server.  A server which has lost communications SHOULD
   immediately attempt to reconnect to the other server, and should
   retry these connection attempts periodically.

   An acknowledgement message (BNDACK, POOLRESP, UPDDONE) can only be
   sent in response to a request message (BNDUPD, POOLREQ, UPDREQ,
   UPDREQALL) on the same TCP connection from which the request was
   received, in part since the XIDs in the request messages are
   guaranteed unique only during the life of a single TCP connection.
   When a connection to a partner server goes down, a server with
   unprocessed request messages MAY simply drop all of those messages,
   since it can be sure that the partner will resend them when they are
   next in communications.  A server with unprocessed BNDUPD messages
   when a TCP connection goes down MAY instead choose to process those
   BNDUPD messages, but it MUST NOT send any BNDACK messages in response
   (again because of the issues surrounding XID uniqueness).

   When the TCP connection is closed explicitly, the DISCONNECT message
   with a reject-reason option (and, ideally, a message option) MUST be
   sent over the TCP connection.

9.  Failover Endpoint States

   This section discusses the various states that a failover endpoint
   may take, and the server actions required when entering the state,
   operating in the state, and leaving the state, as well as the events
   that cause transitions out of the state into another state.

   The state transition diagram in Figure 9.2-1 is relevant for this
   section.  This is the common state transition diagram for both
   servers in a failover pair.  In the event that the textual
   description of a state differs from the state transition diagram, the
   textual description is to be considered authoritative.

9.1.  Server Initialization

   When a server starts it starts out in STARTUP state.  See section 9.3
   below for details.

9.2.  Server State Transitions

   Whenever a server transitions into a new state, it MUST record the
   state and the time at which it entered that state in stable storage.
   If communications is "ok", it MUST also send a STATE message to its
   failover partner.

   Figure 9.2-1 is the diagram of the server state transitions.  The
   remainder of this section contains information important to the
   understanding of that diagram.
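   The requirement to record each new state and its entry time in
   stable storage, and to send a STATE message when communications is
   ok, might be realized along the following lines.  This Python sketch
   makes several assumptions that are not part of the protocol: the
   function name, a JSON file as the stable store, and a send_state
   callback standing in for the code that transmits the STATE message.

```python
import json
import os
from typing import Callable


def record_state_transition(path: str, state: str, entered_at: float,
                            comms_ok: bool,
                            send_state: Callable[[str, float], None]
                            ) -> None:
    """Persist the new state, then notify the partner if possible."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"state": state,
                   "start-time-of-state": entered_at}, f)
        f.flush()
        os.fsync(f.fileno())   # force the record to disk
    # Atomically replace the previous record with the new one.
    os.replace(tmp_path, path)
    if comms_ok:
        send_state(state, entered_at)
```

   Writing to a temporary file and renaming it means that a crash
   during the write leaves the previously recorded state intact, so
   the server can still find a state to resume from after a restart.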

   The server stays in the current state until all of the actions
   specified on the state transition are complete.  If communications
   fails during one of the actions, the server simply stays in the
   current state and attempts a transition whenever the conditions for
   a transition are later fulfilled.

   In the state transition diagram below, the "+" or "-" in the upper
   right corner of each state is a notation about whether communication
   is ongoing with the other server.

   The legend "responsive", "balanced", or "unresponsive" in each state
   indicates whether the server is responsive to all DHCP client
   requests, running in load balanced mode, or totally unresponsive in
   the respective state.  The terms "responsive" and "unresponsive"
   have the obvious meanings, while "balanced" means that a DHCP server
   may respond to all DHCPREQUEST messages that are RENEWAL or
   REBINDING, and to all other messages from clients for which the load
   balancing algorithm indicates that it MUST respond.  See sections
   5.3 and 9.6.2 for details on load balancing.

   In the state transition diagram below, when communication is
   reestablished between the two servers, each must record the state of
   the partner when communication was restored.  State transitions on
   one server in some cases imply state transitions on the partner
   server, so a record of the current state of the partner server must
   be kept by each server.

   If the state of the partner changes while communicating, a server
   moves through the communications-failed transition and into whatever
   state results.  It then immediately moves through whatever state
   transition is appropriate given the current state of the partner
   server.  A server performing this operation SHOULD NOT close the TCP
   connection to its partner.
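
   The two-step handling of a partner state change can be illustrated
   for a server in NORMAL state.  The sketch below is deliberately
   partial: it covers only a server that starts in NORMAL, and its
   table is transcribed from the transitions listed in section 9.7.3.
   The function and table names are hypothetical.

```python
COMM_INT = "COMMUNICATIONS-INTERRUPTED"
POT_CONF = "POTENTIAL-CONFLICT"

# "Communications OK" transitions out of COMMUNICATIONS-INTERRUPTED,
# keyed by the partner's state (transcribed from section 9.7.3).
COMM_OK_FROM_COMM_INT = {
    "NORMAL": "NORMAL",
    COMM_INT: "NORMAL",
    "RECOVER": COMM_INT,
    "RECOVER-DONE": "NORMAL",
    "PARTNER-DOWN": POT_CONF,
    POT_CONF: POT_CONF,
    "CONFLICT-DONE": POT_CONF,
    "RESOLUTION-INTERRUPTED": POT_CONF,
    "PAUSED": COMM_INT,
    "SHUTDOWN": "PARTNER-DOWN",
}


def on_partner_state_change_in_normal(partner_state):
    """Return the state a NORMAL server ends up in after its partner
    reports an unexpected state, without closing the TCP connection."""
    # Step 1: take the communications-failed transition out of NORMAL.
    own_state = COMM_INT
    # Step 2: immediately take the transition implied by the partner's
    # current state; stay put if the partner state is not in the table.
    return COMM_OK_FROM_COMM_INT.get(partner_state, own_state)
```

   With this table, a partner reporting SHUTDOWN leads to PARTNER-DOWN
   and a partner reporting PAUSED leaves the server in
   COMMUNICATIONS-INTERRUPTED, which matches section 9.6.4.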
3889 DISCUSSION: 3891 The point of this technique is simplicity, both in explanation of 3892 the protocol and in its implementation. The alternative to this 3893 technique of memory of partner state and automatic state transi- 3894 tion on change of partner state is to have every state in the fol- 3895 lowing diagram have a state transition for every possible state of 3896 the partner. With the approach adopted, only the states in which 3897 communications are reestablished require a state transition for 3898 each possible partner state. 3900 The current state of a server MUST be recorded in stable storage and 3901 thus be available to the server after a server restart. 3903 A transition into SHUTDOWN or PAUSED state is not represented in the 3904 following figure, since other than sending that state to its partner, 3905 the remaining actions involved look just like the server halting in 3906 its otherwise current state, which then becomes the previous state 3907 upon server restart. 3909 +---------------+ V +--------------+ 3910 | RECOVER -|+| | | STARTUP - | 3911 +->+(unresponsive) | +->+(unresponsive)| 3912 | +------+--------+ +--------------+ 3913 | Comm. OK +-----------------+ 3914 Comm. Other State:RECOVER | PARTNER DOWN - +<----------------------+ 3915 Fail | RESOLUTION-INTER. | (responsive) | ^ 3916 | All POTENTIAL- +----+------------+ +--------------+ | 3917 | Others CONFLICT------------ | --------+ | RESOLUTION -| | 3918 | | CONFLICT-DONE Comm. OK | | INTERRUPTED | | 3919 [UPDREQALL Other State: | +-+ (responsive) | | 3920 [Wait UPDDONEE | | | | +------+------++ | 3921 [Wait MCLT from fail RECOVER All Others| Comm.OK ^ | | 3922 +---+----------+ | V V V Comm. Ext. | 3923 |RECOVER-DONE +| +--+ +---+-----+--+-+ Failed Cmd----->+ 3924 |(unresponsive)| | | POTENTIAL + +------+ | 3925 +------+-------+ Wait for +>+ CONFLICT +-Pri. Resolve Comm. | 3926 Comm. 
OK Other | |(unresponsive)| Conflict CHANGED | 3927 +--Other State:-+ State: | +----+--------++ V V | | 3928 | | | RECOVER | | ^ ++----------+---++ | 3929 | All POTENT. DONE | Sec. Resolve | |CONFLICT-DONE-|+| | 3930 | Others: CONFLICT-- | ----+ Conflict(9.8) | | (responsive) | | 3931 | Wait for V V | +------+---------+ | 3932 | Other State: NORMAL ++------------+---+ Other State: NORMAL | 3933 | V | NORMAL + +<--------------+ | 3934 | +--+----------+-->+ (balanced) +-------External Command--->+ 3935 | ^ ^ +--------+--------+ or Other State: | 3936 | | | | | SHUTDOWN | 3937 | Wait for Comm. OK Comm. Failed or | | 3938 | Other Other Other State: PAUSED | External 3939 | State: State: | | Command 3940 |RECOVER-DONE NORMAL Start Safe Comm. OK or 3941 | | COMM. INT. Period Timer Other State: Safe 3942 | Comm. OK. | V All Others Period 3943 | Other State: | +---------+--------+ | expiration 3944 | RECOVER +--+ COMMUNICATIONS - +----+ | 3945 V +-------------+ INTERRUPTED | | 3946 RECOVER | (responsive) +-------------------------->+ 3947 RECOVER-DONE--------->+------------------+ 3949 Figure 9.2-1: Server state diagram. 3951 9.3. STARTUP state 3953 The STARTUP state affords an opportunity for a server to probe its 3954 partner server, before starting to service DHCP clients. 3956 DISCUSSION: 3958 Without the STARTUP state, a server would likely start in a state 3959 derived from its previously stored state (held in stable storage), 3960 if any. However, this may be inconsistent with the current state 3961 of the partner. The STARTUP state affords the opportunity for a 3962 server to potentially learn the partner's state and determine if 3963 that state is consistent with its derived starting state or 3964 whether some significant state change has occurred at the partner 3965 that forces the server to start in another state. This is 3966 especially critical if significant time has elapsed while the 3967 server was down. 3969 9.3.1. 
Operation while in STARTUP state

   Whenever a server is in STARTUP state, it MUST be unresponsive to
   DHCP client requests, and so the time spent in the STARTUP state is
   necessarily short, typically on the order of a few seconds to a few
   tens of seconds.  The exact time spent in the STARTUP state is
   implementation dependent, and the primary and secondary server are
   not required to spend the same amount of time in the STARTUP state.

   Whenever a STATE message is sent to the partner while in STARTUP
   state, the STARTUP bit MUST be set in the server-flags option and
   the previously recorded failover state MUST be placed in the
   server-state option.

9.3.2.  Transition out of STARTUP state

   Each server starts out in STARTUP state every time it initializes
   itself, and performs the following algorithm as part of its
   initialization:

   1. Is there any record in stable storage of a previous failover
      state?  If yes, set previous-state to the last recorded state
      in stable storage, and continue with step 2.

      Is there any configuration information that indicates that
      this server was previously running but lost its stable
      storage?  Such information must typically come from some
      administrative intervention, since it is difficult for a
      server to distinguish first startup from a startup after it
      has lost its stable storage.  If yes, then set the
      previous-state to RECOVER, and set the time-of-failure to
      whatever time was configured, and go on to step 2.  This
      time-of-failure will be used in the transition out of the
      RECOVER state into the RECOVER-DONE state, below.

      If there is no record of any previous failover state in stable
      storage nor of any previous operational activity for this
      server, then set the previous-state to PARTNER-DOWN if this
      server is a primary and RECOVER if this server is a secondary,
      and set the time-of-failure to a time before the
      maximum-client-lead-time before now.  If using standard POSIX
      times, 0 would typically do quite well.

   2. If the previous state is one where communications was "OK",
      then set the previous state to the state that is the result of
      the communications failed state transition in Figure 9.2-1 (if
      any -- some states both).

   3. Start the STARTUP state timer.  The time that a server remains
      in the STARTUP state (absent any communications with its
      partner) is implementation dependent and SHOULD be
      configurable.  It SHOULD be long enough for a TCP connection to
      be created to a heavily loaded partner across a slow network.

   4. Attempt to create a TCP connection to the failover partner.
      See section 8.2.

   5. Wait for "communications okay", i.e., the process discussed in
      section 8.2 "Creating the TCP Connection", to complete,
      including the receipt of a STATE message from the partner.

      When and if communications become "okay", clear the STARTUP
      flag, and set the current state to the previous-state.

      If the partner is in PARTNER-DOWN state, and if the time at
      which it entered PARTNER-DOWN state (as received in the
      start-time-of-state option in the STATE message) is later than
      the last recorded time of operation of this server, then set
      the current state to RECOVER.  If the time at which it entered
      PARTNER-DOWN state is earlier than the last recorded time of
      operation of this server, then set the current state to
      POTENTIAL-CONFLICT.

      Then, transition to the current state and take the
      "communications okay" state transition based on the current
      state of this server and the partner.

   6. If the startup time expires, take an implementation dependent
      action: The server MAY go to the previous-state, or the
      server MAY wait.

      Reasons to go to previous-state and begin processing:

      If the current server is the only operational server, then if
      it waits, there will be no operational DHCP servers.  This
      situation could occur very easily where one server fails and
      then the other crashes and reboots.  If the rebooting server
      doesn't start processing DHCP client requests without first
      being in communication with the other server, then the level
      of DHCP redundancy is not particularly high.  This is an
      appropriate approach if the possibility of partition is low,
      or if the safe period expiration time is well beyond the time
      at which an operator would notice and react to a partition
      situation.  It is also quite appropriate if the safe period
      will never expire.

      Reasons to wait:

      If the current server has been down for longer than the
      maximum-client-lead-time, and it is partitioned from the other
      server, then when it returns it will attempt to use its own
      available addresses to allocate to new DHCP clients, and the
      other server may well be in PARTNER-DOWN state and may have
      already allocated some of those available addresses to DHCP
      clients.  In cases where the possibility of partition is high,
      and the safe period expiration time is less than the likely
      operator reaction time, this is a good approach to use.

9.4.  PARTNER-DOWN state

   PARTNER-DOWN state is a state either server can enter.
   When in this state, the server does not assume that the other server
   could still be operating and servicing a different set of clients,
   but instead assumes that it is the only server operating.  If one
   server is in PARTNER-DOWN state, the other server MUST NOT be
   operating.

9.4.1.  Upon entry to PARTNER-DOWN state

   No special actions are required when entering PARTNER-DOWN state.

   The server should continue to attempt to connect to the partner
   periodically.

9.4.2.  Operation while in PARTNER-DOWN state

   A server in PARTNER-DOWN state MUST respond to DHCP client requests.
   It will allow renewal of all outstanding leases on IP addresses, and
   will allocate IP addresses from its own pool, and after a fixed
   period of time (the MCLT interval) has elapsed from entry into
   PARTNER-DOWN state, it will allocate IP addresses from the set of
   all available IP addresses.

   Once a server has entered NORMAL state, the PARTNER-DOWN state is
   entered only on command of an external agency (typically an
   administrator of some sort) or after the expiration of an externally
   configured minimum safe-time after the beginning of
   COMMUNICATIONS-INTERRUPTED state.

   Any available IP address tagged as available for allocation by the
   other server (at entry to PARTNER-DOWN state) MUST NOT be allocated
   to a new client until the maximum-client-lead-time beyond the entry
   into PARTNER-DOWN state has elapsed.
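
   The restriction on the partner's available addresses amounts to a
   single time comparison.  The following sketch assumes times are
   expressed in seconds and uses hypothetical parameter names; it
   covers only available (never-leased or freed) addresses, not
   addresses with an existing binding.

```python
def may_allocate_to_new_client(tagged_for_partner, now,
                               entered_partner_down, mclt):
    """Decide whether an available address may go to a new client
    while this server is in PARTNER-DOWN state.

    tagged_for_partner: True if the address was available for
    allocation by the other server at entry to PARTNER-DOWN state.
    """
    if not tagged_for_partner:
        # Addresses from this server's own pool are usable at once.
        return True
    # The partner's addresses become usable only once the MCLT has
    # elapsed since PARTNER-DOWN state was entered.
    return now >= entered_partner_down + mclt
```

   For example, with an MCLT of 3600 seconds, an address from the
   partner's pool stays off limits for the first hour in PARTNER-DOWN
   state, while the server's own addresses remain usable throughout.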

   A server in PARTNER-DOWN state MUST NOT allocate an IP address to a
   DHCP client different from that to which it was allocated at the
   entrance to PARTNER-DOWN state until the maximum-client-lead-time
   beyond the maximum of the following times: client expiration time,
   most recently transmitted potential-expiration-time, most recently
   received ack of potential-expiration-time from the partner, and most
   recently acked potential-expiration-time to the partner.  See
   section 7.1.5 for details.  If this time would be earlier than the
   current time plus the maximum-client-lead-time, then the time the
   server entered PARTNER-DOWN state plus the maximum-client-lead-time
   is used.

   Two options exist for lease times given out while in PARTNER-DOWN
   state, with different ramifications flowing from each.

   If the server wishes the Failover protocol to protect it from loss
   of stable storage in PARTNER-DOWN state, then it should ensure that
   the MCLT based lease time restrictions in Section 5.1 are
   maintained, even in PARTNER-DOWN state.

   If the server wishes to forgo the protection of the Failover
   protocol in the event of loss of stable storage, then it need
   recognize no restrictions on actual client lease times while in
   PARTNER-DOWN state.

   A server in PARTNER-DOWN state MUST continue to attempt to establish
   communications and synchronization with its partner.

9.4.3.  Transitions out of PARTNER-DOWN state

   When a server in PARTNER-DOWN state succeeds in establishing a
   connection to its partner, its actions are conditional on the state
   and flags received in the STATE message from the other server as
   part of the process of establishing the connection.

   If the STARTUP bit is set in the server-flags option of a received
   STATE message, a server in PARTNER-DOWN state MUST NOT take any
   state transitions based on reestablishing communications.
   Essentially, if a server is in PARTNER-DOWN state, it ignores all
   STATE messages from its partner that have the STARTUP bit set in the
   server-flags option of the STATE message.

   If the STARTUP bit is not set in the server-flags option of a STATE
   message received from its partner, then a server in PARTNER-DOWN
   state takes the following actions based on the value of the
   server-state option in the received STATE message:

   o  partner in NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN,
      POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or CONFLICT-DONE
      state

         transition to POTENTIAL-CONFLICT state

   o  partner in RECOVER, SHUTDOWN, or PAUSED state

         stay in PARTNER-DOWN state

   o  partner in RECOVER-DONE state

         transition into NORMAL state

9.5.  RECOVER state

   This state indicates that the server has no information in its
   stable storage or that it is re-integrating with a server in
   PARTNER-DOWN state after it has been down.  A server in this state
   MUST attempt to refresh its stable storage from the other server.

9.5.1.  Operation in RECOVER state

   A server in RECOVER MUST NOT respond to DHCP client requests.

   A server in RECOVER state will attempt to reestablish communications
   with the other server.

9.5.2.  Transitions out of RECOVER state

   If the other server is in POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED,
   or CONFLICT-DONE state when communications are reestablished, then
   the server in RECOVER state will move to POTENTIAL-CONFLICT state
   itself.

   If the other server is in RECOVER state, then this server SHOULD
   signal an error and halt processing.

   If the other server is in any other state, then the server in
   RECOVER state will request an update of missing binding information
   by sending an UPDREQ or UPDREQALL message.
   If the server has been instructed (through configuration or other
   external agency) that it has lost its stable storage, or if it has
   deduced that from the fact that it has no record of ever having
   talked to its partner, while its partner does have a record of
   communicating with it, it MUST send an UPDREQALL message, otherwise
   it MUST send an UPDREQ message.

   It will wait for an UPDDONE message, and upon receipt of that
   message it will start a timer whose expiration is set to a time
   equal to the time the server went down (if known) or the time the
   server started (if the down-time is unknown) plus the
   maximum-client-lead-time.  When this timer goes off, the server will
   transition into RECOVER-DONE state.  This is to allow any IP
   addresses that were allocated by this server prior to loss of its
   client binding information in stable storage to contact the other
   server or to time out.

   See Figure 9.5.2-1.

   DISCUSSION:

      The actual requirement on this wait period in RECOVER is that it
      start not before the recovering server went down, not necessarily
      when it came back up.  If the time when the recovering server
      failed is known, it could be communicated to the recovering
      server (perhaps through actions of the network administrator),
      and the wait period could be reduced to the
      maximum-client-lead-time less the difference between the current
      time and the time the server failed.  In this way, the waiting
      period could be minimized.  Various heuristics could be used to
      estimate this time, for example if the recovering server
      periodically updates stable storage with a time stamp, the wait
      period could be calculated to start at the time of the last
      update of stable storage plus the time required for the next
      update (which never occurred).  This estimate is later than the
      server went down, but probably not too much later.
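
   The expiration of the RECOVER wait timer described above can be
   computed as in the sketch below.  Times are assumed to be in
   seconds on a common clock, and the function names are hypothetical.

```python
def recover_done_time(mclt, started_at, failed_at=None):
    """Absolute time at which the RECOVER wait timer expires.

    failed_at: time the server went down, if known; otherwise the
    time the server started (started_at) is used as the base.
    """
    base = failed_at if failed_at is not None else started_at
    return base + mclt


def remaining_wait(now, mclt, started_at, failed_at=None):
    """Seconds still to wait after UPDDONE before RECOVER-DONE."""
    return max(0.0, recover_done_time(mclt, started_at, failed_at) - now)
```

   Knowing the failure time can only shorten the wait: if the server
   has already been down longer than the MCLT, no further wait is
   needed once the UPDDONE message has been received.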

   If an UPDDONE message isn't received within an implementation
   dependent amount of time, and no BNDUPD messages are being received,
   the connection SHOULD be dropped.

               A                                 B
            Server                            Server

               |                                 |
            RECOVER                        PARTNER-DOWN
               |                                 |
               | >--UPDREQ-------------------->  |
               |                                 |
               | <---------------------BNDUPD--< |
               | >--BNDACK-------------------->  |
              ...                               ...
               |                                 |
               | <---------------------BNDUPD--< |
               | >--BNDACK-------------------->  |
               |                                 |
               | <--------------------UPDDONE--< |
               |                                 |
   Wait MCLT from last known                     |
       time of operation                         |
               |                                 |
         RECOVER-DONE                            |
               |                                 |
               | >--STATE-(RECOVER-DONE)------>  |
               |                              NORMAL
               | <-------------(NORMAL)-STATE--< |
            NORMAL                               |
               | >--STATE-(NORMAL)------------>  |
               |                                 |
               |                                 |

         Figure 9.5.2-1: Transition out of RECOVER state

   If, at any time while a server is in RECOVER state, communications
   fails, the server will stay in RECOVER state.  When communications
   are restored, it will restart the process of transitioning out of
   RECOVER state.

9.6.  NORMAL state

   NORMAL state is the state used by a server when it is communicating
   with the other server, and any required resynchronization has been
   performed.  While some bindings database synchronization is
   performed in NORMAL state, potential conflicts are resolved prior to
   entry into NORMAL state as is binding database data loss.

9.6.1.  Upon entry to NORMAL state

   When entering NORMAL state, a server will send to the other server
   all currently unacknowledged binding updates as BNDUPD messages.

   When the above process is complete, if the server entering NORMAL
   state is a secondary server, then it will request IP addresses for
   allocation using the POOLREQ message.

9.6.2.  Processing DHCP client requests and load balancing

   In NORMAL state, a server MUST process every DHCPREQUEST/RENEWAL or
   DHCPREQUEST/REBINDING request it receives.  It processes other
   requests only for those clients dictated by the load balancing
   algorithm specified in [LOADB].

   As discussed in section 5.3, each server will take the
   client-identifier from each DHCP client request (or the
   client-hardware-address, i.e., the htype concatenated to the front
   of the chaddr if no client-identifier is present in the request) and
   use it as the 'Request ID' specified in [LOADB].  After applying the
   algorithm specified in [LOADB] and comparing the result with the
   hash bucket assignment (performed during connect processing between
   failover servers), each failover server will be able to
   unambiguously determine if it should process the DHCP client
   request.

9.6.3.  Operation in NORMAL state

   When in NORMAL state, for every DHCP client request that it
   processes, as determined by the algorithm described in section
   9.6.2, above, a server will operate in the following manner:

   o  Lease time calculations

      As discussed in section 5.2.1, "Control of lease time", the
      lease interval given to a DHCP client can never be more than the
      MCLT greater than the most recently received
      potential-expiration-time from the failover partner or the
      current time, whichever is later.

      As long as a server adheres to this constraint, the specifics of
      the lease interval that it gives to a DHCP client or the value
      of the potential-expiration-time sent to its failover partner
      are implementation dependent.  One possible approach is
      discussed in section 5.2.1, but that particular approach is in
      no way required by this protocol.

      See section 7.1.5 for details concerning the storage of times
      associated with IP addresses and how to use these times when
      calculating lease times for DHCP clients.

   o  Lazy update of partner server

      After an ACK of an IP address binding, the server servicing a
      DHCP client request attempts to update its partner with the new
      binding information.  The lease time used in the update of the
      partner MUST be at least that given to the DHCP client in the
      DHCPACK, and the potential-expiration-time MUST be at least the
      lease time, and SHOULD be considerably longer.

   o  Reallocation of IP addresses between clients

      Whenever a client binding is released or expires, a BNDUPD
      message must be sent to the partner, setting the binding state
      to RELEASED or EXPIRED.  However, until a BNDACK is received for
      this message, the IP address cannot be allocated to another
      client.  It can be allocated to the same client again.

   In NORMAL state, each server receives binding updates from its
   partner server in BNDUPD messages.  It records these in its client
   binding database in stable storage and then sends a corresponding
   BNDACK message to its partner.  It MUST ensure that the information
   is recorded in stable storage prior to sending the BNDACK message
   back to its partner.

9.6.4.  Transitions out of NORMAL state

   If an external command is received by a server in NORMAL state
   informing it that its partner is down, then transition into
   PARTNER-DOWN state.  Generally, this would be an unusual situation,
   where some external agency knew the partner server was down.  Using
   the command in this case would be appropriate if the polling
   interval and timeout were long.

   If a server in NORMAL state fails to receive acks to messages sent
   to its partner for an implementation dependent period of time, it
   MAY move into COMMUNICATIONS-INTERRUPTED state.
   This situation might occur if the partner server was capable of
   maintaining the TCP connection between the servers and also capable
   of sending a CONTACT message every tSend seconds, but was (for some
   reason) incapable of processing BNDUPD messages.

   If the communications is determined to not be "ok" (as defined in
   section 8), then transition into COMMUNICATIONS-INTERRUPTED state.

   If a server in NORMAL state receives any messages from its partner
   where the partner has changed state from that expected by the server
   in NORMAL state, then the server should transition into
   COMMUNICATIONS-INTERRUPTED state and take the appropriate state
   transition from there.  For example, it would be expected for the
   partner to transition from POTENTIAL-CONFLICT into NORMAL state, but
   not for the partner to transition from NORMAL into
   POTENTIAL-CONFLICT state.

   If a server in NORMAL state receives any messages from its partner
   where the partner has changed into PAUSED state, the server should
   transition into COMMUNICATIONS-INTERRUPTED state.  If a server in
   NORMAL state receives any messages from its partner where the
   partner has changed into SHUTDOWN state, the server should
   transition into PARTNER-DOWN state.

9.7.  COMMUNICATIONS-INTERRUPTED State

   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
   unable to communicate with the other server.  Primary and secondary
   servers cycle automatically (without administrative intervention)
   between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network
   connection between them fails and recovers, or as the partner server
   cycles between operational and non-operational.  No duplicate IP
   address allocation can occur while the servers cycle between these
   states.

9.7.1.  Upon entry to COMMUNICATIONS-INTERRUPTED state

   When a server enters COMMUNICATIONS-INTERRUPTED state, if it has
   been configured to support an automatic transition out of
   COMMUNICATIONS-INTERRUPTED state and into PARTNER-DOWN state (i.e.,
   a "safe period" has been configured, see section 10), then a timer
   MUST be started for the length of the configured safe period.

   A server transitioning into the COMMUNICATIONS-INTERRUPTED state
   from the NORMAL state SHOULD raise some alarm condition to alert
   administrative staff to a potential problem in the DHCP subsystem.

9.7.2.  Operation in COMMUNICATIONS-INTERRUPTED State

   In this state a server MUST respond to all DHCP client requests, and
   the algorithm for load balancing described in section 5.3 MUST NOT
   be used.  When allocating new IP addresses, each server allocates
   from its own IP address pool, where the primary MUST allocate only
   FREE IP addresses, and the secondary MUST allocate only BACKUP IP
   addresses.  When responding to renewal requests, each server will
   allow continued renewal of a DHCP client's current lease on an IP
   address irrespective of whether that lease was given out by the
   receiving server or not, although the renewal period MUST NOT exceed
   the maximum client lead time (MCLT) beyond the latest of: 1) the
   potential-expiration-time already acknowledged by the other server,
   or 2) the lease-expiration-time, or 3) the potential-expiration-time
   received from the partner server.

   However, since the server cannot communicate with its partner in
   this state, the acknowledged-potential-expiration time will not be
   updated in any new bindings.  This is likely to eventually cause the
   actual-client-lease-times to be the current time plus the
   maximum-client-lead-time (unless this is greater than the
   desired-client-lease-time).

9.7.3.  Transition out of COMMUNICATIONS-INTERRUPTED State

   If the safe period timer expires while a server is in the
   COMMUNICATIONS-INTERRUPTED state, it will transition immediately
   into PARTNER-DOWN state.

   If an external command is received by a server in
   COMMUNICATIONS-INTERRUPTED state informing it that its partner is
   down, it will transition immediately into PARTNER-DOWN state.

   If communications is restored with the other server, then the server
   in COMMUNICATIONS-INTERRUPTED state will transition into another
   state based on the state of the partner:

   o  partner in NORMAL or COMMUNICATIONS-INTERRUPTED

         The partner SHOULD NOT be in NORMAL state here, since upon
         restoration of communications it MUST have created a new TCP
         connection which would have forced it into
         COMMUNICATIONS-INTERRUPTED state.  Still, we should account
         for every state just in case.

         Transition into the NORMAL state.

   o  partner in RECOVER

         Stay in COMMUNICATIONS-INTERRUPTED state.

   o  partner in RECOVER-DONE

         Transition into NORMAL state.

   o  partner in PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, or
      RESOLUTION-INTERRUPTED

         Transition into POTENTIAL-CONFLICT state.

   o  partner in PAUSED

         Stay in COMMUNICATIONS-INTERRUPTED state.

   o  partner in SHUTDOWN

         Transition into PARTNER-DOWN state.

   The following figure illustrates the transition from NORMAL to
   COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.
4486 Primary Secondary 4487 Server Server 4489 NORMAL NORMAL 4490 | >--CONTACT-------------------> | 4491 | <--------------------CONTACT--< | 4492 | [TCP connection broken] | 4493 COMMUNICATIONS : COMMUNICATIONS 4494 INTERRUPTED : INTERRUPTED 4495 | [attempt new TCP connection] | 4496 | [connection succeeds] | 4497 | | 4498 | >--CONNECT-------------------> | 4499 | <-----------------CONNECTACK--< | 4500 | <-------------------STATE-----< | 4501 | NORMAL 4502 | >--STATE---------------------> | 4503 NORMAL | 4504 | >--BNDUPD--------------------> | 4505 | <---------------------BNDACK--< | 4506 | | 4507 | <---------------------BNDUPD--< | 4508 | >------BNDACK----------------> | 4509 ... ... 4510 | | 4511 | <--------------------POOLREQ--< | 4512 | >--POOLRESP-(2)--------------> | 4513 | | 4514 | >--BNDUPD-(#1)---------------> | 4515 | <---------------------BNDACK--< | 4516 | | 4517 | <--------------------POOLREQ--< | 4518 | >--POOLRESP-(0)--------------> | 4519 | | 4520 | >--BNDUPD-(#2)---------------> | 4521 | <---------------------BNDACK--< | 4522 | | 4524 Figure 9.7.3-1: Transition from NORMAL to COMMUNICATIONS- 4525 INTERRUPTED and back (example with 2 4526 addresses allocated to secondary) 4528 9.8. POTENTIAL-CONFLICT state 4530 This state indicates that the two servers are attempting to re- 4531 integrate with each other, but at least one of them was running in a 4532 state that did not guarantee automatic reintegration would be 4533 possible. In POTENTIAL-CONFLICT state the servers may determine that 4534 the same IP address has been offered and accepted by two different 4535 DHCP clients. 4537 It is a goal of this protocol to minimize the possibility that 4538 POTENTIAL-CONFLICT state is ever entered. 4540 9.8.1. 
Upon entry to POTENTIAL-CONFLICT state 4542 When a primary server enters POTENTIAL-CONFLICT state it should 4543 request that the secondary send it all updates of which it is 4544 currently unaware by sending an UPDREQ message to the secondary 4545 server. 4547 A secondary server entering POTENTIAL-CONFLICT state will wait for 4548 the primary to send it an UPDREQ message. 4550 9.8.2. Operation in POTENTIAL-CONFLICT state 4552 Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming 4553 DHCP requests. 4555 9.8.3. Transitions out of POTENTIAL-CONFLICT state 4557 If communications fails with the partner while in POTENTIAL-CONFLICT 4558 state, then the server will transition to RESOLUTION-INTERRUPTED 4559 state. 4561 Whenever either server receives an UPDDONE message from its partner 4562 while in POTENTIAL-CONFLICT state, it MUST transition to a new state. 4563 The primary MUST transition to CONFLICT-DONE state, and the secondary 4564 MUST transition to NORMAL state. This will cause the primary server 4565 to leave POTENTIAL-CONFLICT state prior to the secondary, since the 4566 primary sends an UPDREQ message and receives an UPDDONE before the 4567 secondary sends an UPDREQ message and receives its UPDDONE message. 4569 When a secondary server receives an indication that the primary 4570 server has transitioned from POTENTIAL-CONFLICT to CONFLICT-DONE 4571 state, it SHOULD send an UPDREQ message to the primary server. 4573 Primary Secondary 4574 Server Server 4576 | | 4577 POTENTIAL-CONFLICT POTENTIAL-CONFLICT 4578 | | 4579 | >--UPDREQ--------------------> | 4580 | | 4581 | <---------------------BNDUPD--< | 4582 | >--BNDACK--------------------> | 4583 ... ... 
           |                                 |
           | <---------------------BNDUPD--< |
           | >--BNDACK-------------------->  |
           |                                 |
           | <--------------------UPDDONE--< |
        NORMAL                               |
           | >--STATE--(NORMAL)----------->  |
           | <---------------------UPDREQ--< |
           |                                 |
           | >--BNDUPD-------------------->  |
           | <---------------------BNDACK--< |
          ...                               ...
           | >--BNDUPD-------------------->  |
           | <---------------------BNDACK--< |
           |                                 |
           | >--UPDDONE------------------->  |
           |                              NORMAL
           |                                 |
           | <--------------------POOLREQ--< |
           | >------POOLRESP-(n)---------->  |
           |            addresses            |

      Figure 9.8.3-1: Transition out of POTENTIAL-CONFLICT

9.9.  RESOLUTION-INTERRUPTED state

   This state indicates that the two servers were attempting to
   re-integrate with each other in POTENTIAL-CONFLICT state, but
   communications failed prior to completion of re-integration.

   If the servers remained in POTENTIAL-CONFLICT while communications
   was interrupted, neither server would be responsive to DHCP client
   requests, and if one server had crashed, then there might be no
   server able to process DHCP requests.

9.9.1.  Upon entry to RESOLUTION-INTERRUPTED state

   When a server enters RESOLUTION-INTERRUPTED state it SHOULD raise an
   alarm condition to alert administrative staff of a problem in the
   DHCP subsystem.

9.9.2.  Operation in RESOLUTION-INTERRUPTED state

   In this state a server MUST respond to all DHCP client requests, and
   any load balancing (described in section 5.3) MUST NOT be used. When
   allocating new IP addresses, each server SHOULD allocate from its own
   IP address pool (if that can be determined), where the primary SHOULD
   allocate only FREE IP addresses, and the secondary SHOULD allocate
   only BACKUP IP addresses.
   When responding to renewal requests, each server will allow
   continued renewal of a DHCP client's current lease on an IP address
   irrespective of whether that lease was given out by the receiving
   server or not, although the renewal period MUST NOT exceed the
   maximum client lead time (MCLT) beyond the latest of: 1) the
   potential-expiration-time already acknowledged by the other server,
   2) the lease-expiration-time, or 3) the potential-expiration-time
   received from the partner server.

   However, since the server cannot communicate with its partner in this
   state, the acknowledged-potential-expiration time will not be updated
   in any new bindings.

9.9.3.  Transitions out of RESOLUTION-INTERRUPTED state

   If an external command is received by a server in
   RESOLUTION-INTERRUPTED state informing it that its partner is down,
   it will transition immediately into PARTNER-DOWN state.

   If communications is restored with the other server, then the server
   in RESOLUTION-INTERRUPTED state will transition into
   POTENTIAL-CONFLICT state.

9.10.  CONFLICT-DONE state

   This state indicates that during the process where the two servers
   are attempting to re-integrate with each other, the primary server
   has received all of the updates from the secondary server. It
   transitions into CONFLICT-DONE state in order that it may be totally
   responsive to the client load, as opposed to NORMAL state where it
   would be in a "balanced" responsive state, running the load balancing
   algorithm.

9.10.1.  Upon entry to CONFLICT-DONE state

   A secondary server should never enter CONFLICT-DONE state.

9.10.2.  Operation in CONFLICT-DONE state

   A primary server in CONFLICT-DONE state is fully responsive to all
   DHCP clients (similar to the situation in COMMUNICATIONS-INTERRUPTED
   state).

   If communications fails, remain in CONFLICT-DONE state.
   If communications becomes OK, remain in CONFLICT-DONE state until the
   conditions for transition out become satisfied.

9.10.3.  Transitions out of CONFLICT-DONE state

   If communications fails with the partner while in CONFLICT-DONE
   state, then the server will remain in CONFLICT-DONE state.

   When a primary server determines that the secondary server has
   transitioned into NORMAL state, the primary server will also
   transition into NORMAL state.

9.11.  RECOVER-DONE state

   This state exists to allow an interlocked transition for one server
   from RECOVER state and another server from PARTNER-DOWN or
   COMMUNICATIONS-INTERRUPTED state into NORMAL state.

9.11.1.  Operation in RECOVER-DONE state

   A server in RECOVER-DONE state MUST respond only to
   DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING DHCP messages.

9.11.2.  Transitions out of RECOVER-DONE state

   When a server in RECOVER-DONE state determines that its partner
   server has entered NORMAL state, then it will transition into NORMAL
   state as well.

   If communications fails while in RECOVER-DONE state, a server will
   stay in RECOVER-DONE state.

9.12.  PAUSED state

   This state exists to allow one server to inform another that it will
   be out of service for what is predicted to be a relatively short
   time, and to allow the other server to transition to
   COMMUNICATIONS-INTERRUPTED state immediately and to begin servicing
   all DHCP clients with no interruption in service to new DHCP clients.

   A server which is aware that it is shutting down temporarily SHOULD
   send a STATE message with the server-state option containing PAUSED
   state and close the TCP connection.
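
   As an illustration only, the server-state option carried in such a
   STATE message might be encoded as sketched below. The sketch assumes
   the two byte code, two byte length layout and the values (option
   code 24, PAUSED = 7) given in section 12.24. The function and
   constant names are hypothetical, and the STATE message header itself
   is not shown.

```python
import struct

SERVER_STATE_OPTION = 24  # option code, per section 12.24
PAUSED = 7                # server-state value, per section 12.24


def encode_server_state(state: int) -> bytes:
    """Encode code (2 bytes), length (2 bytes) and a one-byte state."""
    if not 0 <= state <= 255:
        raise ValueError("server-state is a single byte")
    # "!" selects network byte order; H = 16-bit field, B = 8-bit field.
    return struct.pack("!HHB", SERVER_STATE_OPTION, 1, state)


# Option bytes a server could place in its STATE message before pausing.
paused_option = encode_server_state(PAUSED)
```

   The resulting five bytes would be followed, in a real message, by
   whatever other options the STATE message requires.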

   While a server may or may not transition internally into PAUSED
   state, the 'previous' state determined when it is restarted MUST be
   the state the server was in prior to receiving the command to
   shutdown and restart and which precedes its entry into the PAUSED
   state. See section 9.3.2 concerning the use of the previous state
   upon server restart.

9.12.1.  Upon entry to PAUSED state

   When entering PAUSED state, the server MUST store the previous state
   in stable storage, and use that state as the previous state when it
   is restarted.

9.12.2.  Transitions out of PAUSED state

   A server transitions out of PAUSED state by being restarted. At that
   time, the previous state MUST be the state the server was in prior to
   entering the PAUSED state.

9.13.  SHUTDOWN state

   This state exists to allow one server to inform another that it will
   be out of service for what is predicted to be a relatively long time,
   and to allow the other server to transition immediately to
   PARTNER-DOWN state, and take over completely for the server going
   down.

   A server which is aware that it is shutting down SHOULD send a STATE
   message with the server-state field containing SHUTDOWN.

   While a server may or may not transition internally into SHUTDOWN
   state, the 'previous' state determined when it is restarted MUST be
   the state active prior to the command to shutdown. See section 9.3.2
   concerning the use of the previous state upon server restart.

9.13.1.  Upon entry to SHUTDOWN state

   When entering SHUTDOWN state, the server MUST record the previous
   state in stable storage for use when the server is restarted. It
   also MUST record the current time as the last time operational.

   A server which is aware that it is shutting down SHOULD send a STATE
   message with the server-state field containing SHUTDOWN.

9.13.2.  Operation in SHUTDOWN state

   A server in SHUTDOWN state MUST NOT respond to any DHCP client input.

   If a server receives any message indicating that the partner has
   moved to PARTNER-DOWN state while it is in SHUTDOWN state then it
   MUST record RECOVER state as the previous state to be used when it is
   restarted.

   A server SHOULD wait for a few seconds after informing the partner of
   entry into SHUTDOWN state (if communications are okay) to determine
   if the partner entered PARTNER-DOWN state.

9.13.3.  Transitions out of SHUTDOWN state

   A server transitions out of SHUTDOWN state by being restarted.

10.  Safe Period

   Due to the restrictions imposed on each server while in
   COMMUNICATIONS-INTERRUPTED state, long-term operation in this state
   is not feasible for either server. One reason that this state exists
   at all is to allow the servers to easily survive transient network
   communications failures of a few minutes to a few days (although the
   actual time periods will depend a great deal on the DHCP activity of
   the network in terms of arrival and departure of DHCP clients on the
   network).

   Eventually, when the servers are unable to communicate, they will
   have to move into a state where they no longer can re-integrate
   without some possibility of a duplicate IP address allocation. There
   are two ways that they can move into this state (known as
   PARTNER-DOWN).

   The first is to be informed by external command that the partner
   server is, indeed, down. In this case, there is no difficulty in
   moving into the PARTNER-DOWN state since it is an accurate reflection
   of reality and the protocol has been designed to operate correctly
   (even during reintegration) as long as, when in PARTNER-DOWN state,
   the partner is, indeed, down.

   The second, more difficult scenario arises when the servers are
   running unattended for extended periods. For this case an option is
   provided to configure a "safe-period" into each server. This OPTIONAL
   safe-period is the period after which either the primary or secondary
   server will automatically transition to PARTNER-DOWN from
   COMMUNICATIONS-INTERRUPTED state. If this transition is completed and
   the partner is not down, then the possibility of duplicate IP address
   allocations will exist.

   The goal of the "safe-period" is to allow network operations staff
   some time to react to a server moving into COMMUNICATIONS-INTERRUPTED
   state. During the safe-period the only requirement is that the
   network operations staff determine if both servers are still running
   -- and if they are, to either fix the network communications failure
   between them, or to take one of the servers down before the
   expiration of the safe-period.

   The length of the safe-period is installation dependent, and depends
   in large part on the number of unallocated IP addresses within the
   subnet address pool and the expected frequency of arrival of
   previously unknown DHCP clients requiring IP addresses. Many
   environments should be able to support safe-periods of several days.

   During this safe period, either server will allow renewals from any
   existing client. The only limitation concerns the need for IP
   addresses for the DHCP server to hand out to new DHCP clients and the
   need to re-allocate IP addresses to different DHCP clients.

   The number of "extra" IP addresses required is equal to the expected
   total number of new DHCP clients encountered during the safe period.
   This is dependent only on the arrival rate of new DHCP clients, not
   the total number of outstanding leases on IP addresses.
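
   The sizing rule above can be worked through with a small sketch. It
   assumes a constant average arrival rate of new clients, which is a
   simplification; the function names and the example figures (5 new
   clients per hour, a 72 hour safe period) are hypothetical and are
   not taken from this document.

```python
import math


def extra_addresses_needed(new_clients_per_hour: float,
                           safe_period_hours: float) -> int:
    """Expected number of new clients arriving during the safe period."""
    return math.ceil(new_clients_per_hour * safe_period_hours)


def longest_safe_period_hours(spare_addresses: int,
                              new_clients_per_hour: float) -> float:
    """Longest safe period that the spare addresses can cover."""
    if new_clients_per_hour <= 0:
        return math.inf  # no new clients: addresses never run out
    return spare_addresses / new_clients_per_hour


# Hypothetical site: 5 new clients per hour, 3 day (72 hour) safe period.
needed = extra_addresses_needed(5, 72)
```

   Note that the total number of outstanding leases does not appear in
   either function, matching the statement that only the arrival rate
   of new clients matters.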

   In the unlikely event that a relatively short safe period of an hour
   is all that can be used (given a dearth of IP addresses or a very
   high arrival rate of new DHCP clients), even that can provide
   substantial benefits in allowing the DHCP subsystem to ride through
   minor problems that could occur and be fixed within that hour. In
   these cases, no possibility of duplicate IP address allocation
   exists, and re-integration after the failure is solved will be
   automatic and require no operator intervention.

11.  Security

   The Failover protocol communicates DHCP lease activity and this data
   is generally easily discovered via other means, such as by pinging
   addresses and doing DNS lookups. Therefore, the need to encrypt the
   data over the wire is likely not great (though some sites may feel
   differently).

   However, it is very desirable to assure the integrity of failover
   partners and to thus ensure proper operation of the servers. For
   example, denial of service attacks are possible by the communication
   of invalid state information to one or both servers.

   Therefore, the Failover protocol MUST be capable of being secured by
   using a simple shared secret message digest which covers each
   message. This provides authentication of the servers, but does not
   provide encryption of the data exchange.

   The Failover protocol MAY also be secured by using TLS [RFC 2246]
   (Transport Layer Security) if encryption of the data exchange is
   desired. The use of the shared secret or TLS will not protect
   against TCP or IP layer attacks (such as someone sending fake TCP RST
   segments). IPsec SHOULD be used to protect against most (if not all)
   of these kinds of attacks.

11.1.  Simple shared secret

   Messages between the failover partners are authenticated through the
   use of a shared secret, which is never sent over the network and must
   be known by each server. How each server is told about this shared
   secret and secures its storage of the shared secret is outside the
   scope of this document. If a server is configured with a shared
   secret for a partner, it MUST send the message-digest option in ALL
   messages to that partner and it MUST treat any messages received from
   that partner without a message-digest option as failing
   authentication and reject them with reject reason 21: "Missing
   message digest".

   If a server is not configured with a shared secret for a partner, it
   MUST NOT send the message-digest option in any message to that
   partner and it MUST treat any messages received from that partner
   with a message-digest option as failing authentication with reject
   reason 13: "Message digest not configured".

   The shared secret is used to calculate a 16 octet message-digest
   which is sent in every failover message as the message-digest option.
   See section 12.17. The message-digest contains a one-way 16 octet MD5
   [RFC 1321] hash calculated over a stream of octets consisting of the
   entire message concatenated with the shared secret.

   For calculation, the message includes the message-digest option with
   the message-digest data zeroed (16 octets of zero). Once the
   calculation is complete, these 16 octets of zero are replaced by the
   16-octet MD5 hash and the message is sent.

   For verification, the 16-octet message-digest is saved and replaced
   with 16 octets of zero and calculated per above. The resulting MD5
   hash is compared to the received hash and if they match, the message
   is assumed authenticated.
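
   The calculation and verification steps above can be sketched as
   follows. This is a minimal illustration, not a definitive
   implementation. It assumes that the 16 octets of digest data are the
   final 16 octets of the message (section 12.17 requires the
   message-digest option to be the last option), and it treats the rest
   of the message as opaque bytes. The function names and the example
   message contents are hypothetical.

```python
import hashlib
import hmac

DIGEST_LEN = 16  # MD5 output size in octets


def _digest(message: bytes, secret: bytes) -> bytes:
    """MD5 over the message with its digest data zeroed, then the secret."""
    zeroed = message[:-DIGEST_LEN] + bytes(DIGEST_LEN)
    return hashlib.md5(zeroed + secret).digest()


def sign_message(message: bytes, secret: bytes) -> bytes:
    """Replace the trailing 16 octets (the digest data) with the hash."""
    return message[:-DIGEST_LEN] + _digest(message, secret)


def verify_message(message: bytes, secret: bytes) -> bool:
    """Save the received digest, recompute over zeroes, and compare."""
    received = message[-DIGEST_LEN:]
    return hmac.compare_digest(received, _digest(message, secret))


# Hypothetical message: opaque leading bytes, then the message-digest
# option header (code 17, length 16) followed by zeroed digest data.
unsigned = (b"hypothetical-failover-message"
            + bytes([0, 17, 0, 16]) + bytes(DIGEST_LEN))
signed = sign_message(unsigned, b"example-secret")
```

   The comparison uses hmac.compare_digest so that it takes the same
   time whether or not the digests match; that is a choice made for
   this sketch and is not something the text above requires.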

   A failover partner that fails to authenticate a received message or
   receives a message without a message-digest option when configured
   with a shared secret MUST close the connection immediately and take
   steps to notify operators.

   This use of the shared secret is very similar to that used for RADIUS
   Accounting [RFC 2139].

11.2.  TLS

   TLS, Transport Layer Security, as specified in [RFC 2246] MAY be
   used. The use of TLS would be similar to the way it is used with
   SMTP [RFC 2487] and IMAP/POP3/ACAP [RFC 2595].

   To request the use of TLS, the server that successfully opened a
   connection to its peer MUST send the TLS option as part of the
   CONNECT message. The server receiving the TLS option MUST respond
   with a TLS-reply option indicating its acceptance or rejection of the
   TLS-request in the CONNECT message.

   If the CONNECTACK message contained a TLS-reply of 1, then both
   servers begin TLS negotiation.

   Upon completion of this negotiation, the server which originally sent
   the CONNECT message MUST resend its CONNECT message without any
   TLS-request, and must wait for a corresponding CONNECTACK.

   Implementation of the TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA [RFC 2246]
   cipher suite is REQUIRED in Failover servers supporting TLS. This is
   important as it assures that any two compliant implementations can be
   configured to interoperate.

12.  Failover Options

   This section lists all of the options that are currently defined to
   be used with the failover protocol. See section 6.2 for details
   concerning time values.

12.1.  addresses-transferred

   A 32 bit unsigned long in network byte order. Reports the number of
   addresses transferred by the primary to the secondary server
   (addresses to be used for the secondary server's private address
   pool).

      Code        Len      Number of Addresses
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  1  |  0  |  4  | n1 | n2  | n3  | n4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.2.  assigned-IP-address

   The DHCP managed IP address to which this message refers.

      Code        Len      Address
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  2  |  0  |  4  | a1 | a2  | a3  | a4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.3.  binding-status

   This option is used to convey the current state of a binding.

      Code        Len      Type
   +-----+-----+-----+-----+-----+
   |  0  |  3  |  0  |  1  | 1-7 |
   +-----+-----+-----+-----+-----+

   Legal values for this option are:

   Value  Binding Status
   -----  ---------------------------------------------------------
     1    FREE       Lease is currently available
     2    ACTIVE     Lease is assigned to a client
     3    EXPIRED    Lease has expired
     4    RELEASED   Lease has been released by client
     5    ABANDONED  A server, or client flagged address as unusable
     6    RESET      Lease was freed by some external agent
     7    BACKUP     Lease belongs to secondary's private address pool

12.4.  client-identifier

   This is the client-identifier for the client associated with a
   binding. The client-identifier data is subject to the same
   conventions as DHCP option 61 [RFC 2132].

      Code        Len      Client Identifier
   +-----+-----+-----+-----+----+-----+---
   |  0  |  4  |  0  |  n  | i1 | i2  | ...
   +-----+-----+-----+-----+----+-----+---

12.5.  client-hardware-address

   This is the hardware address for the client associated with a
   binding. Byte t1 (type) MUST be set to the proper ARP hardware
   address code, as defined in the ARP section of RFC 1700 (it MUST NOT
   be zero!).

      Code        Len      htype  chaddr
   +-----+-----+-----+-----+----+-----+-----+---
   |  0  |  5  |  0  |  n  | t1 | c1  | c2  | ...
   +-----+-----+-----+-----+----+-----+-----+---

12.6.  client-last-transaction-time

   The time at which this server last received a DHCP request from a
   particular client expressed as an absolute time (see section 6.2).

      Code        Len      client last transaction time
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  6  |  0  |  4  | t1 | t2  | t3  | t4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.7.  client-reply-options

   This option contains options from a DHCP server's reply to a DHCP
   client request. It is sent in a BNDUPD message. The first 4 bytes
   of the option contain the "magic number" of the option area from
   which the DHCP reply options were taken and serve to define the
   format of the rest of the sub-options contained in this option.
   After the magic number, the options included are in the normal
   options format appropriate for that magic number.

   A server SHOULD NOT include all of the options in a DHCP server's
   reply to a client's request in this option, but rather a server
   SHOULD include only those options which are of likely interest to its
   partner server. See section 7.1 for details.

      Code        Len      Magic Number        Embedded options
   +-----+-----+-----+-----+----+----+----+----+----+----+--
   |  0  |  7  |  0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 | ...
   +-----+-----+-----+-----+----+----+----+----+----+----+--

12.8.  client-request-options

   This option contains options from a DHCP client's request. It is
   sent in a BNDUPD message. The first 4 bytes of the option contain
   the "magic number" of the option area from which the DHCP client's
   request options were taken and serve to define the format of the
   rest of the sub-options contained in this option. After the magic
   number, the options included are in the normal options format
   appropriate for that magic number.
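
   As a sketch of the layout just described, the option could be built
   as shown below. It assumes that the length field counts the 4 byte
   magic number plus the embedded options, and it uses the DHCP options
   area magic cookie from RFC 2131 (99.130.83.99) and the host name
   option (code 12) from RFC 2132 purely as example content. The
   function and constant names are hypothetical.

```python
import struct

CLIENT_REQUEST_OPTIONS = 8  # option code shown for section 12.8
DHCP_MAGIC_COOKIE = bytes([99, 130, 83, 99])  # RFC 2131 options cookie


def encode_client_request_options(embedded: bytes,
                                  magic: bytes = DHCP_MAGIC_COOKIE) -> bytes:
    """Two-byte code, two-byte length, magic number, embedded options."""
    if len(magic) != 4:
        raise ValueError("magic number must be 4 bytes")
    payload = magic + embedded
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit a two-byte length")
    return struct.pack("!HH", CLIENT_REQUEST_OPTIONS, len(payload)) + payload


# Example content: DHCP host name option (code 12, length 3, "pc1")
# in the normal one-byte code, one-byte length DHCP option format.
host_name = bytes([12, 3]) + b"pc1"
encoded = encode_client_request_options(host_name)
```

   The same construction would apply to client-reply-options (section
   12.7) with option code 7 in place of 8.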

   A server SHOULD NOT include all of the options in a DHCP client
   request in this option, but rather a server SHOULD include only those
   options which are of likely interest to its partner server. See
   section 7.1 for details.

      Code        Len      Magic Number        Embedded options
   +-----+-----+-----+-----+----+----+----+----+----+----+--
   |  0  |  8  |  0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 | ...
   +-----+-----+-----+-----+----+----+----+----+----+----+--

12.9.  DDNS

   If an implementation supports Dynamic DNS updates, this option is
   used to communicate the status of the DDNS update associated with a
   particular lease binding. The Flags field conveys the types of DNS
   RRs that are to be updated by the DHCP server, and the status of the
   DDNS update. The Domain Name field conveys the DNS FQDN that the
   DHCP server is using to refer to the client, in DNS encoding as
   specified in [RFC 1035].

      Code        Len         Flags      Domain Name
   +-----+-----+-----+-----+-----+------+------+-----+------
   |  0  |  9  |  0  |  n  |   flags    |  d1  | d2  | ...
   +-----+-----+-----+-----+-----+------+------+-----+------

   The Flags field is a 16-bit field; several bit positions are
   specified here.

                        1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |C|A|D|P|          MBZ          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The bits (numbered from the least-significant bit in network
   byte-order) are used as follows:

      0 (C): A RR update successfully completed
      1 (A): Server is controlling A RR on behalf of the client
      2 (D): PTR RR update successfully completed (Done)
      3 (P): Server is controlling PTR RR on behalf of the client
      4-15 : Must be zero

   All of the unspecified bit positions SHOULD be set to 0 by servers
   sending the Failover-DDNS option, and they MUST be ignored by servers
   receiving the option.

12.10.  delayed-service-parameter

   The delayed-service-parameter is an optional load balancing tuning
   parameter, defined in [LOADB]. If it is used, it MUST be sent in the
   same message as the hash-bucket-assignment option (see section
   12.11). Format:

      Code        Len      Seconds
   +-----+-----+-----+-----+----+
   |  0  | 10  |  0  |  1  | S  |
   +-----+-----+-----+-----+----+

   S is a one byte value, 1..255.

12.11.  hash-bucket-assignment

   A set of load balancing hash values for the secondary server. A one
   bit in the hash buckets indicates that the secondary is to service
   that set of clients. See section 5.3 for more information on how
   this option is used. This option is only sent from the primary to
   the secondary.

   The format and usage of the data in this option is defined in
   [LOADB].

      Code        Len      Hash Buckets
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  | 11  |  0  | 32  | b1  | b2  | ... | b32 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

12.12.  IP-flags

   This option is used to convey the current flags of the
   assigned-IP-address option preceding it.

      Code        Len      IP Flags
   +-----+-----+-----+-----+-----+-----+
   |  0  | 12  |  0  |  1  | f1  | f2  |
   +-----+-----+-----+-----+-----+-----+

   The IP-flags field is a 16-bit field; two bit positions are
   specified here.

                        1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|B|            MBZ            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The bits (numbered from the least-significant bit in network
   byte-order) are used as follows:

      0 (R): RESERVED
         Bit 0 MUST be set to 1 whenever the IP address in the
         preceding assigned-IP-address option is reserved on the
         server sending the packet.
      1 (B): BOOTP
         Bit 1 MUST be set to 1 whenever the IP address in the
         preceding assigned-IP-address option is an IP address which
         has been allocated due to an interaction with a BOOTP client
         (as opposed to a DHCP client).
      2-15 : Must be zero

12.13.  lease-expiration-time

   The lease expiration time is the lease interval that a DHCP server
   has ACKed to a DHCP client added to the time at which that ACK was
   transmitted -- expressed as an absolute time (see section 6.2).

      Code        Len      Time
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  | 13  |  0  |  4  | t1 | t2  | t3  | t4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.14.  max-unacked-bndupd

   The maximum number of BNDUPD messages that this server is prepared to
   accept over the TCP connection without causing the TCP connection to
   block. A 32 bit unsigned integer value, in network byte order.

      Code        Len      Maximum Unacked BNDUPD
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  | 14  |  0  |  4  | n1 | n2  | n3  | n4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.15.  MCLT

   Maximum Client Lead Time, an interval, in seconds. A 32 bit unsigned
   integer value, in network byte order.

      Code        Len      Time
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  | 15  |  0  |  4  | t1 | t2  | t3  | t4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.16.  message

   This option is used to supply a human readable message text. It may
   be used in association with the Reject Reason Code to provide a human
   readable error message for the reject.

      Code        Len      Text
   +-----+-----+-----+-----+------+-----+--
   |  0  | 16  |  0  |  n  |  c1  | c2  | ...
   +-----+-----+-----+-----+------+-----+--

12.17.  message-digest

   The message digest for this message.

   This option consists of a variable number of bytes which contain the
   message digest of the message prior to the inclusion of this option.

   When this option appears in a message, it MUST appear as the last
   option in the message. It MUST appear in every message if message
   digests are required.

      Code        Len      Message Digest
   +-----+-----+-----+-----+----+-----+-----
   |  0  | 17  |  0  |  n  | d1 | d2  | ...
   +-----+-----+-----+-----+----+-----+-----

12.18.  potential-expiration-time

   The potential expiration time is the time that one server tells
   another server that it may wish to grant in a lease to a DHCP client.
   It is an absolute time. See section 6.2.

      Code        Len      Time
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  | 18  |  0  |  4  | t1 | t2  | t3  | t4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.19.  receive-timer

   The number of seconds (an interval) within which the server must
   receive a message from its partner, or it will assume that
   communications with the partner is not OK. An unsigned 32 bit
   integer in network byte order.

      Code        Len      Receive Timer
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  | 19  |  0  |  4  | s1 | s2  | s3  | s4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.20.  protocol-version

   The protocol version being used by the server. It is only sent in the
   CONNECT and CONNECTACK messages. The current value for the version
   is 1.

      Code        Len      Version
   +-----+-----+-----+-----+-----+
   |  0  | 20  |  0  |  1  |  1  |
   +-----+-----+-----+-----+-----+

12.21.  reject-reason

   This option is used to selectively reject binding updates. It MAY be
   used in a BNDACK message or a CONNECTACK message, always associated
   with an assigned-IP-address option, which contains the IP address of
   the update being rejected.

      Code        Len      Reason Code
   +-----+-----+-----+-----+-----+
   |  0  | 21  |  0  |  1  | R1  |
   +-----+-----+-----+-----+-----+

   Reason codes (section where referenced in parentheses):

     0  Reserved
     1  Illegal IP address (not part of any address pool). (7.1.3)
     2  Fatal conflict exists: address in use by other client. (7.1.3)
     3  Missing binding information. (7.1.3)
     4  Connection rejected, time mismatch too great. (7.8.2)
     5  Connection rejected, invalid MCLT. (7.8.2)
     6  Connection rejected, unknown reason. (not specifically
        referenced)
     7  Connection rejected, duplicate connection. (unused)
     8  Connection rejected, invalid failover partner. (7.8.2)
     9  TLS not supported. (7.8.2)
    10  TLS supported but not configured. (7.8.2)
    11  TLS required but not supported by partner. (7.8.2)
    12  Message digest not supported. (11.1)
    13  Message digest not configured. (11.1)
    14  Protocol version mismatch. (7.8.2)
    15  Outdated binding information. (7.1.3)
    16  Less critical binding information. (7.1.3)
    17  No traffic within sufficient time. (8.6)
    18  Hash bucket assignment conflict. (7.8.2)
    19  IP not reserved on this server. (7.1.3)
    20  Message digest failed to compare. (7.8.2)
    21  Missing message digest. (7.1.3)
    22-253  Reserved.
    254  Unknown: Error occurred but does not match any reason code.
    255  Reserved for code expansion.

12.22.  sending-server-IP-address

   The IP address of the server sending this message. This option is
   required for all messages if the message digest option is used.

      Code        Len      Address
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  | 22  |  0  |  4  | a1 | a2  | a3  | a4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.23.  server-flags

   This option is used to convey the current flags of the failover
   endpoint in the sending server.

      Code        Len      Server Flags
   +-----+-----+-----+-----+-------+
   |  0  | 23  |  0  |  1  | flags |
   +-----+-----+-----+-----+-------+

   The flags field is an 8-bit field; one bit position is
   specified here.

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |S|     MBZ     |
   +-+-+-+-+-+-+-+-+

   The bits (numbered from the least-significant bit in network
   byte-order) are used as follows:

      0 (S): STARTUP
         Bit 0 MUST be set to 1 whenever the server is in STARTUP
         state, and set to 0 otherwise. (Note that when in STARTUP
         state, the state transmitted in the server-state option is
         usually the last recorded state from stable storage, but see
         section 9.3 for details.)
      1-7 : Must be zero

12.24.  server-state

   This option is used to convey the current state of the failover
   endpoint in the sending server.

      Code        Len      Server State
   +-----+-----+-----+-----+------+
   |  0  | 24  |  0  |  1  | 1-11 |
   +-----+-----+-----+-----+------+

   Legal values for this option are:

   Value  Server State
   -----  -------------------------------------------------------------
     0    reserved
     1    STARTUP                     Startup state (1)
     2    NORMAL                      Normal state
     3    COMMUNICATIONS-INTERRUPTED  Communication interrupted (safe)
     4    PARTNER-DOWN                Partner down (unsafe mode)
     5    POTENTIAL-CONFLICT          Synchronizing
     6    RECOVER                     Recovering bindings from partner
     7    PAUSED                      Shutting down for a short period.
     8    SHUTDOWN                    Shutting down for an extended
                                      period.
     9    RECOVER-DONE                Interlock state prior to NORMAL
    10    RESOLUTION-INTERRUPTED      Comm. failed during resolution
    11    CONFLICT-DONE               Primary has resolved its conflicts

   (1) The STARTUP state is never sent to the partner server; it is
   indicated by the STARTUP bit in the server-flags option (see section
   12.23).

12.25.  start-time-of-state

   This option is used for different states in different messages.
In a BNDUPD message it represents the start time of the state of the
lease in the BNDUPD message.  In a STATE message, it represents the
start time of the partner server's failover state.  In all cases it
is an absolute time.

Code        Len         Start Time of State
+-----+-----+-----+-----+-----+-----+-----+-----+
|  0  |  25 |  0  |  4  | t1  | t2  | t3  | t4  |
+-----+-----+-----+-----+-----+-----+-----+-----+

12.26.  TLS-reply

This option contains information relating to TLS security
negotiation.  It is sent in a CONNECTACK message.

A t1 value of 0 indicates no TLS operation; a value of 1 indicates
that TLS operation is required.

Code        Len         TLS
+-----+-----+-----+-----+-----+
|  0  |  26 |  0  |  1  | t1  |
+-----+-----+-----+-----+-----+

12.27.  TLS-request

This option contains information relating to TLS security
negotiation.  It is sent in a CONNECT message.

The t1 byte is the TLS request from this server.  A value of 0
indicates no TLS operation (to communicate, the other server MUST NOT
require TLS), a value of 1 indicates that TLS operation is desired
but not required (to communicate, the other server MAY utilize TLS),
and a value of 2 indicates that TLS operation is required (to
communicate, the other server MUST utilize TLS) to establish
communications with this server.

Code        Len         TLS
+-----+-----+-----+-----+-----+
|  0  |  27 |  0  |  1  | t1  |
+-----+-----+-----+-----+-----+

12.28.  vendor-class-identifier

A string which identifies the vendor of the failover protocol
implementation.

Code        Len         Vendor class string
+-----+-----+-----+-----+-----+-----+---
|  0  |  28 |  0  |  n  | c1  | c2  | ...
+-----+-----+-----+-----+-----+-----+---

12.29.  vendor-specific-options

This option is used to convey options specific to a particular
vendor's implementation.
The vendor class identifier is used to specify which option space
the embedded options are drawn from.

It functions similarly to the vendor class identifier and vendor
specific options in the DHCP protocol.

This option contains other options in the same two-byte code,
two-byte length format.  If this option appears in a message without
a corresponding vendor class identifier, it MUST be ignored.

Code        Len         Embedded options
+-----+-----+-----+-----+-----+-----+---
|  0  |  29 |  0  |  n  | c1  | c2  | ...
+-----+-----+-----+-----+-----+-----+---

13.  IANA Considerations

This document defines several number spaces (failover options,
failover message types, and failover reject reason codes).  For all
of these number spaces, certain values are defined in this
specification.  New values may only be defined by IETF Consensus, as
described in [RFC 2434].  Basically, this means that they are defined
by RFCs approved by the IESG.

14.  Acknowledgments

Ralph Droms started it all, by sketching out an initial interserver
draft that embodied ideas from several past IETF meetings.  In that
draft, he acknowledged contributions by Jeff Mogul, Greg Minshall,
Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.

Kim Kinnear and Bob Cole each extended that draft, separately and
then together, until they created an interserver draft that supported
any number of servers.  The complexity of that approach was just too
great, and that draft wasn't greeted with enthusiasm by many,
including its authors.

It did, however, lead to a much simpler approach embodied in the
first Failover draft by Greg Rabil, Mike Dooley, Arun Kapur and Ralph
Droms.  This draft posited only two servers -- a primary and a
secondary.

Kim Kinnear then wrote the Safe Failover draft to layer on top of the
Failover Draft and increase its robustness in the face of certain
rare network failures.

At the spring 1998 IETF meeting in LA, the DHC working group said
that they wanted a merged Failover and Safe Failover draft.  Steve
Gonczi and Bernie Volz stepped up and produced the raw material for
such a merged draft, along with a new message format designed around
DHCP options and other extensions and clarifications.  Kim Kinnear
edited their work into draft format and made other changes in time
for the Summer Chicago IETF meeting.

Many people have reviewed the various earlier drafts that went into
this result.  At American Internet, ideas were contributed by Brad
Parker.  At Cisco Systems, Paul Fox and Ellen Garvey contributed to
the design of the protocol.

During the summer and fall of 1998, two groups worked on separate
implementations of the UDP failover draft.  Bernie Volz and Steve
Gonczi constituted one group, and Kim Kinnear, Mark Stapp and Paul
Fox made up the other.  These two groups worked together to produce
considerable changes and simplifications of the protocol during that
period, and Steve Gonczi and Kim Kinnear edited those changes into
the -03 draft in time for submission to the December 1998 Orlando
IETF meeting.

In February of 1999, Kim Kinnear and Mark Stapp hosted a meeting of
people interested in the failover draft.  During that meeting a
general agreement was reached to recast the failover protocol to use
TCP instead of UDP.  In addition, the group together brainstormed a
workable load-balancing technique.  Kim Kinnear rewrote the entire
draft to include the changes made at that meeting as well as to
restructure the draft along guidelines suggested by Thomas Narten.
The result was the -04 draft, submitted prior to the Oslo IETF
meeting.

The initial idea for a hash-based load balancing approach was offered
by Ted Lemon, and the determination of an algorithm and its
integration into the draft was done by Steve Gonczi.  The security
section was spearheaded by Bernie Volz.  Both contributed
considerably to the ideas and text in the rest of the draft with
several reviews.

In early October of 1999, three conference calls were held to discuss
the -04 draft.  The -05 draft includes changes as a result of those
calls, perhaps the largest of which was to move the load balancing
approach into a separate draft.  Thanks to all of the many people who
participated in the conference calls.  Changes were made because of
contributions by: Ted Lemon, David Erdmann, Richard Jones, Rob
Stevens, Thomas Narten, Diana Lane, and Andre Kostur.

Another conference call was held in mid-January of 2000, and the -06
draft was produced to tighten up the -05 draft both technically and
editorially.

The -07 draft was edited by Kim Kinnear and was based in part on
reviews by Richard Jones, Bernie Volz, and Steve Gonczi.  It embodies
several technical updates as well as numerous editorial revisions
that enhanced both correctness and clarity.

This, the -08 draft, was edited by Kim Kinnear and was based on the
results of two conference calls held in October and November of 2000.
It includes the correct second port number, a new state to
synchronize conflict resolution with load balancing, a generally
accepted approach to secondary pool allocation, and many other
updates based on both operational and implementation experience.

These most recent changes have not been widely circulated among the
other authors prior to submission to the IETF.

Glenn Waters of Nortel Networks contributed ideas and enthusiasm to
make a Failover protocol that was both "safe" and "lazy".

15.  References

[AGENTINFO] Patrick, M., "draft-ietf-dhc-agent-options-11.txt", July
   2000.

[DDNS] Rekhter, Y., Stapp, M., "draft-ietf-dhc-dhcp-dns-12.txt",
   March 2000.

[LOADB] Volz, B., Gonczi, S., Lemon, T., Stevens, R.,
   "draft-ietf-dhc-loadb-02.txt", July 1999.

[RFC 1035] Mockapetris, P., "Domain Names - Implementation and
   Specification", RFC 1035, November 1987.

[RFC 1321] Rivest, R., and Dusse, S., "The MD5 Message-Digest
   Algorithm", RFC 1321, MIT Laboratory for Computer Science, RSA
   Data Security Inc., April 1992.

[RFC 1534] Droms, R., "Interoperation between DHCP and BOOTP", RFC
   1534, October 1993.

[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", RFC 2119, March 1997.

[RFC 2131] Droms, R., "Dynamic Host Configuration Protocol", RFC
   2131, March 1997.

[RFC 2132] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor
   Extensions", RFC 2132, March 1997.

[RFC 2136] Vixie, P., Thomson, S., Rekhter, Y., Bound, J., "Dynamic
   Updates in the Domain Name System (DNS UPDATE)", RFC 2136, April
   1997.

[RFC 2139] Rigney, C., "RADIUS Accounting", RFC 2139, Livingston
   Enterprises, April 1997.

[RFC 2246] Dierks, T., "The TLS Protocol, Version 1.0", RFC 2246,
   January 1999.

[RFC 2434] Alvestrand, H. and T. Narten, "Guidelines for Writing an
   IANA Considerations Section in RFCs", BCP 26, RFC 2434, October
   1998.

[RFC 2487] Hoffman, P., "SMTP Service Extension for Secure SMTP over
   TLS", RFC 2487, January 1999.

[RFC 2595] Newman, C., "Using TLS with IMAP, POP3, and ACAP", RFC
   2595, June 1999.

[USERCLASS] Droms, R., Demirtjis, A., Stump, G., Gu, Y., Vyaghrapuri,
   R., Beser, B., Privat, J., "draft-ietf-dhc-userclass-08.txt", July
   2000.

16.  Authors' information

Ralph Droms
Kim Kinnear
Mark Stapp
Cisco Systems
250 Apollo Drive
Chelmsford, MA  01824

Phone: (978) 244-8000

EMail: rdroms@cisco.com
       kkinnear@cisco.com
       mjs@cisco.com

Bernie Volz
IPWorks, Inc.
959 Concord St.
Framingham, MA  01701

Phone: (508) 879-1809

EMail: volz@ipworks.com

Steve Gonczi
Network Engines, Inc.
25 Dan Road
Canton, MA  02021-2817

Phone: (781) 332-1165

EMail: steve.gonczi@networkengines.com

Greg Rabil, Mike Dooley, Arun Kapur
Lucent Technologies
400 Lapp Road
Malvern, PA  19355

Phone: (800) 208-2747

EMail: grabil@lucent.com
       mdooley@lucent.com
       akapur@lucent.com

17.  Full Copyright Statement

Copyright (C) The Internet Society (2000).  All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works.  However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.