idnits 2.17.1 draft-ietf-dhc-failover-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Authors' Addresses Section. ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 62 instances of too long lines in the document, the longest one being 7 characters in excess of 72. ** The abstract seems to contain references ([RFC2131]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 1451 has weird spacing: '...od ends addre...' == Line 1897 has weird spacing: '...eserved not...' == Line 2411 has weird spacing: '... accept tim...' == Line 2412 has weird spacing: '... accept acc...' == Line 2413 has weird spacing: '... accept acce...' == (9 more instances...) 
== Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: In this state a server MUST respond to all DHCP client requests, and any load balancing (described in section 5.3) MUST NOT be used. When allocating new IP addresses, each server SHOULD allocate from its own IP address pool (if that can be determined), where the primary SHOULD allocate only FREE IP addresses, and the secondary SHOULD allocate only BACKUP IP addresses. When responding to renewal requests, each server will allow continued renewal of a DHCP client's current lease on an IP address irrespective of whether that lease was given out by the receiving server or not, although the renewal period MUST not exceed the maximum client lead time (MCLT) beyond the latest of: 1) the potential-expiration-time already acknowledged by the other server or 2) the lease-expiration-time or 3) `potential-expiration-time received from the partner server. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (January 2001) is 8473 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 2816 -- Looks like a reference, but probably isn't: '3' on line 2084 -- Looks like a reference, but probably isn't: '4' on line 2561 -- Looks like a reference, but probably isn't: '9' on line 2687 -- Looks like a reference, but probably isn't: '7' on line 2749 -- Looks like a reference, but probably isn't: '8' on line 2777 -- Looks like a reference, but probably isn't: '2' on line 2867 -- Looks like a reference, but probably isn't: '5' on line 2905 -- Looks like a reference, but probably isn't: '6' on line 3087 -- Looks like a reference, but probably isn't: '10' on line 3251 -- Looks like a reference, but probably isn't: '11' on line 3299 -- Looks like a reference, but probably isn't: '12' on line 3322 -- Possible downref: Non-RFC (?) normative reference: ref. 'AGENTINFO' -- Possible downref: Non-RFC (?) normative reference: ref. 'DDNS' -- Possible downref: Non-RFC (?) normative reference: ref. 'LOADB' ** Downref: Normative reference to an Informational RFC: RFC 1321 ** Obsolete normative reference: RFC 2139 (Obsoleted by RFC 2866) ** Obsolete normative reference: RFC 2246 (Obsoleted by RFC 4346) ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226) ** Obsolete normative reference: RFC 2487 (Obsoleted by RFC 3207) -- Possible downref: Non-RFC (?) normative reference: ref. 'USERCLASS' Summary: 11 errors (**), 0 flaws (~~), 9 warnings (==), 18 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 Network Working Group Ralph Droms 3 INTERNET DRAFT Bucknell University 5 Kim Kinnear 6 Mark Stapp 7 Cisco Systems 9 Bernie Volz 10 IPWorks 12 Steve Gonczi 13 Network Engines 15 Greg Rabil 16 Mike Dooley 17 Arun Kapur 18 Lucent Technologies 20 July 2000 21 Expires January 2001 23 DHCP Failover Protocol 24 26 Status of this Memo 28 This document is an Internet-Draft and is in full conformance with 29 all provisions of Section 10 of RFC2026. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF), its areas, and its working groups. Note that 33 other groups may also distribute working documents as Internet- 34 Drafts. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet- Drafts as reference 39 material or to cite them other than as "work in progress." 41 The list of current Internet-Drafts can be accessed at 42 http://www.ietf.org/ietf/1id-abstracts.txt 44 The list of Internet-Draft Shadow Directories can be accessed at 45 http://www.ietf.org/shadow.html. 47 Copyright Notice 49 Copyright (C) The Internet Society (2000). All Rights Reserved. 51 Abstract 53 DHCP [RFC 2131] allows for multiple servers to be operating on a 54 single network. Some sites are interested in running multiple 55 servers in such a way so as to provide redundancy in case of server 56 failure. In order for this to work reliably, the cooperating primary 57 and secondary servers must maintain a consistent database of the 58 lease information. This implies that servers will need to coordinate 59 any and all lease activity so that this information is synchronized 60 in case of failover. 62 This document defines a protocol to provide such synchronization 63 between two servers. 
One server is designated the "primary" server, 64 the other is the "secondary" server. This document also describes a 65 way to integrate the failover protocol with the DHCP load balancing 66 approach. 68 This document is a substantial reorganization as well as a technical 69 and editorial revision of draft-ietf-dhc-failover-05.txt. 71 Table of Contents 73 1. Introduction................................................. 4 74 2. Terminology.................................................. 5 75 2.1. Requirements terminology................................... 5 76 2.2. DHCP and failover terminology.............................. 5 77 3. Background and External Requirements......................... 9 78 3.1. Key aspects of the DHCP protocol........................... 9 79 3.2. BOOTP relay agent implementation........................... 11 80 3.3. What does it mean if a server can't communicate with its partner? 12 81 3.4. Challenging scenarios for a Failover protocol.............. 12 82 3.5. Using TCP to detect partner server failure................. 14 83 4. Design Goals................................................. 15 84 4.1. Design goals for this protocol............................. 15 85 4.2. Limitations of this protocol............................... 16 86 5. Protocol Overview............................................ 17 87 5.1. Messages and States........................................ 17 88 5.2. Fundamental guarantees..................................... 20 89 5.3. Load balancing............................................. 26 90 5.4. Operating in NORMAL state.................................. 27 91 5.5. Operating in COMMUNICATIONS-INTERRUPTED state.............. 27 92 5.6. Operating in PARTNER-DOWN state............................ 28 93 5.7. Operating in RECOVER state................................. 28 94 5.8. Operating in STARTUP state................................. 28 95 5.9. Time synchronization between servers....................... 28 96 5.10. 
IP address binding-status................................. 29 97 5.11. DNS dynamic update considerations......................... 33 98 5.12. Reservations and failover................................. 37 99 5.13. Dynamic BOOTP and failover................................ 39 100 5.14. Guidelines for selecting MCLT............................. 39 101 6. Common Message Format........................................ 40 102 6.1. Message header format...................................... 40 103 6.2. Common option format....................................... 43 104 6.3. Batching multiple binding update transactions in one BNDUPD mes- 44 105 7. Protocol Messages............................................ 46 106 7.1. BNDUPD message [3]......................................... 46 107 7.2. BNDACK message [4]......................................... 56 108 7.3. UPDREQ message [9]......................................... 59 109 7.4. UPDREQALL message [7]...................................... 60 110 7.5. UPDDONE message [8]........................................ 61 111 7.6. POOLREQ message [1]........................................ 62 112 7.7. POOLRESP message [2]....................................... 63 113 7.8. CONNECT message [5]........................................ 64 114 7.9. CONNECTACK message [6]..................................... 68 115 7.10. STATE message [10]........................................ 71 116 7.11. CONTACT message [11]...................................... 72 117 7.12. DISCONNECT message [12]................................... 73 118 8. Connection Management........................................ 74 119 8.1. Connection granularity..................................... 74 120 8.2. Creating the TCP connection................................ 74 121 8.3. Using the TCP connection for determining communications status 76 122 8.4. Using the TCP connection for binding data.................. 78 123 8.5. 
Using the TCP connection for control messages.............. 78 124 8.6. Losing the TCP connection.................................. 78 125 9. Failover Endpoint States..................................... 79 126 9.1. Server Initialization...................................... 79 127 9.2. Server State Transitions................................... 79 128 9.3. STARTUP state.............................................. 82 129 9.4. PARTNER-DOWN state......................................... 84 130 9.5. RECOVER state.............................................. 86 131 9.6. NORMAL state............................................... 89 132 9.7. COMMUNICATIONS-INTERRUPTED State........................... 91 133 9.8. POTENTIAL-CONFLICT state................................... 95 134 9.9. RESOLUTION-INTERRUPTED state............................... 96 135 9.10. RECOVER-DONE state........................................ 97 136 9.11. PAUSED state.............................................. 98 137 9.12. SHUTDOWN state............................................ 98 138 10. Safe Period................................................. 99 139 11. Security.................................................... 101 140 11.1. Simple shared secret...................................... 101 141 11.2. TLS....................................................... 102 142 12. Failover Options............................................ 103 143 12.1. addresses-transferred..................................... 103 144 12.2. assigned-IP-address....................................... 103 145 12.3. binding-status............................................ 104 146 12.4. client-identifier......................................... 104 147 12.5. client-hardware-address................................... 105 148 12.6. client-last-transaction-time.............................. 105 149 12.7. client-reply-options...................................... 105 150 12.8. 
client-request-options.................................... 106 151 12.9. DDNS...................................................... 107 152 12.10. delayed-service-parameter................................ 108 153 12.11. hash-bucket-assignment................................... 108 154 12.12. lease-expiration-time.................................... 108 155 12.13. max-unacked-bndupd....................................... 109 156 12.14. MCLT..................................................... 109 157 12.15. message.................................................. 109 158 12.16. message-digest........................................... 110 159 12.17. potential-expiration-time................................ 110 160 12.18. receive-timer............................................ 110 161 12.19. protocol-version......................................... 111 162 12.20. reject-reason............................................ 112 163 12.21. sending-server-IP-address................................ 113 164 12.22. server-flags............................................. 113 165 12.23. server-state............................................. 114 166 12.24. start-time-of-state...................................... 114 167 12.25. TLS-reply................................................ 115 168 12.26. TLS-request.............................................. 115 169 12.27. vendor-class-identifier.................................. 115 170 12.28. vendor-specific-options.................................. 116 171 13. IANA Considerations......................................... 116 172 14. Acknowledgments............................................. 116 173 15. References.................................................. 118 174 16. Author's information........................................ 119 175 17. Full Copyright Statement.................................... 120 177 1. Introduction 179 DHCP [RFC 2131] allows for multiple servers to be operating on a sin- 180 gle network. 
Some sites are interested in running multiple servers 181 in such a way so as to provide redundancy in case of server failure 182 since the DHCP subsystem is in many cases a critical part of the net- 183 work infrastructure. 185 This document defines a protocol to provide synchronization between 186 two servers in order that each can take over for the other should 187 either one fail or become unreachable. 189 One server is designated the "primary" server, the other is the 190 "secondary" server, and most DHCP client requests are sent to each 191 server (see Section 3.1.1 for details). 193 In order to provide a high availability DHCP service, these 194 cooperating primary and secondary servers must maintain a consistent 195 database of lease information. This implies that servers will need 196 to coordinate all lease activity so that this information is syn- 197 chronized in case failover is required. The protocol messages and 198 processing techniques required to maintain a consistent database are 199 specified in the protocol described here. 201 The failover protocol also contains a way to integrate the DHCP load- 202 balancing algorithm described in [LOADB] with the failover protocol. 204 2. Terminology 206 This section discusses both the generic requirements terminology com- 207 mon to many IETF protocol specifications as well as specialized DHCP 208 and failover protocol specific terminology. 210 2.1. Requirements terminology 212 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 213 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 214 document are to be interpreted as described in RFC 2119 [RFC 2119]. 216 2.2. DHCP and failover terminology 218 This document uses the following terms: 220 o "binding" 222 A binding is a collection of configuration parameters, includ- 223 ing at least an IP address, associated with or "bound to" a 224 DHCP client. Bindings are managed by DHCP servers. 
226 o "binding database" 228 The collection of bindings managed by a primary and secondary. 230 o "binding update transaction" 232 A binding update transaction refers to the set of information 233 (contained in options) necessary to perform a binding update 234 for a single IP address. It comprises the 235 assigned-IP-address option and the binding-status option, along 236 with other options as appropriate. 238 o "binding-status" 240 The binding-status is the status of an IP address with respect 241 to its association with a client. There are specific binding- 242 status values defined for use by the failover protocol, e.g., 243 ACTIVE, FREE, RELEASED, ABANDONED, etc. These are designed to 244 map more or less directly onto the binding-status values used 245 internally in most DHCP server implementations. The term 246 binding-status refers to the concept also sometimes known as 247 "lease state" or "IP address state", but in this document the 248 term "state" is reserved for the failover state of a failover 249 endpoint, and binding-status is always used to refer to the 250 state associated with an IP address or lease. 252 o "DHCP client" or "client" 254 A DHCP client is an Internet host using DHCP to obtain confi- 255 guration parameters such as a network address. The term 256 "client" used within this document always means a DHCP client, 257 and never one of the two failover servers. 259 o "DHCP server" or "server" 261 A DHCP server is an Internet host that returns configuration 262 parameters to DHCP clients. 264 o "DDNS" 266 An abbreviation for "Dynamic DNS", which refers to the capabil- 267 ity to update a DNS server's name (actually resource record) 268 database using an on-the-wire protocol defined in [RFC 2136]. 270 o "DNS" 272 An abbreviation for "Domain Name System", a scheme where a cen- 273 tral name repository is used to map names to IP addresses and IP 274 addresses to names.
276 o "failover endpoint" 278 The failover protocol allows for there to be a unique failover 279 endpoint per partner per role (where role is primary or secon- 280 dary). This failover endpoint can take actions and hold unique 281 states. There are thus a maximum of two failover endpoints per 282 server per partner (one for each partner as a primary and one 283 for that same partner as a secondary.) 285 o "FQDN" 287 An FQDN is a "fully qualified domain name". A fully qualified 288 domain name generally is a host name with at least one zone 289 name, for example "www.dhcp.org" is a fully qualified domain 290 name. 292 o "lazy update" 294 Lazy update refers to the requirement placed on a server imple- 295 menting a failover protocol to update its failover partner when- 296 ever the binding database changes. A failover protocol which 297 didn't support lazy update would require the failover partner 298 update to be complete before a DHCP server could respond to a 299 DHCP client request with a DHCPACK. A failover protocol which 300 does support lazy update places no such restriction on the 301 update of the failover partner server, and so a server can allo- 302 cate an IP address or extend a lease on an IP address and then 303 update its failover partner as time permits. A failover proto- 304 col which supports lazy update not only removes the requirement 305 to update the failover partner prior to responding to a DHCP 306 client with a DHCPACK, but also allows gathering up batches of 307 updates from one failover server to its partner. 309 o "MCLT" 311 The MCLT refers to maximum client lead time. This time is con- 312 figured on the primary server and transmitted from the primary 313 to the secondary server in the CONNECT message. It is the max- 314 imum amount of time that one server can extend a lease for a 315 client's binding beyond the time known by the partner server. 316 See section 5.2.1 for details. 
318 o "partner" 320 A "partner", for the purposes of this document, refers to a 321 failover server, typically the other failover server. In many 322 (if not most) cases, the failover protocol is symmetric with 323 respect to the primary or secondary nature of the servers, and 324 so it is often appropriate to discuss "updating the partner 325 server", since it could be a primary server updating a secondary 326 server or a secondary server updating a primary server. 328 o "Primary server" or "Primary" 329 A DHCP server configured to provide primary service to a set of 330 DHCP clients for a particular set of subnet address pools. 332 o "RR" 334 "RR" is an abbreviation for "resource record". All records in 335 the DNS are resource records. The resource records of most 336 relevance to this document are the "A" resource record, which 337 maps a DNS name to a particular IP address, the "PTR" resource 338 record, which allows a "reverse map", from the IP address back 339 to a DNS name, and the "KEY" resource record, which is used in 340 ways defined in [DDNS] to tag a DNS name with the identity of 341 the DHCP client with which it is associated. 343 o "Secondary server" or "Secondary" 345 A DHCP server configured to act as backup to a primary server 346 for a particular set of subnet address pools. 348 o "stable storage" 350 Every DHCP server is assumed to have some form of what is called 351 "stable storage". Stable storage is used to hold information 352 concerning IP address bindings (among other things) so that this 353 information is not lost in the event of a server failure which 354 requires restart of the server. 356 o "state" 358 In this document, the term "state" refers exclusively to the 359 state of a failover endpoint, for example: NORMAL, 360 COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN. It is not used to 361 refer to any attributes of an IP address or a binding of an IP 362 address. See "binding-status". 
364 o "subnet address pool" 366 A subnet address pool is the set of IP addresses which is asso- 367 ciated with a particular network number and subnet mask. In the 368 simple case, there is a single network number and subnet mask 369 and a set of IP addresses. In the more complex case (sometimes 370 called "secondary subnets", sometimes "superscopes"), several 371 (apparently unrelated) network number and subnet mask combina- 372 tions with their associated IP addresses may all be configured 373 together into one subnet address pool. 375 3. Background and External Requirements 377 This section highlights key aspects of the DHCP protocol on which the 378 failover protocol depends. It also discusses the requirements that 379 the failover protocol places on other aspects of the network infras- 380 tructure, and some general issues surrounding server failure detec- 381 tion. Some failure scenarios that provide particular challenges to a 382 failover protocol are discussed. Finally, the challenges inherent in 383 using a TCP connection as a means to detect failure of a partner 384 server are elaborated. 386 3.1. Key aspects of the DHCP protocol 388 The failover protocol is designed to augment the DHCP protocol as 389 described in RFC 2131 [RFC 2131]. There are several key aspects of 390 the DHCP protocol which are required by the failover protocol in 391 order to successfully meet its design goals. 393 3.1.1. Broadcast behavior 395 There are two aspects of the broadcast behavior of the DHCP protocol 396 which are key to making the failover protocol operate successfully. 397 The first is simply that the DHCP protocol requires a DHCP client to 398 broadcast all DHCPDISCOVER and DHCPREQUEST/INIT-REBOOT messages. 399 Because of this requirement, a DHCP client who was communicating with 400 one server will automatically be able to communicate with another 401 server if one is available. 
403 The second aspect of broadcast behavior is similar to the first, but 404 involves the distinction between a DHCPREQUEST/RENEW and 405 DHCPREQUEST/REBINDING. A DHCPREQUEST/RENEW is the message that a 406 DHCP client uses to extend its lease. It is unicast to the DHCP 407 server from which it acquired the lease. However, the DHCP protocol 408 (in a farsighted move), was explicitly designed so that in the event 409 that a DHCP client cannot contact the server from which it received a 410 lease on an IP address using a DHCPREQUEST/RENEW, the client is 411 required to broadcast its renewal using a DHCPREQUEST/REBINDING to 412 any available DHCP server. Since all DHCP clients were required to 413 implement this algorithm, the failover protocol can have a different 414 server from the one that initially granted a lease be the server to 415 renew a lease. Thus, one server can take over for another with no 416 interruption in the service as experienced by the DHCP client or its 417 associated applications software. 419 3.1.2. Client responsibility 421 In the DHCP protocol the DHCP clients are entrusted with a consider- 422 able responsibility. In particular, after they are granted a lease 423 on an IP address, they are enjoined to only use that IP address while 424 their lease is valid. Every DHCP client is expected to stop using an 425 IP address if the expiration time on the lease has passed and if it 426 cannot get an extension on the lease for that IP address from some 427 DHCP server. Thus, the correct behavior of every DHCP client in this 428 regard is required to ensure the integrity of the DHCP service. On 429 the other hand, incorrect behavior by a client in this area will tend 430 to adversely affect at most one other DHCP client. 
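The renewal behavior and client responsibility described above can be illustrated with a minimal sketch. It assumes the T1 (renewal) and T2 (rebinding) timers defined in RFC 2131, which this section does not name; the function and enum names are hypothetical and exist only for illustration.

```python
from enum import Enum


class Action(Enum):
    KEEP_USING = "keep using the address"
    RENEW_UNICAST = "unicast DHCPREQUEST/RENEW to the granting server"
    REBIND_BROADCAST = "broadcast DHCPREQUEST/REBINDING to any server"
    STOP_USING = "stop using the address"


def client_action(now, t1, t2, expiry):
    """Pick what a well-behaved client does at time `now` (seconds).

    t1 and t2 are the RFC 2131 renewal and rebinding times (assumed here,
    not taken from this section); expiry is the lease expiration time.
    """
    if now >= expiry:
        # The lease has run out and no server extended it.
        return Action.STOP_USING
    if now >= t2:
        # The granting server did not answer; any server may renew,
        # which is what lets a failover partner take over.
        return Action.REBIND_BROADCAST
    if now >= t1:
        return Action.RENEW_UNICAST
    return Action.KEEP_USING


# Example with a hypothetical 100-second lease (T1 = 50 s, T2 = 87.5 s).
for t in (10, 60, 90, 100):
    print(t, client_action(t, 50, 87.5, 100).name)
```

The ordering of the checks matters: expiry is tested first so that a client never sends a RENEW or REBINDING for a lease that has no time left to run.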
432 Furthermore, any DHCP client which sends in a DHCPREQUEST/RENEW or 433 DHCPREQUEST/REBINDING to a DHCP server (either unicast for a RENEW or 434 broadcast for a REBINDING) MUST still have time to run on the lease 435 for that IP address. The DHCP server sends the DHCPACK back unicast 436 to the IP address from which the RENEW or REBINDING originated. 438 Given the existing responsibility placed on the client to only use an 439 IP address when the lease is valid, and to only send in a RENEW or 440 REBINDING if the lease is valid, the failover protocol relies on DHCP 441 clients to perform responsibly and will, in the absence of conflict- 442 ing information, believe a DHCP client that is attempting to RENEW or 443 REBIND a lease on an IP address is the legitimate owner of that IP 444 address. 446 If clients do not follow these rules, it is possible for an address 447 to be in use by more than one client. For a single server, this hap- 448 pens because the server has leased the expired address to another 449 client and the original client is also attempting to use the address. 450 The server would NAK the renewal request. This is made slightly worse 451 in the failover protocol if the two servers are unable to communicate 452 with each other and one server leases an available address to a new 453 client while the other server receives a renewal from a different 454 client. In this case, both servers lease the same address to dif- 455 ferent clients for the MCLT time. 457 One troublesome issue is that of the DHCP client responsibility when 458 sending in DHCPREQUEST/INIT-REBOOT requests. While the original DHCP 459 RFC was written to require a DHCP client to have time left to run on 460 the lease for an IP address if the client is sending an INIT-REBOOT 461 request, it was sufficiently unclear that some client vendors didn't 462 realize this until recently. 
Since the INIT-REBOOT request was sent 463 with the IP address in the dhcp-requested-address option and not in 464 the ciaddr (for perfectly good reasons), the similarity to the RENEW 465 and REBINDING case was lost on many people. 467 At present, the failover protocol does not assume that a client send- 468 ing in an INIT-REBOOT request necessarily has a valid lease on the IP 469 address appearing in the dhcp-requested-address option in the INIT- 470 REBOOT request. 472 The implications of this are as follows: Assume that there is a DHCP 473 client that gets a lease from one server while that server is unable 474 to communicate with its failover partner. Then, assume that after 475 that client reboots it is able to communicate only with the other 476 failover server. If the failover servers have not been able to com- 477 municate with each other during this process, then the DHCP client 478 will get a new IP address instead of being able to continue to use 479 its existing IP address. This will affect no applications on the DHCP 480 client, since it is rebooting. However, it will use up an additional 481 IP address in this marginal case. 483 3.1.3. Stable storage update before DHCPACK 485 The DHCP protocol allocates resources, and in order to operate 486 correctly it requires that a DHCP server update some form of stable 487 storage prior to sending a DHCPACK to a DHCP client in order to grant 488 that client a lease on an IP address. 490 One of the goals of the failover protocol is that it not add signifi- 491 cant additional time to this already time-consuming requirement to 492 update stable storage prior to a DHCPACK. In particular, adding a 493 requirement to communicate with another server prior to sending a 494 DHCPACK would greatly simplify the failover protocol, but it would 495 unacceptably limit the potential scalability of any DHCP server which 496 employed the failover protocol. 498 3.2.
BOOTP relay agent implementation 500 Many DHCP clients are not resident on the same network segment as a 501 DHCP server. In order to support this form of network architecture, 502 most contemporary routers implement something known as a BOOTP Relay 503 Agent. This capability inside of a router listens for all broadcasts 504 at the DHCP port, port 67, and will relay any broadcasts that it 505 receives on to a DHCP server. The IP address of the DHCP server must 506 have been previously configured into the router. As part of the 507 relay process, the relay agent will place the address of the inter- 508 face on which it received the broadcast into the giaddr field of the 509 DHCP packet. 511 Since the failover protocol requires two DHCP servers to receive any 512 broadcast DHCP messages, in order to work with DHCP clients which are 513 not local to the DHCP server, the BOOTP relay agent on the router 514 closest to the DHCP client must be configured to point at more than 515 one DHCP server. 517 Most BOOTP relay agent implementations allow this duplication of 518 packets. 520 If this is not possible, an administrator might be able to configure 521 the relay agent with a subnet broadcast address, but in this case the 522 primary and secondary DHCP servers in a failover pair must both 523 reside on the same subnet. 525 3.3. What does it mean if a server can't communicate with its partner? 527 In any protocol designed to allow one server to take over some 528 responsibilities from a partner server in the event of "failure" of 529 that partner server, there is an inherent difficulty in determining 530 when that partner server has failed. 532 In fact, it is fundamentally impossible for one server to distinguish 533 a network communications failure from the outright failure of the 534 server to which it is trying to communicate. 
   In the case where each server is handing out resources (in this case
   IP addresses) to a client community, mistaking an inability to
   communicate with a partner server for failure of that partner server
   could easily cause both servers to be handing out the same IP
   addresses to different clients.

   One way that this is sometimes handled is for there to be more than
   two servers. In the case of an odd number of servers, the servers
   that can still communicate with a majority of other servers will
   consider themselves operational, and any server which can't
   communicate with a majority of other servers must immediately cease
   operations.

   While this technique works in some domains, having the only server
   with which a DHCP client can communicate voluntarily shut itself
   down seems like something worth avoiding.

   The failover protocol will operate correctly while the two servers
   are unable to communicate with each other, whether they are both
   running or not. At some point there may be resource contention, and
   if one of the servers is actually down, then the operator can inform
   the operational server and the operational server will be able to
   use all of the failed server's resources.

   The protocol also allows detection of an orderly shutdown of a
   participating server.

3.4. Challenging scenarios for a Failover protocol

   There exist two failure scenarios which provide particular
   challenges to the correctness guarantees of a failover protocol.

3.4.1. Primary Server crash before "lazy" update:

   In the case where the primary server sends a DHCPACK to a client for
   a newly allocated IP address and then crashes prior to sending the
   corresponding update to the secondary server, the secondary server
   will have no record of the IP address allocation. When the secondary
   server takes over, it may well try to allocate that IP address to a
   different client.
   In the case where the first client to receive the IP address is not
   on the net at the time (yet while there was still time to run on its
   lease), an ICMP echo (i.e., ping) will not prevent the secondary
   server from allocating that IP address to a different client.

   The failover protocol deals with this situation by having the
   primary and secondary servers allocate addresses for new clients
   from disjoint address pools. See section 5.4 for details.

   A more likely (in that DHCPRENEWs are presumably more common than
   DHCPDISCOVERs) and more subtle version of this problem is where the
   primary server crashes after extending a client's lease time, and
   before updating the secondary with a new time using a lazy update.
   After the secondary takes over, if the client is not connected to
   the network the secondary will believe the client's lease has
   expired when, in fact, it has not. In this case as well, the IP
   address might be reallocated to a different client while the first
   client is still using it.

   This scenario is handled by the failover protocol through control of
   the lease time and the use of the maximum client lead time (MCLT).
   See section 5.2.1 for details.

3.4.2. Network partition where DHCP servers can't communicate but each
       can talk to clients:

   Several conditions are required for this situation to occur. First,
   due to a network failure, the primary and secondary servers cannot
   communicate. As well, some of the DHCP clients must be able to
   communicate with the primary server, and some of the clients must
   now only be able to communicate with the secondary server. When
   this condition occurs, both primary and secondary servers could
   attempt to allocate IP addresses for new clients from the same pool
   of available addresses. At some point, then, two clients will end
   up being allocated the same IP address.
   This will cause problems when the network failure that created this
   situation is corrected.

   The failover protocol deals with this situation by having the
   primary and secondary servers allocate addresses for new clients
   from disjoint address pools. See section 5.4 for details.

3.5. Using TCP to detect partner server failure

   There are several characteristics of TCP that are important to the
   functioning of the failover protocol, which uses one TCP connection
   both for bulk data transfer and to assess communications integrity
   with the other server. Reliable and ordered message delivery are
   chief among these important characteristics.

   It would be nice to use the capabilities built in to TCP to allow it
   to determine if communications integrity exists to the failover
   partner, but this strategy contains some problems which require
   analysis. There exist three fundamental cases for an open TCP
   connection that must be examined.

   1.  When no data is being sent then no messages are traveling
       across the TCP connection.

   2.  When data is queued to be sent, and the receiver has not
       blocked the sending of additional data, then messages are
       flowing across the TCP connection containing the application's
       data.

   3.  When data is queued to be sent, and the receiver has blocked
       the transmission of additional data, then persist messages are
       flowing from the receiver to the sender to ensure that the
       sender doesn't miss the receiver opening the window for
       further transmissions.

   The first case can be turned into the second case by sending
   application-level keep-alive messages periodically when there is no
   other data queued to be sent. Note that TCP keep-alive messages
   might be used as well, but they present additional problems.

   Thus, we can ensure that the TCP connection has messages flowing
   periodically across the connection fairly easily.
   The question remains as to what TCP will do if the other end of the
   connection fails to respond (either because of network partition or
   because the receiving server crashes). TCP will attempt to
   retransmit a message with an exponential backoff, and will
   eventually time out that retransmission. However, the length of
   that timeout cannot, in general, be set on a per-connection basis,
   and is frequently as long as nine minutes, though in some cases it
   may be as short as two minutes. On some systems it can be set
   system-wide, while on other systems it cannot be changed at all.

   A value for this timeout that would be appropriate for the failover
   protocol, say less than 1 minute, could have unpleasant side-effects
   on other applications running on the same server, assuming that it
   could be changed at all on the host operating system.

   Nine minutes is a long time for the DHCP service to be unavailable
   to any new clients that were being served by the server which has
   crashed, when there is another server running that could respond to
   them as soon as it determines that its partner is not operational.

   The conclusion drawn from this analysis is that TCP provides very
   useful support for the failover protocol in the areas of reliable
   and ordered message delivery, but cannot by itself be relied upon to
   detect partner server failure in a fashion acceptable to the needs
   of the failover protocol. Additional failover protocol capabilities
   have been created to support timely detection of partner server
   failure. See section 8.3 for details on this mechanism.

4. Design Goals

   This section lists the design goals and the limitations of the
   failover protocol.

4.1. Design goals for this protocol

   The following is a list of goals that are met by this protocol. They
   are listed in priority order.

   1.
       Implementations of this protocol must work with existing DHCP
       client implementations based on the DHCP protocol [1].

   2.  Implementations of the protocol must work with existing BOOTP
       relay agent implementations.

   3.  The protocol must provide failover redundancy between servers
       that are not located on the same subnet.

   4.  Provide for continued service to DHCP clients through an
       automated mechanism in the event of failure of the primary
       server.

   5.  Avoid binding an IP address to a client while that binding is
       currently valid for another client. In other words, do not
       allocate the same IP address to two clients.

   6.  Minimize any need for manual administrative intervention.

   7.  Introduce no additional delays in server response time as a
       result of the network communications required to implement the
       failover protocol, i.e., don't require communications with the
       partner between the receipt of a DHCPREQUEST and the
       corresponding DHCPACK.

   8.  Share IP address ranges between primary and secondary servers;
       i.e., impose no requirement that the pool of available
       addresses be manually or permanently divided between servers.

   9.  Continue to meet the goals and objectives of this protocol in
       the event of server failure or network partition.

   10. Provide graceful reintegration of full protocol service after
       server failure or network partition.

   11. Allow for one computer to act as a secondary server for
       multiple primary servers. The protocol must allow failover
       primary and secondary configuration choices to be made at a
       granularity smaller than "all of the subnets served by a single
       server", though individual implementations may choose not to
       allow such flexibility.

   12.
       Ensure that an existing client can keep its existing IP
       address binding if it can communicate with either the primary
       or secondary DHCP server implementing this protocol - not just
       the server that originally offered it the binding.

   13. Ensure that a new client can get an IP address from some
       server. Ensure that in the face of partition, where servers
       continue to run but cannot communicate with each other, the
       above goals and requirements may be met. In addition, when
       the partition condition is removed, allow graceful automatic
       re-integration without requiring human intervention.

   14. If either primary or secondary server loses all of the
       information that it has stored in stable storage, ensure that
       it be able to refresh its stable storage from the other server.

   15. Support load balancing between the primary and secondary
       servers, and allow configuration of the percentage of the
       client population served by each with a moderately fine
       granularity.

4.2. Limitations of this protocol

   The following are explicit limitations of this protocol.

   1.  This protocol provides only one level of redundancy through a
       single secondary server for each primary server.

   2.  A subset of the address pool is reserved for secondary server
       use. In order to handle the failure case where both servers
       are able to communicate with DHCP clients, but unable to
       communicate with each other, a subset of the IP address pool
       must be set aside as a private address pool for the secondary
       server. The secondary can use these to service newly arrived
       DHCP clients during such a period. The required size of this
       private pool is based only on the arrival rate of new DHCP
       clients and the length of expected downtime, and is not
       influenced in any way by the total number of DHCP clients
       supported by the server pair.

       The failover protocol can be used in a mode where both the
       primary and secondary servers can share the load between them
       when both are operating. In this load balancing mode, the
       addresses allocated by the primary server to the secondary
       server are not unused, but are used instead to service the
       portion of the client base to which the secondary server is
       required to respond. See section 5.3 for more information on
       load balancing.

   3.  The primary and secondary servers do not respond to client
       requests at all while recovering from a failure that could
       have resulted in duplicate IP assignments (that is, when
       synchronizing in POTENTIAL-CONFLICT state).

5. Protocol Overview

   This section will discuss the failover protocol at a relatively high
   level of detail. In the event that a description in this section
   conflicts (or appears to conflict due to the overview nature of this
   section) with information in later sections of this draft, the
   information in the later sections should be considered
   authoritative.

5.1. Messages and States

   This protocol is centered around the message exchange used by one
   server to update the other server of binding database changes
   resulting from DHCP client activity:

   o  Communication of binding database changes

      The binding update (BNDUPD) message is used to send the binding
      database changes to the partner server, and the partner server
      responds with a binding acknowledgement (BNDACK) message when it
      has successfully committed those changes to its own stable
      storage.

   All of the other messages involve ancillary issues:

   o  Management of available IP addresses

      The pool request (POOLREQ) is used by the secondary server to
      request an allocation of IP addresses from the primary server.
      The pool response (POOLRESP) is used by the primary server to
      inform the secondary server how many IP addresses were allocated
      to the secondary server as the result of the pool request.

   o  Synchronization of the binding databases between the servers
      after they've been out of communications

      The update request (UPDREQ) message is used by one server to
      request that its partner send it all binding database
      information that it has not already seen. The update request
      all (UPDREQALL) message is used by one server to request that
      all binding database information be sent in order to recover
      from a total loss of its binding database by the requesting
      server. The update done (UPDDONE) message is used by the
      responding server to indicate that all requested updates have
      been sent by the responding server and acked by the requesting
      server.

   o  Connection establishment

      The connect (CONNECT) message is used by the primary server to
      establish a high level connection with the other server, and to
      transmit several important configuration data items between the
      servers. The connect acknowledgement message (CONNECTACK) is
      used by the secondary server to respond to a CONNECT message
      from the primary server. The disconnect (DISCONNECT) message is
      used by either server when closing a connection.

   o  Server synchronization

      The state change (STATE) message is used by either server to
      inform the other server of a change of failover state.

   o  Connection integrity management

      The contact (CONTACT) message is used by either server to ensure
      that the other server continues to see the connection as
      operational. It MUST be transmitted periodically over every
      established connection if other message traffic is not flowing,
      and it MAY be sent at any time.

5.1.1.
Failover endpoints

   The proper operation of the failover protocol requires more than the
   transmission of messages between one server and the other. Each
   endpoint might seem to be a single DHCP server, but in fact there
   are many situations where additional flexibility in configuration is
   useful.

   For instance, there might be several servers which are each primary
   for a distinct set of address pools, and one server which is
   secondary for all of those address pools. The situation with the
   primaries is straightforward, but the secondary will need to
   maintain a separate failover state, partner state, and
   communications up/down status for each of the separate primary
   servers for which it is acting as a secondary.

   The failover protocol calls for there to be a unique failover
   endpoint per partner per role (where role is primary or secondary).
   This failover endpoint can take actions and hold unique states.
   There are thus a maximum of two failover endpoints per partner (one
   for the partner as a primary and one for that same partner as a
   secondary).

   Thus, in the case where there are two primary servers A and B each
   backed up by a single common secondary server C, there is one
   failover endpoint on each of A and B, and two different failover
   endpoints on C. The two different failover endpoints on C each have
   unique states and independent TCP connections.

   This document frequently describes the behavior of the protocol in
   terms of primary and secondary servers, not primary and secondary
   failover endpoints. However, it is important to remember that every
   'server' described in this document is in reality a failover
   endpoint that resides in a particular process, and that many
   failover endpoints may reside in the same process.
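
   The A/B/C arrangement described above can be sketched as a small
   data structure. This is only an illustration of "one endpoint per
   partner per role"; the names Role, FailoverEndpoint, Server and the
   use of plain strings for partner identity and failover state are
   assumptions made for this sketch and are not defined by the
   protocol.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


@dataclass
class FailoverEndpoint:
    partner: str            # partner identity (hypothetical: a name or IP string)
    partner_role: Role      # role the partner plays in this relationship
    state: str = "NORMAL"   # failover state, held independently per endpoint
    comm_up: bool = True    # communications up/down status for this partner


class Server:
    def __init__(self, name):
        self.name = name
        self.endpoints = {}  # (partner, partner_role) -> FailoverEndpoint

    def endpoint(self, partner, partner_role):
        """Return the unique endpoint for this partner and role, creating it once."""
        key = (partner, partner_role)
        if key not in self.endpoints:
            self.endpoints[key] = FailoverEndpoint(partner, partner_role)
        return self.endpoints[key]


# Two primaries (A and B) backed up by one common secondary (C).
a, b, c = Server("A"), Server("B"), Server("C")
a.endpoint("C", Role.SECONDARY)
b.endpoint("C", Role.SECONDARY)
c.endpoint("A", Role.PRIMARY)
c.endpoint("B", Role.PRIMARY)

# Losing contact with A changes only C's endpoint for A, not the one for B.
c.endpoint("A", Role.PRIMARY).state = "COMMUNICATIONS-INTERRUPTED"
c.endpoint("A", Role.PRIMARY).comm_up = False
```

   Keying the table on the pair (partner, role) is what limits a
   server to at most two endpoints for any one partner.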

   It is not the case that there is a unique failover endpoint for each
   subnet address pool that participates in a failover relationship. On
   one server, there is one failover endpoint per partner per role,
   regardless of how many subnet address pools are managed by that
   combination of partner and role. Conversely, on a particular
   server, any given subnet address pool will be associated with
   exactly one failover endpoint.

   When a connection is received from the partner, the unique failover
   endpoint to which the message is directed is determined solely by
   the IP address of the partner and the port to which the connection
   is directed by the partner. See section 8.2.

5.2. Fundamental guarantees

   There are several fundamental restrictions this protocol places on
   what one server can do in the absence of knowledge of the other
   server. Operating within these restrictions allows certain
   guarantees to be made to the partner server, and these are key to
   the correct operation of the protocol.

5.2.1. Control of lease time

   The key problem with lazy update is that when a server fails after
   updating a client with a particular lease time and before updating
   its partner, the partner will believe that a lease has expired even
   though the client still retains a valid lease on that IP address.

   In order to handle this problem, a period of time known as the
   "Maximum Client Lead Time" (MCLT) is defined and must be known to
   both the primary and secondary servers. Proper use of this time
   interval places an upper bound on the difference allowed between the
   lease time provided to a DHCP client by a server and the lease time
   known by that server's partner. However, the MCLT is typically much
   less than the lease time that a server has been configured to offer
   a client, and so some strategy must exist to allow a server to offer
   the configured lease time to a client.
   During a lazy update the updating server typically updates its
   partner with a potential expiration time which is longer than the
   lease time previously given to the client and which is longer than
   the lease time that the server has been configured to give a client.
   This allows that server to give a longer lease time to the client
   the next time the client renews its lease, since the time that it
   will give to the client will not exceed the MCLT beyond the
   potential expiration time acknowledged by its partner.

   The PARTNER-DOWN state exists so that a server can be sure that its
   partner is, indeed, down. Correct operation while in that state
   requires (generally) that the server wait the MCLT after anything
   that happened prior to its transition into PARTNER-DOWN state (or,
   more accurately, when the other server went down if that is known).
   Thus, the server MUST wait the MCLT after the partner server went
   down before allocating any of the partner's addresses which were
   available for allocation. In the event the partner was not in
   communication prior to going down, it might have allocated one or
   more of its FREE addresses to a DHCP client and been unable to
   inform the server entering PARTNER-DOWN prior to going down itself.
   By waiting the MCLT after the time the partner went down, the server
   in PARTNER-DOWN state ensures that any clients which have a lease on
   one of the partner's FREE addresses will either time out or contact
   the server in PARTNER-DOWN by the time that period ends.

   In addition, once a server has transitioned to PARTNER-DOWN state,
   it MUST NOT reallocate an IP address from one client to another
   client until an additional MCLT interval after the lease by the
   original client expires. (Actually, until the maximum client lead
   time after what it believes to be the lease expiration time of the
   client.)
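
   The two PARTNER-DOWN waiting rules just stated can be sketched as a
   single helper. This is a minimal illustration under stated
   assumptions: times are integer seconds, and the function name and
   its parameters are hypothetical, not part of the protocol.

```python
def earliest_reuse_time(partner_down_at, mclt, lease_expiration=None):
    """Earliest time a server in PARTNER-DOWN may hand an address to a new client.

    partner_down_at   time the partner went down (or, if that is unknown,
                      the time of entry into PARTNER-DOWN state)
    mclt              maximum client lead time
    lease_expiration  what this server believes to be the lease expiration
                      time of the current client, or None if the address
                      was one of the partner's available addresses
    """
    if lease_expiration is None:
        # Partner's available address: the partner may have leased it
        # without telling us, for at most the MCLT.
        return partner_down_at + mclt
    # Address leased to a client: wait the MCLT beyond the believed expiration.
    return lease_expiration + mclt


MCLT = 3600  # one hour, as an example value

# One of the partner's available addresses; partner went down at t=1000.
free_ok_at = earliest_reuse_time(1000, MCLT)

# An address whose lease is believed to expire at t=5000.
leased_ok_at = earliest_reuse_time(1000, MCLT, lease_expiration=5000)
```

   With these inputs the partner's available address may be allocated
   from t=4600, and the leased address may be given to a different
   client from t=8600.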

   Some optimizations exist for this restriction, in that it only
   applies to leases that were issued BEFORE entering PARTNER-DOWN.
   Once a server has entered PARTNER-DOWN and it leases out an address,
   it need not wait this time as long as it has never communicated with
   the partner since the lease was given out.

   The fundamental relationship on which much of the correctness of
   this protocol depends is that the lease expiration time known to a
   DHCP client MUST NOT be more than the maximum client lead time
   greater than the potential expiration time known to a server's
   partner.

   The remainder of this section makes the above fundamental
   relationship more explicit.

   This protocol requires a DHCP server to deal with several different
   lease intervals and places specific restrictions on their
   relationships. The purpose of these restrictions is to allow the
   other server in the pair to be able to make certain assumptions in
   the absence of an ability to communicate between servers.

   The different lease times are:

   o  desired lease interval

      The desired lease interval is the lease interval that a DHCP
      server would like to give to a DHCP client in the absence of any
      restrictions imposed by the Failover protocol. Its determination
      is outside of the scope of this protocol. Typically this is the
      result of external configuration of a DHCP server.

   o  actual lease interval

      The actual lease interval is the lease interval that a DHCP
      server gives out to a DHCP client in the dhcp-lease-time option
      of a DHCPACK packet. It may be shorter than the desired client
      lease interval (as explained below).

   o  potential lease interval

      The potential lease interval is the lease expiration interval
      the local server tells to its partner in the potential-
      expiration-time option of a BNDUPD message.

   o  acknowledged potential lease interval

      The acknowledged potential lease interval is the potential lease
      interval the partner server has most recently acknowledged in
      the potential-expiration-time option of a BNDACK message.

   The key restriction (and guarantee) that any server makes with
   respect to lease intervals is that the actual client lease interval
   never exceeds the acknowledged potential lease interval (if any) by
   more than a fixed amount. This fixed amount is called the "Maximum
   Client Lead Time" (MCLT).

   The MCLT MAY be configurable on the primary server, but for correct
   server operation it MUST be the same and known to both the primary
   and secondary servers. The secondary server determines the MCLT
   from the MCLT option sent from the primary server to the secondary
   server in the CONNECT message.

   A server MUST record in its stable storage both the actual lease
   interval and the most recently acknowledged potential lease interval
   for each IP address binding. It is assumed that the desired client
   lease interval can be determined through techniques outside of the
   scope of this protocol. See section 7.1.5 for more details
   concerning the times that the server MUST record in its stable
   storage and the way that they interact with the lease time that may
   be offered to a DHCP client.

   Again, the fundamental relationship among these times which MUST be
   maintained is:

      actual lease interval <
         ( acknowledged potential lease interval + MCLT )

   Figure 5.2.1-1 illustrates an initial lease to a client using the
   rules discussed in the example which follows it. Note that this is
   only one example -- as long as the fundamental relationship is
   preserved, the actual times used could be quite different.

            DHCP                  Primary              Secondary
   time     Client                Server               Server

              | (time in intervals) | (absolute time)    |
              |                     |                    |
              | >-DHCPDISCOVER->    |                    |
              | <---DHCPOFFER-<     |                    |
              |                     |                    |
              | >-DHCPREQUEST->     |                    |
              |   (selecting)       |                    |
              |                     |                    |
   t          | <--------DHCPACK-<  |                    |
              |  lease-time=MCLT    |                    |
              |                     | >-BNDUPD-->        |
              |                     |   lease-expiration=t+MCLT
              |                     |   potential-expiration=t+(MCLT/2)+X
              |                     |                    |
              |                     | <-BNDACK-<         |
              |                     |   potential-expiration=t+(MCLT/2)+X
             ...                   ...                  ...
              |                     |                    |
   t+MCLT/2   | >-DHCPREQUEST->     |                    |
              |    (renew)          |                    |
              |                     |                    |
   t1         | <--------DHCPACK-<  |                    |
              |  lease-time=X       |                    |
              |                     | >-BNDUPD-->        |
              |                     |   lease-expiration=t1+X
              |                     |   potential-expiration=t1+(X/2)+X
              |                     |                    |
              |                     | <-BNDACK-<         |
              |                     |   potential-expiration=t1+(X/2)+X
             ...                   ...                  ...

            Figure 5.2.1-1: Lazy Update Message Traffic
                    X = Desired Lease Interval
           Assumes renewal interval = lease interval / 2

   DISCUSSION:

      This protocol mandates only that the above fundamental
      relationship concerning lease intervals is preserved.

      In the interests of clarity, however, let's examine a specific
      example. The MCLT in this case is 1 hour. The desired lease
      interval is 3 days, and its renewal time is half the lease
      interval.

      The rules for this example are:

      o  What to tell the client:

         Take the remainder of the acknowledged potential lease
         interval. If this is a new lease, then this value will be
         zero. If this remainder plus the MCLT is greater than the
         desired lease interval, give the client the desired lease
         interval else give the client the remainder plus the MCLT.

      o  What to tell the failover partner server:

         Take the renewal interval (typically half of the actual client
         lease interval), add to it the desired lease interval, and add
         it to the current time to yield the value that goes into the
         potential-expiration-time option.

         Also tell the failover partner the actual lease interval by
         adding it to the current time to yield the value that goes
         into the lease-expiration option.

      In operation this might work as follows:

      When a server makes an offer for a new lease on an IP address to
      a DHCP client, it determines the desired lease interval (in this
      case, 3 days). It then examines the acknowledged potential lease
      interval (which in this case is zero) and determines the
      remainder of the time left to run, which is also zero. To this
      it adds the MCLT. Since the actual lease interval cannot be
      allowed to exceed the remainder of the current acknowledged
      potential lease interval plus the MCLT, the offer made to the
      client is for the remainder of the current acknowledged potential
      lease interval (i.e., zero) plus the MCLT. Thus, the actual
      lease interval is 1 hour.

      Once the server has sent the DHCPACK to the DHCP client, it will
      update the secondary server with the lease information. However,
      the desired potential lease interval will be composed of one half
      of the current actual lease interval added to the desired lease
      interval. Thus, the secondary server is updated with a BNDUPD
      with a lease interval of 3 days + 1/2 hour specified in the
      potential-expiration-time option.

      When the primary server receives an ACK to its update of the
      secondary server's (partner's) potential lease interval, it
      records that as the acknowledged potential lease interval. A
      server MUST NOT send a BNDACK in response to a BNDUPD message
      until it is sure that the information in the BNDUPD message
      resides in its stable storage. Thus, the primary server in this
      case can be sure that the secondary server has recorded the
      potential lease interval in its stable storage when the primary
      server receives a BNDACK message from the secondary server.
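
      The two rules of this example ("what to tell the client" and
      "what to tell the failover partner server") can be sketched as
      follows. This is an illustration of the example rules only, not a
      required algorithm; the function names are hypothetical, times
      are integer seconds, and the renewal interval is assumed to be
      half of the actual lease interval, as in the example.

```python
HOUR = 3600
DAY = 24 * HOUR


def client_lease(now, desired, mclt, acked_potential_expiration=None):
    """Actual lease interval to place in the DHCPACK (example rule 1)."""
    if acked_potential_expiration is None:
        remainder = 0  # new lease: nothing acknowledged by the partner yet
    else:
        remainder = max(0, acked_potential_expiration - now)
    # Never exceed the acknowledged remainder by more than the MCLT.
    return min(desired, remainder + mclt)


def partner_update(now, actual, desired):
    """Absolute times to place in the BNDUPD (example rule 2).

    Returns (lease_expiration, potential_expiration).
    """
    renewal_interval = actual // 2  # assumed: renewal at half the lease
    lease_expiration = now + actual
    potential_expiration = now + renewal_interval + desired
    return lease_expiration, potential_expiration


MCLT = 1 * HOUR
DESIRED = 3 * DAY

# Initial lease at t=0: the client gets only the MCLT (1 hour) ...
first_lease = client_lease(0, DESIRED, MCLT)
# ... while the partner is told 3 days + 1/2 hour as potential expiration.
first_update = partner_update(0, first_lease, DESIRED)
```

      Here first_lease is 3600 seconds (1 hour) and first_update is
      (3600, 261000), that is, a lease expiration one hour out and a
      potential expiration of 3 days plus half an hour.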

      When the DHCP client attempts to renew at T1 (approximately one
      half an hour from the start of the lease), the primary server
      again determines the desired lease interval, which is still 3
      days. It then compares this with the remaining acknowledged
      potential lease interval (3 days + 1/2 hour) and adjusts for the
      time passed since the secondary was last updated (1/2 hour).
      Thus the time remaining of the acknowledged potential lease
      interval is 3 days. Adding the MCLT to this yields 3 days plus 1
      hour, which is more than the desired lease interval of 3 days.
      So the client is renewed for the desired lease interval -- 3
      days.

      When the primary DHCP server updates the secondary DHCP server
      after the DHCP client's renewal ACK is complete, it will
      calculate the desired potential lease interval as the T1 fraction
      of the actual client lease interval (1/2 of 3 days this time =
      1.5 days). To this it will add the desired client lease interval
      of 3 days, yielding a total desired partner server lease interval
      of 4.5 days. In this way, the primary attempts to have the
      secondary always "lead" the client in its understanding of the
      client's lease interval so as to be able to always offer the
      client the desired client lease interval.

      Once the initial actual client lease interval of the MCLT is
      past, the protocol operates effectively like the DHCP protocol
      does today in its behavior concerning lease intervals. However,
      the guarantee that the actual client lease interval will never
      exceed the remaining acknowledged partner server lease interval
      by more than the MCLT allows full recovery from a variety of
      failures.

5.2.2. Controlled re-allocation of IP addresses

   When in PARTNER-DOWN state there is a waiting period after which an
   IP address can be re-allocated to another client.
   For leases which are available when the server enters PARTNER-DOWN
   state, the period is the MCLT from entry into PARTNER-DOWN state.
   For IP addresses which are not available when the server enters
   PARTNER-DOWN state, the period is the MCLT after the lease becomes
   available. See section 9.4.2 for more details.

   In any other state, a server cannot reallocate an address from one
   client to another without first notifying its partner (through a
   BNDUPD message) and receiving acknowledgement (through a BNDACK
   message) that its partner is aware that that first client is not
   using the address.

   This could be modeled in the following way. Though this specific
   implementation is in no way required, it may serve to better
   illustrate the concept.

   An "available" IP address on a server may be allocated to any
   client. An IP address which was leased to a client and which
   expired or was released by that client would take on a new state,
   EXPIRED or RELEASED respectively. The partner server would then be
   notified that this IP address was EXPIRED or RELEASED through a
   BNDUPD. When the sending server received the BNDACK for that IP
   address showing it was FREE, it would move the IP address from
   EXPIRED or RELEASED to FREE, and it would be available for
   allocation by the primary server to any clients.

   A server MAY reallocate an IP address in the EXPIRED or RELEASED
   state to the same client with no restrictions provided it has not
   sent a BNDUPD message to its partner. This situation would exist if
   the lease expired or was released after the transition into
   PARTNER-DOWN state, for instance.

5.3. Load balancing

   In order to implement load balancing between a primary and secondary
   server pair, each server must respond to DHCPDISCOVER requests from
   some clients and not from other clients.
   In order to do this successfully, each server must be able to
   determine immediately upon receipt of a DHCP client request whether
   it is to service this request or to ignore it in order to allow the
   other server to service the request.

   In addition, it should be possible to configure the percentage of
   clients which will be serviced by either the primary or secondary
   server. This configuration should be more or less continuous, from
   all clients serviced by the primary through an even split with half
   serviced by each, to all clients serviced by the secondary.

   The technique chosen to support these goals is described in [LOADB].

   A bitmap-style Hash Bucket Assignment (as described in [LOADB]) is
   used to determine which DHCP clients can be processed. There are two
   potential HBAs in a failover server -- a server HBA and a failover
   HBA. The way that a server acquires a server HBA is outside of the
   scope of the failover protocol, but both servers in a failover pair
   MUST have the same server HBA. The failover HBA is sent by the
   primary server to the secondary server whenever a connection is
   established, using the hash-bucket-assignment option defined in
   section 12.11.

   When using the server HBA (if any) and the failover HBA (if any) to
   decide whether to process a DHCP request, the server HBA always
   applies in every failover state, and the failover HBA (which MUST be
   a subset of the server HBA) is used by the secondary server to decide
   which packets to process when in NORMAL state.

5.4. Operating in NORMAL state

   When in NORMAL state, each server services DHCPDISCOVERs and all
   other DHCP requests other than DHCPREQUEST/RENEWAL or
   DHCPREQUEST/REBINDING from the client set defined by the load
   balancing algorithm [LOADB]. Each server services
   DHCPREQUEST/RENEWAL or DHCPREQUEST/REBINDING requests from any
   client.
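
   The NORMAL-state service rule can be illustrated with a short
   sketch. It is not part of the protocol. The names bucket_of, in_hba
   and should_service are hypothetical, and the hash is only a stand-in
   for the function defined in [LOADB]. Two behaviors are assumptions
   rather than statements of this document: that the primary services
   the buckets not present in the failover HBA, and that the HBA is a
   256-bit bitmap with the most significant bit first.

```python
from typing import Optional

BUCKETS = 256  # assumed size of the HBA bitmap, in bits


def bucket_of(client_id: bytes) -> int:
    """Stand-in hash (NOT the [LOADB] hash): map a client id to a bucket."""
    return sum(client_id) % BUCKETS


def in_hba(hba: Optional[bytes], bucket: int) -> bool:
    """Test one bit of a bitmap HBA; an absent HBA admits every bucket."""
    if hba is None:
        return True
    # Assumed bit order: most significant bit of byte 0 is bucket 0.
    return bool(hba[bucket // 8] & (0x80 >> (bucket % 8)))


def should_service(role: str, request: str, client_id: bytes,
                   server_hba: Optional[bytes] = None,
                   failover_hba: Optional[bytes] = None) -> bool:
    """Decide whether a server in NORMAL state answers a client request.

    role is "primary" or "secondary"; request is a hypothetical label
    such as "DISCOVER", "RENEWAL" or "REBINDING".
    """
    bucket = bucket_of(client_id)
    # The server HBA applies in every failover state.
    if not in_hba(server_hba, bucket):
        return False
    # Renewals and rebinds are serviced for any client.
    if request in ("RENEWAL", "REBINDING"):
        return True
    if failover_hba is None:
        return True  # assumption: load balancing is not configured
    assigned_to_secondary = in_hba(failover_hba, bucket)
    # Assumption: the primary takes the buckets not given to the secondary.
    if role == "secondary":
        return assigned_to_secondary
    return not assigned_to_secondary
```

   With a failover HBA of bytes([0xFF] * 16 + [0x00] * 16), this sketch
   has the secondary answer DHCPDISCOVERs for buckets 0 through 127 and
   the primary answer the rest, while both answer renewals.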

   In general, whenever the binding database is changed in stable
   storage (other than a change resulting from receiving a BNDUPD from
   the failover partner), then a BNDUPD message is sent with the
   contents of that change to the partner server. The partner server
   then writes the information about that binding in its bindings
   database in stable storage and replies with a BNDACK message.

   The binding database in a DHCP server would normally be changed as a
   result of DHCP protocol activity with a DHCP client (e.g., granting
   a lease to a DHCP client through the familiar
   DISCOVER/OFFER/REQUEST/ACK cycle or extending a lease due to a
   renewal from a DHCP client) or possibly (on some servers) because a
   lease has expired or undergone another state change that must be
   recorded in the DHCP binding database. These are the state changes
   that would be communicated to the partner server using a BNDUPD
   message. Of course, receipt of a BNDUPD message itself will normally
   cause an update of the binding database for all of the IP addresses
   contained in the BNDUPD, and a binding database change such as this
   MUST NOT trigger a corresponding BNDUPD message to the partner.

5.5. Operating in COMMUNICATIONS-INTERRUPTED state

   When operating in COMMUNICATIONS-INTERRUPTED state, each server is
   operating independently, but does not assume that its partner is not
   operating. The partner server might be operating and simply unable
   to communicate with this server, or might not be operating.

   Each server responds to the full range of DHCP client messages that
   it receives, but in such a way that graceful reintegration is always
   possible when its partner comes back into contact with it.

5.6. Operating in PARTNER-DOWN state

   When operating in PARTNER-DOWN state, a server assumes that its
   partner is not currently operating, but does make allowances for the
   possibility that that server was operating in the past, though
   possibly out of communications with this server. It responds to all
   DHCP client requests in PARTNER-DOWN state.

5.7. Operating in RECOVER state

   A server operating in RECOVER state assumes that it is reintegrating
   with a server that has been operating in PARTNER-DOWN state, and that
   it needs to update its bindings database before it services DHCP
   client requests.

   A server may also operate in RECOVER state in order to fully recover
   its bindings database from its partner server.

5.8. Operating in STARTUP state

   A server operating in STARTUP state assumes that failover is
   operational, and it spends a short time whenever it comes up
   attempting to contact the partner. During this time (generally a few
   seconds), the server is unresponsive to DHCP client requests. This
   period exists in order to give a server a chance to determine that
   its partner has changed state since it was last in communications,
   and to react to that changed state (if any) prior to responding to
   DHCP client requests.

   The period of time a server remains in STARTUP state SHOULD be long
   enough to ensure that it will connect to the other server if that
   server is available for connections.

5.9. Time synchronization between servers

   The failover protocol is designed to operate between two servers
   which have time values which differ by an arbitrarily large amount.
   A particular implementation MAY choose to only support servers whose
   time values differ by an arbitrarily small amount.
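
   As the rest of this section explains, each server keeps a running
   "delta time" between its own clock and its partner's, refines it on
   every received message, smooths it, and treats a drastic change as
   an event to report. The algorithm is deliberately left unspecified,
   so the following is only one possible sketch. The class name
   DeltaTime, the exponential smoothing, and the values of alpha and
   drastic are all assumptions made for illustration.

```python
from typing import Optional


class DeltaTime:
    """Running estimate of (partner clock - local clock), in seconds."""

    def __init__(self, alpha: float = 0.125, drastic: float = 30.0) -> None:
        self.alpha = alpha      # smoothing weight (assumed value)
        self.drastic = drastic  # "drastic change" threshold (assumed value)
        self.delta: Optional[float] = None

    def update(self, sent_time: float, received_time: float) -> str:
        """Fold one message into the estimate.

        sent_time is the time value carried in the message (partner's
        clock); received_time is the local tag applied on receipt.
        Returns "baseline", "smoothed" or "reset".
        """
        sample = sent_time - received_time
        if self.delta is None:
            # First message on a connection establishes the baseline.
            self.delta = sample
            return "baseline"
        if abs(sample - self.delta) > self.drastic:
            # Drastic change: reset; the caller should alert an operator.
            self.delta = sample
            return "reset"
        # Minor change: move a fraction of the way toward the new sample.
        self.delta += self.alpha * (sample - self.delta)
        return "smoothed"
```

   A caller would invoke update() once per received failover message
   and notify the network administrator whenever it returns "reset".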

   In any event, whether large or only small differences in time values
   are supported, every message that is received MUST be tagged with a
   time value as soon as possible after receipt. This time value is
   used along with the time value that is sent in every message between
   the failover partners to develop a delta time between the servers.
   This delta time is used during the connection process to establish a
   baseline delta time between the servers, and upon receipt of each
   message, the delta time for that message is used to refine the delta
   time for the server pair.

   While the algorithm for this refinement of delta time is not
   specified as part of this protocol, a server SHOULD allow the delta
   time value for a pair of failover servers to be periodically updated
   to account for time drift. In addition, the delta time value between
   servers SHOULD be smoothed in some fashion, so that transient network
   delays will not cause it to vary wildly.

   A server SHOULD recognize a drastic change in the delta time value as
   an event to be signaled to a network administrator, as well as
   resetting the time delta between the failover partners.

   The specific definitions of a minor or drastic change in delta time
   as well as the algorithm used to smooth minor changes into the
   running delta time are implementation issues and are not further
   addressed in this document.

5.10. IP address binding-status

   In most DHCP servers an IP address can take on several different
   binding-status values, sometimes also called states. While no two
   DHCP servers probably have exactly the same possible binding-status
   values, the DHCP RFC enforces some commonality among the general
   semantics of the binding-status values used by various DHCP server
   implementations.

   In order to transmit binding database updates between one server and
   another using the failover protocol, some common denominator
   binding-status values must be defined. It is not expected that these
   binding-status values correspond with any actual implementation of
   the DHCP protocol in a DHCP server, but rather that the
   binding-status values defined in this document should be a common
   denominator of those in use by many DHCP server implementations. It
   is a goal of this protocol that any DHCP server can map the various
   IP address binding-status values that it uses internally into these
   failover IP address binding-status values on transmission of binding
   database updates to its partner, and likewise that it can map any
   failover IP address binding-status values it receives in a binding
   update into its internal IP address binding-status values.

   The IP address binding-status values defined for the failover
   protocol are listed below. Unless otherwise noted below, there MAY
   be client information associated with each of these binding-status
   values.

   o  ACTIVE -- Lease is assigned to a client. Client identification
      MUST appear.

   o  EXPIRED -- indicates that a client's binding on an IP address
      has expired. When the partner server ACKs the BNDUPD of an
      EXPIRED IP address, the server sets its internal state to FREE.
      It is then available for allocation to any client of the primary
      server. It may be allocated to the same client on the server
      where the lease expired if a BNDUPD containing the EXPIRED state
      has not yet been sent to the partner (e.g., in the event that
      the servers are not in communication). Client identification
      SHOULD appear.

   o  RELEASED -- indicates that a DHCP client sent in a DHCPRELEASE
      message.
      When the partner server ACKs the BNDUPD of a RELEASED IP
      address, the server sets its internal state to FREE, and it is
      available for allocation by the primary server to any DHCP
      client. It may be allocated to the same client if a BNDUPD has
      not yet been sent to the partner. Client identification SHOULD
      appear.

   o  FREE -- is used when a DHCP server needs to communicate that an
      IP address is unused by any DHCP client, but it was not just
      released, expired, or reset by a network administrator. When
      the partner server ACKs the BNDUPD of a FREE IP address, the
      server sets its internal state such that it is available for
      allocation by the primary DHCP server to any DHCP client. (Note
      that in PARTNER-DOWN state, after waiting the MCLT, the IP
      address MAY be allocated to a DHCP client by the secondary
      server.)

      Note that when an IP address that was allocated by the secondary
      reverts to the FREE state, it must (like any other IP address)
      be assigned to the secondary through the POOLREQ/BNDUPD process
      before the secondary can reallocate it.

      Client identification MAY appear.

   o  ABANDONED -- indicates that an IP address is considered unusable
      by the DHCP subsystem. An IP address for which a valid PING
      response was received SHOULD be set to ABANDONED. An IP address
      for which a DHCPDECLINE was received should be set to ABANDONED.
      Client identification MUST NOT appear.

   o  RESET -- indicates that this IP address was made available by
      operator command. This is a distinct state so that the reason
      that the IP address became FREE can be determined. Client
      identification MAY appear.

   o  BACKUP -- indicates that this IP address can be allocated by the
      secondary server to a DHCP client at any time.
      When the MCLT has passed after its time of entry into
      PARTNER-DOWN state, the IP address may be allocated by the
      primary to any DHCP client. Client identification MAY appear.

   These binding-status values are communicated from one failover
   partner to another using the binding-status option (see section 12.3
   for details of this option). Unless otherwise noted above there MAY
   be client information associated with each of these binding-status
   values.

   An IP address will move between these binding-status values using the
   following state transition diagram:

                                 DHCP client DECLINE or
                                 server detected problem
                                     from any state
                         +----------+       V   +---------+
          External >---->|  RESET   |       |   |ABANDONED|
          command        |          |       +-->|         |
                         +----------+           +---------+
                              |
                      Comm w/Partner(1)
                              V
   +---------+  Comm(1)  +----------+  Comm(1)  +---------+
   | EXPIRED |---------->|   FREE   |<----------| RELEASED|
   |         | w/Partner |          | w/Partner |         |
   +---------+           +----------+           +---------+
    ^    ^                 |    |  +-----------+        ^
    |    |                 |    |              |        |
    | Exp. grace   IP      | IP addr alloc. IP addr     |
    | period ends  address | to sec.(2)     reserved    |
    |    |         leased  |    V              V        |
    |    |         by      | +----------+ +---------+   |
    |    |         primary | |  BACKUP  | | BACKUP- |   |
    | wait for             | |          | | RESERVED|   |
    | grace period         | +----------+ +---------+   |
    |    |                 |      |                     |
    |    |                 |      | IP addr leased by   |
    | Expired grace        |      | secondary           |
    | period exists        V      V                     |
    |    |               +----------+                   |
    |    |   Lease on    |  ACTIVE  |   DHCPRELEASE     |
    +----+---IP addr-----|          |-------------------+
             expires     +----------+

        Figure 5.10-1: Transitions between binding-status values.

   (1) This transition MAY also occur if the server is in
   PARTNER-DOWN state and the MCLT has passed since the entry
   in the RELEASED, EXPIRED, or RESET states.

   (2) This transition MAY occur if the server is the secondary
   and the MCLT has passed since its entry into PARTNER-DOWN state.
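
   The transitions in Figure 5.10-1 can also be written down as a
   lookup table. The sketch below is one reading of the figure, not a
   required implementation. The event names and the function
   next_status are hypothetical. It assumes that the external command
   leading to RESET may be issued from any state and that the "leased
   by secondary" transition applies to both BACKUP and BACKUP-RESERVED.
   The MCLT conditions in notes (1) and (2) are not modeled.

```python
# (current binding-status, event) -> next binding-status
TRANSITIONS = {
    ("RESET", "partner_ack"): "FREE",
    ("EXPIRED", "partner_ack"): "FREE",
    ("RELEASED", "partner_ack"): "FREE",
    ("FREE", "leased_by_primary"): "ACTIVE",
    ("FREE", "allocated_to_secondary"): "BACKUP",
    ("FREE", "reserved"): "BACKUP-RESERVED",
    ("BACKUP", "leased_by_secondary"): "ACTIVE",
    ("BACKUP-RESERVED", "leased_by_secondary"): "ACTIVE",
    ("ACTIVE", "lease_expired"): "EXPIRED",  # after any grace period
    ("ACTIVE", "client_release"): "RELEASED",
}


def next_status(current: str, event: str) -> str:
    """Return the binding-status that follows `current` on `event`."""
    # A DHCPDECLINE or server-detected problem applies from any state.
    if event == "decline_or_problem":
        return "ABANDONED"
    # Assumption: an operator reset may be issued from any state.
    if event == "operator_reset":
        return "RESET"
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"no transition from {current} on {event}") from None
```

   For example, next_status("ACTIVE", "lease_expired") gives EXPIRED,
   and a following "partner_ack" event gives FREE.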

   Again, note that a DHCP server implementing the failover protocol
   does not have to implement either this state machine or use these
   particular binding-status values in its normal operation of
   allocating IP addresses to DHCP clients. It only needs to map its
   internal binding-status values onto these "standard" binding-status
   values, and map these "standard" binding-status values back into its
   internal binding-status values. For example, a server which
   implements a grace period for an IP address binding SHOULD simply
   wait to update its partner server until the grace period on that
   binding has run out.

   The process of setting an IP address to FREE deserves some detailed
   discussion. When an IP address is moved to the EXPIRED, RELEASED, or
   RESET binding-status on a server, it will send a BNDUPD with the
   binding-status of EXPIRED, RELEASED, or RESET to its partner. If its
   partner agrees that this is acceptable (see sections 7.1.2 and 7.1.3
   concerning why a server might not accept a BNDUPD) it will return a
   BNDACK with no reject-reason, signifying that it accepted the update.
   As part of the BNDUPD processing, the server returning the BNDACK
   will set the binding-status of the IP address to FREE, and upon
   receipt of the BNDACK the server which sent the BNDUPD will set the
   binding-status of the IP address to FREE. Thus, the EXPIRED,
   RELEASED, or RESET binding-status is something of a transitory state.
   This process is encoded in the transition diagram above by "Comm
   w/Partner".

5.11. DNS dynamic update considerations

   DHCP servers (and clients) can use DNS Dynamic Updates as described
   in [RFC 2136] to maintain DNS name-mappings as they maintain DHCP
   leases. Many different administrative models for DHCP-DNS
   integration are possible.
   Descriptions of several of these models, and guidelines that DHCP
   servers and clients should follow in carrying them out, are laid out
   in [DDNS]. The nature of the DHCP failover protocol introduces some
   issues concerning dynamic DNS updates that are not part of
   non-failover DHCP environments. This section describes these issues,
   and defines the information which failover partners should exchange
   and the protocol which they should follow in order to ensure
   consistent behavior. The presence of this section should not be
   interpreted as requiring that implementations of the DHCP failover
   protocol must also support DDNS updates. The purpose of this
   discussion is to clarify the areas where the DHCP failover and
   DHCP-DDNS protocols intersect for the benefit of implementations
   which support both protocols, not to introduce a new requirement
   into the DHCP failover protocol. Thus, a DHCP server which
   implements the failover protocol MAY also support dynamic DNS
   updates, but if it does support dynamic DNS updates it SHOULD
   utilize the techniques described here in order to correctly
   distribute them between the failover partners.

   From the standpoint of the failover protocol, there is no reason why
   a server which is utilizing the DDNS protocol to update a DNS server
   should not be a partner with a server which is not utilizing the DDNS
   protocol to update a DNS server. However, a server which is not able
   to support DDNS or is not configured to support DDNS SHOULD output a
   warning message when it receives BNDUPD messages which indicate that
   its failover partner is configured to support the DDNS protocol to
   update a DNS server. An implementation MAY consider this an error
   and refuse to operate, or it MAY choose to operate anyway, having
   warned the user of the problem in some way.

5.11.1. Relationship between failover and dynamic DNS update

   The failover protocol describes the conditions under which each
   failover server may renew a lease to its current DHCP client, and
   describes the conditions under which it may grant a lease to a new
   DHCP client. An analogous set of conditions determines when a
   failover server should initiate a DDNS update, and when it should
   attempt to remove records from the DNS. The failover protocol's
   conditions are based on the desired external behavior: avoiding
   duplicate address assignments; allowing clients to continue using
   leases which they obtained from one failover partner even if they
   can only communicate with the other partner; allowing the backup
   DHCP server to grant new leases even if it is unable to communicate
   with the primary server. The desired external DDNS behavior for DHCP
   failover servers is:

   1.  Allow timely DDNS updates from the server which grants a
       client a lease. Recognize that there is often a DDNS update
       lifecycle which parallels the DHCP lease lifecycle. This is
       likely to include the addition of records when the lease is
       granted, and the removal of DNS records when the lease is
       subsequently made available for allocation to a different
       client.

   2.  Communicate enough information between the two failover
       servers to allow one to complete the DDNS update 'lifecycle'
       even if the other server originally granted the lease.

   3.  Avoid redundant or overlapping DDNS updates, where both
       failover servers are attempting to perform DDNS updates for the
       same lease-client binding. Avoid situations where one partner
       is attempting to add RRs related to a lease binding while the
       other partner is attempting to remove RRs related to the same
       lease binding.

5.11.2. Use of the DDNS option

   In order for either server to be able to complete a DDNS update, or
   to remove DNS records which were added by its partner, both servers
   need to know the FQDN associated with the lease-client binding. The
   FQDN associated with the client's A RR and PTR RR SHOULD be
   communicated from the server which adds records into the DNS to its
   partner. The initiating server SHOULD use the DDNS option in the
   BNDUPD messages to inform the partner server of the status of any
   DDNS updates associated with a lease binding. Failover servers MAY
   choose not to include the DDNS option in BNDUPD messages if there
   has been no change in the status of any DDNS update related to the
   lease binding. The partner server receiving BNDUPD messages
   containing the DDNS option SHOULD compare the status flags and the
   FQDN contained in the option data with the current DDNS information
   it has associated with the lease binding, and update its notion of
   the DDNS status accordingly.

   The initiating server MAY send a BNDUPD to its partner before the
   DDNS update has been successfully completed. If it does so, it SHOULD
   leave the 'C' bit in the Flags field clear, to indicate to the
   partner that the DDNS update may not be complete. When the DDNS
   update has been successfully acknowledged by the DNS server, the
   initiating DHCP server SHOULD include the DDNS option in its next
   BNDUPD message about the binding, so that the partner server will be
   able to record the final status of the DDNS update. The initiating
   server SHOULD set the 'C' bit in the DDNS option if the DDNS update
   was successfully accepted by the DNS server.

   Some implementations will choose to send a BNDUPD without waiting for
   the DDNS update to complete, and then will send a second BNDUPD once
   the DDNS update is complete.
   Other implementations will delay sending the partner a BNDUPD until
   the DDNS update has been acknowledged by the DNS server, or until
   some time-limit has elapsed, in order to avoid sending a second
   BNDUPD.

   The Domain Name field in the DDNS option contains the FQDN that will
   be associated with the A RR (if the server is performing an A RR
   update for the client) and the PTR RR. This FQDN may be composed in
   any of several ways, depending on server configuration and the
   information provided by the client in its DHCP messages. The client
   may supply a hostname which it would like the server to use in
   forming the FQDN, or it may supply the entire FQDN. The server may
   be configured to attempt to use the information the client supplies,
   it may be configured with an FQDN to use for the client, or it may
   be configured to synthesize an FQDN. The responsive server SHOULD
   include the FQDN that it will be using in DDNS updates it initiates
   when it sends the DDNS option.

   Since the responsive server may not have completed the DDNS update at
   the time it sends the first BNDUPD about the lease binding, there may
   be cases where the FQDN in later BNDUPD messages does not match the
   FQDN included in earlier messages. For example, the responsive
   server may be configured to handle situations where two or more DHCP
   client FQDNs are identical by modifying the most-specific label in
   the FQDNs of some of the clients in an attempt to generate unique
   FQDNs for them (a process sometimes called "disambiguation").
   Alternatively, at sites which use some or all of the information
   which clients supply to form the FQDN, it's possible that a client's
   configuration may be changed so that it begins to supply new data.
   The responsive server may react by removing the DNS records which it
   originally added for the client, and replacing them with records
   that refer to the client's new FQDN. In such cases, the responsive
   server SHOULD include the actual FQDN that was used in subsequent
   DDNS options. The responsive server SHOULD include relevant
   client-option data in the client-request-options option in its
   BNDUPD messages. This information may be necessary in order to allow
   the non-responsive partner to detect client configuration changes
   that change the hostname or FQDN data which the client includes in
   its DHCP requests.

5.11.3. Adding RRs to the DNS

   A failover server which is going to perform DDNS updates SHOULD
   initiate the DDNS update when it grants a new lease to a client. The
   non-responsive partner SHOULD NOT initiate a DDNS update when it
   receives the BNDUPD after the lease has been granted. The failover
   protocol ensures that only one of the partners will grant a lease to
   any individual client, so it follows that this requirement will
   prevent both partners from initiating updates simultaneously. The
   server initiating the update SHOULD follow the protocol in [DDNS].
   The server may be configured to perform an A RR update on behalf of
   its clients, or not. Ordinarily, a failover server will not initiate
   DDNS updates when it renews leases. In two cases, however, a failover
   server MAY initiate a DDNS update when it renews a lease to its
   existing client:

   1.  When the lease was granted before the server was configured to
       perform DDNS updates, the server MAY be configured to perform
       updates when it next renews existing leases. Since both
       servers are responsive to renewals in NORMAL state, it is not
       enough to simply require the non-responsive server to avoid a
       DNS update in this case.
       The server which would be responsive to a DHCPDISCOVER from
       this client (even though the current request is a
       DHCPREQUEST/RENEWAL) is the server which should initiate the
       DDNS update.

   2.  If a server is in PARTNER-DOWN state, it can conclude that its
       partner is no longer attempting to perform an update for the
       existing client. If the remaining server has not recorded that
       an update for the binding has been successfully completed, the
       server MAY initiate a DDNS update. It MAY initiate this
       update immediately upon entry to PARTNER-DOWN state, it may
       perform this in the background, or it MAY initiate this update
       upon next hearing from the DHCP client.

5.11.4. Deleting RRs from the DNS

   The failover server which makes an IP address FREE SHOULD initiate
   any DDNS deletes, if it has recorded that DNS records were added on
   behalf of the client.

   A server not in PARTNER-DOWN state "makes an IP address FREE" when it
   initiates a BNDUPD with a binding-status of FREE, EXPIRED, or
   RELEASED. Its partner confirms this status by acking that BNDUPD,
   and upon receipt of the ACK the server has "made the IP address
   FREE". Conversely, a server in PARTNER-DOWN state "makes an IP
   address FREE" when it sets the binding-status to FREE, since in
   PARTNER-DOWN state no communication is required with the partner.

   It is at this point that it should initiate the DDNS operations to
   delete RRs from the DNS. Its partner SHOULD NOT initiate DDNS
   deletes for DNS records related to the lease binding as part of
   sending the BNDACK message. The partner MAY have issued BNDUPD
   messages with a binding-status of FREE, EXPIRED, or RELEASED
   previously, but the other server will have NAKed these BNDUPD
   messages.

   The failover protocol ensures that only one of the two partner
   servers will be able to make a lease FREE.
   The server making the lease FREE may be doing so while it is in
   NORMAL communication with its partner, or it may be in PARTNER-DOWN
   state. If a server is in PARTNER-DOWN state, it may be performing
   DDNS deletes for RRs which its partner added originally. This allows
   a single remaining partner server to assume responsibility for all
   of the DDNS activity which the two servers were undertaking.

   Another implication of this approach is that no DDNS RR deletes will
   be performed while either server is in COMMUNICATIONS-INTERRUPTED
   state, since no IP addresses are moved into the FREE state during
   that period.

5.12. Reservations and failover

   Some DHCP servers support a capability to offer specific
   pre-configured IP addresses to DHCP clients. These are real DHCP
   clients that carry out the entire DHCP protocol, but these servers
   always offer the client a specific pre-configured IP address -- and
   they offer that IP address to no other clients. Such a capability
   has several names, but it is sometimes called a "reservation", in
   that the IP address is reserved for a particular DHCP client.

   In a situation where there are two DHCP servers serving the same
   subnet without using failover, the two DHCP servers need to have
   disjoint IP address pools, but identical reservations for the DHCP
   clients.

   In a failover context, both servers need to be configured with the
   proper reservations in an identical manner, but if we stop there
   problems can occur around the edge conditions where reservations are
   made for an IP address that has already been leased to a different
   client. Different servers handle this conflict in different ways,
   but the goal of the failover protocol is to allow correct operation
   with any server's approach to the normal processing of the DHCP
   protocol.

   The general solution with regard to reservations is as follows.
   Whenever a reserved IP address becomes FREE (i.e., when first
   configured or whenever a client frees it or it expires or is reset),
   the primary server MUST show that IP address as FREE (and thus
   available for its own allocation) and it MUST send it to the
   secondary server as BACKUP-RESERVED, in order that the secondary
   server be able to allocate it as well.

   Note that this implies that a reserved IP address goes through the
   normal state changes from FREE to ACTIVE (and possibly back to FREE).
   The failover protocol supports this approach to reservations, i.e.,
   where the IP address undergoes the normal state changes of any IP
   address, but it can only be offered to the client for which it is
   reserved. Other approaches to the support of reservations exist in
   some DHCP server implementations (e.g., where the IP address is
   apparently leased to a particular client forever, without any
   expiration). The goal is for the failover protocol to support any of
   the usual approaches to reservations, both those that allow an IP
   address to go through different states when reserved, and those that
   don't.

   From the above, it follows that a reservation solely on the
   secondary will not necessarily allow the secondary to offer that
   address to the client for which it is reserved. The reservation must
   also appear on the primary for the secondary to be able to offer the
   IP address to the client for which it is reserved.

   When the reservation on an IP address is cancelled, if the IP address
   is currently FREE and the server is the primary, or BACKUP and the
   server is the secondary, the server MUST send a BNDUPD to the other
   server with the binding-status FREE.

5.13. Dynamic BOOTP and failover

   Some DHCP servers support a capability to offer IP addresses to BOOTP
   clients without having a particular address previously allocated for
   those clients.
   This capability is often called something like "dynamic BOOTP". It
   is discussed briefly in RFC 1534 [RFC 1534].

   This capability has a negative interaction with the fundamental
   elements of the failover protocol, in that an address handed out to
   a BOOTP device has no term (or effectively no term, in that usually
   they are considered leases for "forever"). There is no opportunity
   to hand out a lease which is only the MCLT long when first hearing
   from a BOOTP device, because it may only interact once with the
   DHCP server and it has no notion of a lease expiration time. Thus
   the entire concept of the MCLT and waiting the MCLT after entering
   PARTNER-DOWN state is defeated when dealing with BOOTP devices.

   With some restrictions, however, dynamic BOOTP devices can be
   supported in a server on a subnet where failover is supported. The
   only restriction (and it is not small) is that on any portion of the
   subnet (in any address pool) where dynamic BOOTP devices can be
   allocated IP addresses, a DHCP server MUST NOT ever use any of the
   IP addresses which were previously available for allocation by its
   failover partner. Thus, the addresses allocated by the primary to
   the secondary for allocation that might have been allocated to BOOTP
   devices MUST NOT ever be used by the primary server even if it is in
   PARTNER-DOWN state and has waited the MCLT after entering that state.
   Conversely, addresses available for allocation by the primary MUST
   NOT be used by the secondary even if it is in PARTNER-DOWN state.
   The reason is that one of those IP addresses could have been
   allocated by the secondary server to a BOOTP device, and the primary
   server would have no way of ever knowing that happened.

5.14. Guidelines for selecting MCLT

   There is no one correct value for the MCLT.
There is an explicit 1802 tradeoff between various factors in selecting an MCLT value. 1804 5.14.1. Short MCLT 1806 A short MCLT value will mean that after entering PARTNER-DOWN state, 1807 a server will only have to wait a short time before it can start 1808 allocating its partner's IP addresses to DHCP clients. Furthermore, 1809 it will only have to wait a short time after the expiration of a 1810 lease on an IP address before it can reallocate that IP address to 1811 another DHCP client. 1813 However the downside of a short MCLT value is that the initial lease 1814 interval that will be offered to every new DHCP client will be short, 1815 which will cause increased traffic as those clients will need to send 1816 in their first renew in a half of a short MCLT time. In addition, 1817 the lease extensions that a server in COMMUNICATIONS-INTERRUPTED 1818 state can give will be only the MCLT after the server has been in 1819 COMMUNICATIONS-INTERRUPTED for around the desired client lease 1820 period. If a server stays in COMMUNICATIONS-INTERRUPTED for that 1821 long, then the leases it hands out will be short and that will 1822 increase the load on that server, possibly causing difficulty. 1824 5.14.2. Long MCLT 1826 A long MCLT value will mean that the initial lease period will be 1827 longer and the time that a server in COMMUNICATIONS-INTERRUPTED state 1828 will be able to extend leases (after it has been in COMMUNICATIONS- 1829 INTERRUPTED state for around the desired client lease period) will be 1830 longer. 1832 However, a server entering PARTNER-DOWN state will have to wait the 1833 longer MCLT before being able to allocate its partner's IP addresses 1834 to new DHCP clients. This may mean that additional IP addresses are 1835 required in order to cover this time period. Further, the server in 1836 PARTNER-DOWN will have to wait the longer MCLT from every lease 1837 expiration before it can reallocate an IP address to a different DHCP 1838 client. 1840 6. 
Common Message Format 1842 This section describes the message format that all failover 1843 messages have in common, including the message header format as well 1844 as the common option format. See section 12 for the definitions 1845 of the specific options used in the failover protocol. 1847 6.1. Message header format 1849 The options contained in the payload data section of the failover 1850 message all use a two byte option number and two byte length format. 1852 All failover protocol messages are sent over the TCP connection 1853 between failover endpoints and encoded using a message format 1854 specific to the failover protocol. 1856 There exists a common message format for all failover messages, which 1857 utilizes the options in a way similar to the DHCP protocol. For each 1858 message type, some options are required and some are optional. In 1859 addition, when a message is received any options that are not under- 1860 stood by the receiving server MUST be ignored. 1862 All of the fields in the fixed portion of the message MUST be filled 1863 with correct data in every message sent. 1865 0 1 2 3 1866 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1867 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1868 | message length (2) | msg type (1) |payload off (1)| 1869 +---------------+---------------+---------------+---------------+ 1870 | time (4) | 1871 +---------------------------------------------------------------+ 1872 | xid (4) | 1873 +---------------------------------------------------------------+ 1874 | 0 or more additional header bytes (variable) | 1875 +---------------------------------------------------------------+ 1876 | payload data (variable) | 1877 | | 1878 | formatted as DHCP-style options | 1879 | using a two byte option code and two byte length | 1880 | See section 6.2 for details. 
| 1881 +---------------------------------------------------------------+ 1883 message length - 2 bytes, network byte order 1885 This is the length of the message. It includes the two byte message 1886 length itself. The maximum length is 2048 bytes. The minimum length 1887 is 12. 1889 msg type - 1 byte 1891 The message type field is used to distinguish between messages. 1893 The following message types are defined: 1895 Value Message Type 1896 ----- ------------ 1897 0 reserved not used 1898 1 POOLREQ request allocation of addresses 1899 2 POOLRESP respond with allocation count 1900 3 BNDUPD update partner with binding info 1901 4 BNDACK acknowledge receipt of binding update 1902 5 CONNECT establish connection with the secondary 1903 6 CONNECTACK respond to attempt to establish connection with partner 1904 7 UPDREQALL request full transfer of binding info 1905 8 UPDDONE ack send and ack of req'd binding info 1906 9 UPDREQ req transfer of un-acked binding info 1907 10 STATE inform partner of current state or state change 1908 11 CONTACT probe communications integrity with partner 1909 12 DISCONNECT close a connection 1911 New message types should be defined in one of two ranges, 0-127 or 1912 128-255. The range of 0-127 is used for messages that MUST be sup- 1913 ported by every server, and if a server receives a message in the 1914 range of 0-127 that it doesn't understand, it MUST close the TCP con- 1915 nection. The range of 128-255 is used for messages which MAY be sup- 1916 ported but are not required, and if a server receives a message in 1917 this range that it does not understand it SHOULD ignore the message. 1919 payload offset - 1 byte 1921 The byte offset of the Payload Data, from the beginning of the 1922 failover message header. The value for the current protocol version 1923 (version 1) is 8. 
1925 time - 4 bytes, network byte order 1927 The absolute time in GMT when the message was transmitted, 1928 represented as seconds elapsed since Jan 1, 1970 (i.e., similar to 1929 the ANSI C time_t time value representation). While the ANSI C 1930 time_t value is signed, the value used in this specification is 1931 unsigned. 1933 A server SHOULD set this time as close to the actual transmission of 1934 the message as possible. 1936 xid - 4 bytes, network byte order 1938 This is the transaction id of the failover message. The sender of a 1939 failover protocol message is responsible for setting this number, and 1940 the receiver of the message copies the number over into any response 1941 message, treating it as opaque data. The sender MUST ensure that 1942 every message sent from a particular failover endpoint over the 1943 associated TCP connection has a unique transaction id. 1945 For failover messages that have no corresponding response message, 1946 the XID value is meaningless, but MUST be supplied. The XID value is 1947 used solely by the receiver of a response message to determine the 1948 corresponding request message. 1950 Request messages where the XID is used in the corresponding response 1951 messages are: POOLREQ, BNDUPD, CONNECT, UPDREQALL, and UPDREQ. The 1952 corresponding response messages are POOLRESP, BNDACK, CONNECTACK, 1953 UPDDONE, and UPDDONE, respectively. 1955 As requests/responses don't survive connection reestablishment, XIDs 1956 only need to be unique during a specific connection. 1958 payload data - variable length 1960 The options are placed after the header, after skipping payload 1961 offset bytes from the beginning of the message. The payload data options 1962 are not preceded by a "cookie" value. 1964 The payload data is formatted as DHCP style options using two byte 1965 option codes and two byte option lengths. The option codes are in a 1966 namespace which is unique to the failover protocol. 
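The fixed header fields described above can be decoded in a few lines. The following is a minimal sketch, not a reference implementation: the names parse_header, payload_of and unknown_type_action are hypothetical, and the sample message in the usage note assumes a payload offset of 12 (the size of the fixed fields in the diagram) purely for illustration.

```python
import struct
from typing import NamedTuple

# length(2), msg type(1), payload offset(1), time(4), xid(4), network byte order
HEADER_FORMAT = "!HBBII"
FIXED_HEADER_LEN = struct.calcsize(HEADER_FORMAT)  # 12, the stated minimum length
MAX_MESSAGE_LEN = 2048

# Message types listed in the table of section 6.1.
KNOWN_TYPES = {
    1: "POOLREQ", 2: "POOLRESP", 3: "BNDUPD", 4: "BNDACK",
    5: "CONNECT", 6: "CONNECTACK", 7: "UPDREQALL", 8: "UPDDONE",
    9: "UPDREQ", 10: "STATE", 11: "CONTACT", 12: "DISCONNECT",
}


class FailoverHeader(NamedTuple):
    length: int
    msg_type: int
    payload_offset: int
    time: int   # unsigned seconds since Jan 1, 1970
    xid: int    # opaque to the receiver, copied into any response


def parse_header(message: bytes) -> FailoverHeader:
    """Decode the fixed header of one complete failover message."""
    if len(message) < FIXED_HEADER_LEN:
        raise ValueError("message shorter than the fixed header")
    header = FailoverHeader(*struct.unpack_from(HEADER_FORMAT, message))
    if not FIXED_HEADER_LEN <= header.length <= MAX_MESSAGE_LEN:
        raise ValueError("message length out of range")
    if header.length != len(message):
        raise ValueError("message length field does not match the data")
    return header


def payload_of(message: bytes, header: FailoverHeader) -> bytes:
    """Return the option bytes, skipping payload offset bytes from the start."""
    return message[header.payload_offset:header.length]


def unknown_type_action(msg_type: int) -> str:
    """Apply the 0-127 / 128-255 rule for message types not understood."""
    if msg_type in KNOWN_TYPES:
        return "process"
    return "close-connection" if msg_type < 128 else "ignore"
```

For example, a 16 byte BNDUPD built with struct.pack("!HBBII", 16, 3, 12, 978307200, 7) followed by four payload bytes parses to msg_type 3 and xid 7, and payload_of returns the four trailing bytes.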
1968 The maximum length of the payload data in octets is 2048 less the 1969 size of the header, i.e., the maximum message length is 2048 octets. 1971 6.2. Common option format 1973 The options contained in the payload data section of the failover 1974 message all use a two byte option number and two byte length format. 1976 The option numbers are drawn from an option number space unique to 1977 the failover protocol. All of the message types share a common 1978 option number space and common options definitions, though not all 1979 options are required or meaningful for every message. 1981 In contrast to the options which appear in DHCP client and server 1982 messages, the options in failover messages are ordered. That is, for 1983 some messages the order in which the options appear in the payload 1984 data area is significant. The messages for which option ordering is 1985 significant explicitly describe the ordering requirements. If no 1986 ordering requirements are mentioned, then the order is not signifi- 1987 cant for that message. 1989 All options which refer to time use an absolute time in 1990 GMT. Time synchronization has already been achieved between the 1991 source and the target server using the CONNECT message and is updated 1992 and refined using the time in every packet. 1994 The time value is an unsigned 32 bit integer in network byte order 1995 giving the number of seconds since 00:00 UTC, 1st January 1970. This 1996 can be converted to an NTP timestamp by adding decimal 2208988800. 1997 This time format will not wrap until the year 2106. Until sometime 1998 in 2038, it is equal to the ANSI C time_t value (which is a signed 32 1999 bit value and will overflow into a negative number in 2038). 2001 Options should appear once only in each message (except for BNDUPD 2002 and BNDACK messages where bulking is used, see section 6.3 for 2003 details.) An option that appears twice is not concatenated, but 2004 treated as an error. 
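The option layout and time conversion described above can be illustrated as follows. This is a sketch under stated assumptions: parse_options and to_ntp_seconds are hypothetical names, the allow_repeats flag is an illustrative way to model the BNDUPD/BNDACK exception, and the option codes used in the usage note are arbitrary rather than taken from section 12.

```python
import struct
from typing import List, Tuple

NTP_UNIX_OFFSET = 2208988800  # seconds between the NTP and 1970 epochs


def parse_options(payload: bytes,
                  allow_repeats: bool = False) -> List[Tuple[int, bytes]]:
    """Split payload data into (code, value) pairs, preserving their order.

    Each option is a two byte code, a two byte length, then the value.
    A repeated code is treated as an error unless allow_repeats is set,
    which models the exception for BNDUPD and BNDACK messages.
    """
    options: List[Tuple[int, bytes]] = []
    seen = set()
    pos = 0
    while pos < len(payload):
        if pos + 4 > len(payload):
            raise ValueError("truncated option header")
        code, length = struct.unpack_from("!HH", payload, pos)
        pos += 4
        if pos + length > len(payload):
            raise ValueError("option value runs past end of payload")
        if code in seen and not allow_repeats:
            raise ValueError("option %d appears more than once" % code)
        seen.add(code)
        options.append((code, payload[pos:pos + length]))
        pos += length
    return options


def to_ntp_seconds(failover_time: int) -> int:
    """Convert an unsigned 32 bit failover time to NTP seconds."""
    return failover_time + NTP_UNIX_OFFSET
```

Returning an ordered list rather than a dictionary keeps the option order available for the messages where ordering is significant.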
2006 Specific option values are described in section 12. 2008 See section 13 for how to define additional options. 2010 6.3. Batching multiple binding update transactions in one BNDUPD mes- 2011 sage 2013 Implementations of this protocol MAY send multiple binding update 2014 transactions in one BNDUPD message, where a binding update transac- 2015 tion is defined as the set of options which are associated with the 2016 update of a single IP address. All implementations of this protocol 2017 MUST be prepared to receive BNDUPD messages which contain multiple 2018 binding update transactions and respond correctly to them, including 2019 replying with a BNDACK message which contains status for the multiple 2020 binding update transactions contained in the BNDUPD message. 2022 In the discussion of sending and receiving BNDUPD messages in section 2023 7.1 and BNDACK messages in section 7.2, each BNDUPD message and 2024 BNDACK message is assumed to contain a single binding update transac- 2025 tion in order to reduce the complexity of the discussions in section 2026 7. 2028 Multiple binding update transactions MAY be batched together in one 2029 BNDUPD protocol message with the data sets for the individual tran- 2030 sactions delimited by the assigned-IP-address option, which MUST 2031 appear first in the option set for each transaction. Ordering of 2032 options between the assigned-IP-address options is not significant. 2033 This is illustrated in the following schematic representation: 2035 Non-IP Address/Non-client specific options first 2036 assigned-IP-address option for the first IP address 2037 Options pertaining to first address, including 2038 at least the binding-status option and others as 2039 required. 2040 assigned-IP-address option for the second IP address 2041 Options pertaining to second address, including 2042 at least the binding-status option and others as 2043 required. 2044 ... 2045 Trailing options (message digest). 
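The schematic above suggests a simple way to split a batched BNDUPD into its binding update transactions. The sketch below works on options already decoded into (name, value) pairs; split_transactions and ack_options are hypothetical helpers, and treating "message-digest" as the only trailing option is an assumption made for illustration. The message option that may accompany a reject-reason is omitted.

```python
from typing import Dict, List, Tuple

Option = Tuple[str, object]

ASSIGNED_IP = "assigned-IP-address"
TRAILING_OPTIONS = {"message-digest"}  # assumption: the only trailing option


def split_transactions(options: List[Option]):
    """Group ordered options into leading, per-address, and trailing sets.

    Each assigned-IP-address option starts a new transaction; options that
    follow it belong to that transaction until the next one begins.
    """
    leading: List[Option] = []
    transactions: List[List[Option]] = []
    trailing: List[Option] = []
    for name, value in options:
        if name in TRAILING_OPTIONS:
            trailing.append((name, value))
        elif name == ASSIGNED_IP:
            transactions.append([(name, value)])
        elif transactions:
            transactions[-1].append((name, value))
        else:
            leading.append((name, value))
    return leading, transactions, trailing


def ack_options(transactions: List[List[Option]],
                rejections: Dict[object, object]) -> List[Option]:
    """Build per-address BNDACK options: one assigned-IP-address option per
    transaction, followed by a reject-reason only when it was rejected."""
    result: List[Option] = []
    for transaction in transactions:
        address = transaction[0][1]
        result.append((ASSIGNED_IP, address))
        if address in rejections:
            result.append(("reject-reason", rejections[address]))
    return result
```

An address that appears in ack_options output without a reject-reason corresponds to a successful binding update transaction.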
2047 There MUST be a one-to-one correspondence between BNDUPD and BNDACK 2048 messages, and every BNDACK message MUST contain status for all of the 2049 binding update transactions in the corresponding BNDUPD message. 2051 The BNDACK message corresponding to a BNDUPD message MUST contain 2052 assigned-IP-address options for all of the binding update transac- 2053 tions in the BNDUPD message. Thus, every BNDACK message contains 2054 exactly the same assigned-IP-address options as does its correspond- 2055 ing BNDUPD message. The order of the assigned-IP-address options 2056 MAY, however, be different. Here is a schematic representation of a 2057 BNDACK: 2059 Non-IP Address/Non-client specific options first 2060 assigned-IP-address option for the first IP address 2061 If rejected, reject-reason option and message option. 2062 assigned-IP-address option for the second IP address 2063 If rejected, reject-reason option and message option. 2064 ... 2065 Trailing options (message digest). 2067 In case the server chooses to reject some or all of the IP address 2068 binding information in a BNDUPD message in a BNDACK reply, the BNDACK 2069 message MUST contain a reject-reason option following every 2070 assigned-IP-address option in order to indicate that the binding 2071 update transaction for that IP address was not accepted and why. As 2072 with a BNDACK message containing a single binding update transaction, 2073 an assigned-IP-address option without any associated reject-reason 2074 option indicates a successful binding update transaction. 2076 7. Protocol Messages 2078 This section contains the detailed definition of the protocol mes- 2079 sages, including the information to include when sending the message, 2080 as well as the actions to take upon receiving the message. The mes- 2081 sage type for each message appears as [n] in the heading for the mes- 2082 sage (see section 6.1). 2084 7.1. 
BNDUPD message [3] 2086 The binding update (BNDUPD) message is used to send the binding data- 2087 base changes (known as binding update transactions) to the partner 2088 server, and the partner server responds with a binding acknowledge- 2089 ment (BNDACK) message when it has successfully committed those 2090 changes to its own stable storage. 2092 The rest of the failover protocol exists to determine whether the 2093 partner server is able to communicate or not, and to enable the 2094 partners to exchange BNDUPD/BNDACK messages in order to keep their 2095 binding databases in stable storage synchronized. 2097 The rest of this section is written as though every BNDUPD message 2098 contains only a single binding update transaction in order to reduce 2099 the complexity of the discussion. See section 6.3 for information on 2100 how to create and process BNDUPD and BNDACK messages which contain 2101 multiple binding update transactions. Note that while a server MAY 2102 generate BNDUPD messages with multiple binding update transactions, 2103 every server MUST be able to process a BNDUPD message which contains 2104 multiple binding update transactions and generate the corresponding 2105 BNDACK messages with status for multiple binding update transactions. 2107 The following table summarizes the various options for the BNDUPD 2108 message. 
2110 binding-status BACKUP 2111 RESET 2112 ABANDONED 2113 Option ACTIVE EXPIRED RELEASED FREE 2114 ------ ------ ------- -------- ---- 2115 assigned-IP-address (3) MUST MUST MUST MUST 2116 binding-status MUST MUST MUST MUST 2117 client-identifier MAY MAY MAY MAY(2) 2118 client-hardware-address MUST MUST MUST MAY(2) 2119 lease-expiration-time MUST MUST NOT MUST NOT MUST NOT 2120 potential-expiration-time MUST MUST NOT MUST NOT MUST NOT 2121 start-time-of-state SHOULD SHOULD SHOULD SHOULD 2122 client-last-trans.-time MUST SHOULD MUST MAY 2123 DDNS(1) SHOULD SHOULD SHOULD SHOULD 2124 client-request-options SHOULD SHOULD NOT SHOULD SHOULD NOT 2125 client-reply-options SHOULD SHOULD NOT SHOULD NOT SHOULD NOT 2127 (1) MUST if server is performing dynamic DNS for this IP address, else 2128 MUST NOT. 2129 (2) MUST NOT if binding-status is ABANDONED. 2130 (3) assigned-IP-address MUST be the first option for an IP address 2132 Table 7.1-1: Options used in a BNDUPD message 2134 7.1.1. Sending the BNDUPD message 2136 A BNDUPD message SHOULD be generated whenever any binding changes. A 2137 change might be in the binding-status, the lease-expiration-time, or 2138 even just the last-transaction-time. In general, any time a DHCP 2139 server writes its stable storage, a BNDUPD message SHOULD be gen- 2140 erated. This will often be the result of the processing of a DHCP 2141 client request, but it might also be the result of a successful 2142 dynamic DNS update operation. 2144 BNDUPD (and BNDACK) messages refer to the binding-status of the IP 2145 address, and this protocol defines a series of binding-statuses, dis- 2146 cussed in more detail below. Some servers may not support all of 2147 these binding-statuses, and so in those cases they will not be sent. 2148 Upon receipt of a BNDUPD message which contains an unsupported 2149 binding-status, a reasonable interpretation should be made (see sec- 2150 tion 5.10). 
2152 All BNDUPD messages MUST contain the IP address of the binding update 2153 transaction in the assigned-IP-address option. 2155 All binding update transactions contain a binding-status option, and 2156 it will have one of the values found in section 5.10. Client infor- 2157 mation consists of client-hardware-address and possibly a client- 2158 identifier, and is explained in more detail later in this section. 2159 The following table indicates whether client information should or 2160 should not appear with each binding-status in a binding update tran- 2161 saction: 2163 binding-status includes client information 2164 ------------------------------------------------ 2165 ACTIVE MUST 2166 EXPIRED SHOULD 2167 RELEASED SHOULD 2168 FREE MAY 2169 ABANDONED MUST NOT 2170 RESET MAY 2171 BACKUP MAY 2173 Table 7.1.1-1: Client information required by various 2174 binding-status values. 2176 The ACTIVE binding-status requires some options to indicate the 2177 length of the binding: 2179 o lease-expiration-time 2181 The lease-expiration-time option MUST appear, and be set to the 2182 expiration time most recently ACKed to the DHCP client. Note 2183 that the time ACKed to a DHCP client is a lease duration in 2184 seconds, while the lease-expiration-time option in a BNDUPD mes- 2185 sage is an absolute time value. 2187 o potential-expiration-time 2189 The potential-expiration-time option MUST appear, and be set to 2190 a value beyond that of the lease-expiration time. This is the 2191 value that is ACKed by the BNDACK message. A server sending a 2192 BNDUPD message MUST be able to recover the potential- 2193 expiration-time sent in every BNDUPD, not just those that 2194 receive a corresponding BNDACK, in order to be able to protect 2195 against possible duplicate allocation of IP addresses after 2196 transitioning to PARTNER-DOWN state. See section 5.2.1 for 2197 details as to why the potential-expiration-time exists and 2198 guidelines for how to decide on the value. 
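The requirements above (client information per binding-status, and the two time options for ACTIVE) lend themselves to a small consistency check before a transaction is sent. This is a sketch only: check_transaction and absolute_expiration are hypothetical names, and a transaction is modeled as a plain dictionary keyed by option name, which is an assumption of this example.

```python
from typing import Dict, List

# Table 7.1.1-1: whether client information accompanies each binding-status.
CLIENT_INFO_RULE = {
    "ACTIVE": "MUST", "EXPIRED": "SHOULD", "RELEASED": "SHOULD",
    "FREE": "MAY", "ABANDONED": "MUST NOT", "RESET": "MAY", "BACKUP": "MAY",
}


def absolute_expiration(ack_time: int, lease_seconds: int) -> int:
    """Turn the lease duration ACKed to a client into an absolute time."""
    return ack_time + lease_seconds


def check_transaction(txn: Dict[str, object]) -> List[str]:
    """Return a list of problems found in one binding update transaction."""
    problems: List[str] = []
    if "assigned-IP-address" not in txn:
        problems.append("assigned-IP-address is missing")
    status = txn.get("binding-status")
    if status not in CLIENT_INFO_RULE:
        return problems + ["binding-status is missing or unknown"]

    has_hw = "client-hardware-address" in txn
    has_any_client = has_hw or "client-identifier" in txn
    rule = CLIENT_INFO_RULE[status]
    if rule == "MUST" and not has_hw:
        problems.append("client information is required for " + str(status))
    if rule == "MUST NOT" and has_any_client:
        problems.append("client information must not appear for " + str(status))

    if status == "ACTIVE":
        lease = txn.get("lease-expiration-time")
        potential = txn.get("potential-expiration-time")
        if lease is None:
            problems.append("lease-expiration-time is missing")
        if potential is None:
            problems.append("potential-expiration-time is missing")
        if lease is not None and potential is not None and potential <= lease:
            problems.append("potential-expiration-time is not beyond "
                            "lease-expiration-time")
    return problems
```

An empty result means the transaction satisfies the checks modeled here; it does not cover every row of Table 7.1-1.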
2200 The following option information applies to all BNDUPD messages, 2201 regardless of the value of the binding-status, unless otherwise 2202 noted. 2204 o Identifying the client 2206 For many of the binding-status values a client MUST appear while 2207 for others a client MAY appear, and for some a client MUST NOT 2208 appear. 2210 A client is identified in a BNDUPD message by at least one and pos- 2211 sibly two options. The client-hardware-address option MUST appear 2212 any time that a client appears in a BNDUPD message, and contains 2213 the hardware type and chaddr information from the DHCP request 2214 packet. A failover client-identifier option MUST appear any time 2215 that a client appears in a BNDUPD message if and only if that 2216 client used a DHCP client-identifier option when communicating with 2217 the DHCP server. See section 12.5 and 12.4 for details of how to 2218 construct these two options from a DHCP request packet. 2220 o start-time-of-state 2222 The start-time-of-state SHOULD appear. It is set to the time at 2223 which this IP address first took on the state that corresponds to 2224 the current value of binding-status. 2226 o last-transaction-time 2228 The last-transaction-time value SHOULD appear. This is the time at 2229 which this DHCP server last received a packet from the DHCP client 2230 referenced by the client-identifier or client-hardware-address that 2231 was associated with the IP address referenced by the assigned-IP- 2232 address. 2234 o DDNS 2236 If the DHCP server is performing dynamic DNS operations on behalf 2237 of the DHCP client represented by the client-identifier or client- 2238 hardware-address, then it should include a DDNS option containing 2239 the domain name and status of any dynamic DNS operations enabled. 
2241 o client-request-options 2243 If the BNDUPD was triggered by a request from a DHCP client (typi- 2244 cally those with binding-status of ACTIVE and RELEASED), then the 2245 server SHOULD include options of interest to a failover partner 2246 from the client's request packet in the client-request-options for 2247 transmission to its partner (see section 12.8). 2249 A server sending a BNDUPD SHOULD remember the "interesting" options 2250 or the information that would appear in an "interesting" option for 2251 transmission at a time when the BNDUPD is not closely associated 2252 with a DHCP client request. 2254 A server SHOULD send the following "interesting" options. It MAY 2255 send any DHCP client options. As new options are defined, the RFC 2256 defining these options SHOULD include information that they are 2257 "interesting to failover servers" if they should be sent as part of 2258 a BNDUPD. 2260 option option 2261 number name 2262 ----------------------------------------- 2264 12 host-name 2265 81 client-FQDN [DDNS] 2266 82 relay-agent-information [AGENTINFO] 2267 TBD user-class [USERCLASS] 2268 60 vendor-class-identifier 2270 Table 7.1.1-2: Options which SHOULD be sent in 2271 the client-request-options option in a BNDUPD message. 2273 o client-reply-options 2275 If the BNDUPD was triggered by a request from a DHCP client (typi- 2276 cally those with binding-status of ACTIVE and RELEASED), then the 2277 server SHOULD include options of interest to a failover partner 2278 from the server's DHCP reply packet in the client-reply-options for 2279 transmission to its partner (see section 12.7). 2281 A server sending a BNDUPD SHOULD remember the "interesting" options 2282 or the information that would appear in an "interesting" option for 2283 transmission at a time when the BNDUPD is not closely associated 2284 with a DHCP client request. 2286 A server SHOULD send the following "interesting" options. It MAY 2287 send any DHCP client options. 
As new options are defined, the RFC 2288 defining these options SHOULD include information that they are 2289 "interesting to failover servers" if they should be sent as part of 2290 a BNDUPD. 2292 option option 2293 number name 2294 ----------------------------------------- 2296 58 renewal-time 2297 59 rebinding-time 2299 Table 7.1.1-3: Options which SHOULD be sent in 2300 the client-reply-options option in a BNDUPD message. 2302 The BNDUPD message SHOULD be sent as soon as possible from the time 2303 that the DHCP client received a response and the lease bindings data- 2304 base is written on stable storage. 2306 7.1.2. Receiving the BNDUPD message 2308 When a server receives a BNDUPD message, it needs to decide how to 2309 process the binding update transaction it contains and whether that 2310 transaction represents a conflict of any sort. The conflict resolu- 2311 tion process MUST be used on the receipt of every BNDUPD message, not 2312 just those that are received while in POTENTIAL-CONFLICT state, in 2313 order to increase the robustness of the protocol. 2315 There are three sorts of conflicts: 2317 o Two clients, one IP address conflict 2319 This is the duplicate IP address allocation conflict. There are 2320 two different clients each allocated the same address. See sec- 2321 tion 7.1.3 for how to resolve this conflict. 2323 o Two IP addresses, one client conflict 2325 This conflict exists when a client on one server is associated 2326 with one IP address, and on the other server with a different 2327 IP address in the same or a related subnet. This does not refer 2328 to the case where a single client has addresses in multiple dif- 2329 ferent subnets or administrative domains, but rather the case 2330 where on the same subnet the client has a lease on one IP 2331 address in one server and on a different IP address on the other 2332 server. 2334 This conflict may or may not be a problem for a given DHCP 2335 server implementation. 
In the event that a DHCP server requires 2336 that a DHCP client have only one outstanding lease for an IP 2337 address on one subnet, this conflict should be resolved by 2338 accepting the update which has the latest client-last- 2339 transaction-time. 2341 o binding-status conflict 2343 This is the normal conflict, where one server is updating the other 2344 with newer information. See section 7.1.3 for details of how to 2345 resolve these conflicts. 2347 7.1.3. Deciding whether to accept the binding update transaction in a 2348 BNDUPD message 2350 IP addresses undergo binding status changes for several reasons, 2351 including receipt and processing of DHCP client requests, administra- 2352 tive inputs and receipt of BNDUPD messages. Every DHCP server needs 2353 to respond to DHCP client requests and administrative inputs with 2354 changes to its internal record of the binding-status of an IP 2355 address, and this response is not in the scope of the failover proto- 2356 col. However, the receipt of BNDUPD messages implies at least a pos- 2357 sible change of the binding-status for an IP address, and must be 2358 discussed here. See section 7.1.2 for general actions to take upon 2359 receipt of a BNDUPD message. 2361 When receiving a BNDUPD message, it is important to note that it may 2362 not be current, in that the server receiving the BNDUPD message may 2363 have had a more recent interaction with the DHCP client than its 2364 partner who sent the BNDUPD message. In this case, the receiving 2365 server MUST reject the BNDUPD message. In addition, it is worth not- 2366 ing that two (and possibly three) binding-status values are the 2367 direct result of interaction with a DHCP client, ACTIVE and RELEASED 2368 (and possibly ABANDONED). All other binding-status values are either 2369 the result of the expiration of a time period or interaction with an 2370 external agency (e.g., a network administrator). 
2372 Every BNDUPD message SHOULD contain a client-last-transaction-time 2373 option, which MUST, if it appears, be the time that the server last 2374 interacted with the DHCP client. It MUST NOT be, for instance, the 2375 time that the lease on an IP address expired. If there has been no 2376 interaction with the DHCP client in question (or there is no DHCP 2377 client presently associated with this IP address), then there will be 2378 no client-last-transaction-time option in the BNDUPD message. 2380 The list in Figure 7.1.3-1 is indexed by the binding-status that a 2381 server receives in a BNDUPD message. In many cases, the binding- 2382 status of an IP address within the receiving server's data storage 2383 will have an effect upon the checks performed prior to accepting the 2384 new binding-status in a BNDUPD message. 2386 In Figure 7.1.3-1, to "accept" a BNDUPD means to update the server's 2387 bindings database with the information contained in the BNDUPD and 2388 once that update is complete, send a BNDACK message corresponding to 2389 the BNDUPD message. To "reject" a BNDUPD means to respond to the 2390 BNDUPD with a BNDACK with a reject-reason option included. 2392 When interpreting the rules in the following list, if a BNDUPD 2393 doesn't have a client-last-transaction-time value, then it MUST NOT 2394 be considered later than the client-last-transaction-time in the 2395 receiving server's binding. If the BNDUPD contains a client-last- 2396 transaction-time value and the receiving server's binding does not, 2397 then the client-last-transaction-time value in the BNDUPD MUST be 2398 considered later than the server's. 2400 The second rule concerns clients and IP addresses. If the clients in 2401 a BNDUPD message and in a receiving server's binding differ, then if 2402 the receiving server's binding-status is ACTIVE and the binding- 2403 status in the BNDUPD is ACTIVE, then if the receiving server is a 2404 secondary server accept it, else reject it. 
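The rule for missing client-last-transaction-time values, combined with the entries of Figure 7.1.3-1, can be sketched as a table lookup. The names below (decide, is_later, TABLE) are hypothetical, and the column grouping (FREE with BACKUP, RESET with ABANDONED) is this example's reading of the figure's two-line column headings. The second rule, for differing clients, is not modeled.

```python
from typing import Optional

# Column index for the binding-status received in the BNDUPD.
COLUMN = {"ACTIVE": 0, "EXPIRED": 1, "RELEASED": 2,
          "FREE": 3, "BACKUP": 3, "RESET": 4, "ABANDONED": 4}

# Rows are the binding-status held by the receiving server.
TABLE = {
    "ACTIVE":    ("accept", "time2", "time1", "time2", "accept"),
    "EXPIRED":   ("time1", "accept", "accept", "accept", "accept"),
    "RELEASED":  ("time1", "time1", "accept", "accept", "accept"),
    "FREE":      ("accept",) * 5,
    "BACKUP":    ("accept",) * 5,
    "RESET":     ("time3", "accept", "accept", "accept", "accept"),
    "ABANDONED": ("reject", "reject", "reject", "reject", "accept"),
}


def is_later(update_cltt: Optional[int], local_cltt: Optional[int]) -> bool:
    """Compare client-last-transaction-times, handling absent values."""
    if update_cltt is None:
        return False          # never considered later than the receiver's
    if local_cltt is None:
        return True           # receiver has none, so the update is later
    return update_cltt > local_cltt


def decide(local_status: str, update_status: str,
           update_cltt: Optional[int] = None,
           local_cltt: Optional[int] = None,
           local_lease_expiration: int = 0,
           local_start_of_state: int = 0,
           now: int = 0) -> str:
    """Return "accept" or "reject" for one binding update transaction."""
    rule = TABLE[local_status][COLUMN[update_status]]
    if rule in ("accept", "reject"):
        return rule
    if rule == "time1":
        ok = is_later(update_cltt, local_cltt)
    elif rule == "time2":
        ok = now > local_lease_expiration
    else:  # time3
        ok = update_cltt is not None and update_cltt > local_start_of_state
    return "accept" if ok else "reject"
```

Keeping the table as data makes it straightforward to compare each entry against the figure when reviewing an implementation.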
2406 binding-status in received BNDUPD 2407 binding-status 2408 in receiving FREE RESET 2409 server ACTIVE EXPIRED RELEASED BACKUP ABANDONED 2411 ACTIVE accept time(2) time(1) time(2) accept 2412 EXPIRED time(1) accept accept accept accept 2413 RELEASED time(1) time(1) accept accept accept 2414 FREE/BACKUP accept accept accept accept accept 2415 RESET time(3) accept accept accept accept 2416 ABANDONED reject reject reject reject accept 2418 time(1): If the client-last-transaction-time in the BNDUPD 2419 is later than the client-last-transaction-time in the 2420 receiving server's binding, accept it, else reject it. 2422 time(2): If the current time is later than the receiving 2423 server's lease-expiration-time, accept it, else reject it. 2425 time(3): If the client-last-transaction-time in the BNDUPD 2426 is later than the start-time-of-state in the receiving server's 2427 binding, accept it, else reject it. 2429 Figure 7.1.3-1: Accepting BNDUPD messages 2431 7.1.4. Accepting the BNDUPD message 2433 When accepting a BNDUPD message, the information contained in the 2434 client-request-options and client-reply-options SHOULD be examined 2435 for any information of interest to this server. For instance, a 2436 server which wished to detect changes in client specified host names 2437 might want to examine and save information from the host-name or 2438 client-FQDN options. Servers which expect to utilize information 2439 from the relay-agent-information option would want to store this 2440 information. 2442 7.1.5. Time values related to the BNDUPD message 2444 There are four time values that MAY be sent in a BNDUPD message. 2446 o lease-expiration-time 2448 The time that the server gave to the client, i.e., the time that 2449 the server believes that the client's lease will expire. 2451 o potential-expiration-time 2453 The time that the server wants to be sure its partner waits 2454 (added to the MCLT) before assuming that this lease has expired. 
2455 Typically some time beyond the desired client lease time. 2457 o client-last-transaction-time 2459 The time that the client last interacted with this server. 2461 o start-time-of-state 2463 The time at which the binding first went into the current state. 2465 As discussed in section 5.2, each server knows what its partner has 2466 ACKed with regard to potential-expiration time. In addition, each 2467 server needs to remember what it has told its partner as the 2468 potential-expiration-time. Moreover, each server must remember what 2469 it has acked to the *other* server as the most recent potential- 2470 expiration-time from that server. 2472 Remember that each server sends a potential-expiration-time and 2473 receives an ACK for that as well as receiving a potential- 2474 expiration-time and needing to remember what it has acked for that. 2476 While they don't have to be named in any particular way, the times 2477 that a server needs to remember for every IP address in order to 2478 implement the failover protocol are: 2480 o lease-expiration-time 2482 The time that a server gave to the DHCP client. A DHCP server 2483 needs to remember this time already, just to be a DHCP server. 2484 A server SHOULD update this time with the lease-expiration time 2485 received from a partner in a BNDUPD if the received lease- 2486 expiration time is later than the lease-expiration time recorded 2487 for this binding. 2489 o sent-potential-expiration-time 2491 The latest time sent to the partner for a potential-expiration- 2492 time. 2494 o acked-potential-expiration-time 2496 The latest time that the partner has acked for a potential 2497 expiration time. Typically the same as sent-potential- 2498 expiration-time if there is not a BNDUPD outstanding. 2500 o received-potential-expiration-time 2502 The latest time that this server has ever received as a 2503 potential-expiration-time from its partner in a BNDUPD that this 2504 server ACKed. 
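The per-address bookkeeping listed above, and the two bounds that section 7.1.5 derives from it, can be sketched as a small record. The class and method names are hypothetical. Including the current time in the offer bound is an assumption of this sketch, drawn from the remark that a server whose recorded times are zero can renew for only the MCLT.

```python
from dataclasses import dataclass


@dataclass
class BindingTimes:
    """Times a server remembers for one IP address (absolute seconds)."""
    lease_expiration: int = 0
    sent_potential_expiration: int = 0
    acked_potential_expiration: int = 0
    received_potential_expiration: int = 0

    def bndupd_sent(self, potential_expiration: int) -> None:
        self.sent_potential_expiration = max(
            self.sent_potential_expiration, potential_expiration)

    def bndack_received(self, potential_expiration: int) -> None:
        self.acked_potential_expiration = max(
            self.acked_potential_expiration, potential_expiration)

    def bndupd_received_and_acked(self, lease_expiration: int,
                                  potential_expiration: int) -> None:
        # Keep the later lease-expiration-time, as the text recommends.
        self.lease_expiration = max(self.lease_expiration, lease_expiration)
        self.received_potential_expiration = max(
            self.received_potential_expiration, potential_expiration)

    def latest_offer(self, now: int, mclt: int) -> int:
        """Latest expiration that may be offered to the DHCP client."""
        return mclt + max(now,
                          self.received_potential_expiration,
                          self.acked_potential_expiration)

    def partner_down_reuse_time(self, mclt: int) -> int:
        """Earliest time the address may go to a different client
        after entering PARTNER-DOWN state."""
        return mclt + max(self.lease_expiration,
                          self.sent_potential_expiration,
                          self.acked_potential_expiration,
                          self.received_potential_expiration)
```

For a secondary that has only received updates (potential-expiration-time 5000, MCLT 3600, current time 1000), latest_offer gives 8600, whereas a bound based on its own acked time alone would give 4600.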
2506 So, a server has to remember two additional times concerning BNDUPD 2507 messages that it has initiated, and one additional time concerning 2508 BNDUPD message that it has received. How are these times used? 2510 First, let's look at the time that a DHCP server can offer to a DHCP 2511 client. A server can offer to a DHCP client a time that is no longer 2512 than the MCLT beyond the max( received-potential-expiration-time, 2513 acked-potential-expiration-time). One might think that the server 2514 should be able to offer only the MCLT beyond the acked-potential- 2515 expiration-time, and while that is certainly simple and easy to 2516 understand, it has negative consequences in actual operation. 2518 To illustrate this, in the simple case where the primary updates the 2519 secondary for a while and then fails, if the secondary can then renew 2520 the client for only the MCLT beyond the acked-potential-expiration- 2521 time, then the secondary will only be able to renew the client for 2522 the MCLT, because the secondary has never sent a BNDUPD packet to the 2523 primary concerning this IP address and client, and so its acked- 2524 potential-expiration-time is zero. 2526 However, since the secondary is allowed to renew the client with the 2527 MCLT beyond the max( received-potential-expiration-time, acked- 2528 potential-expiration-time), then the secondary can usually renew the 2529 client for the full lease period, at least for the first renew it 2530 sees from the client, since the received-potential-expiration-time is 2531 generally longer than the client's desired lease interval. The 2532 difference in renew times could make a big difference in server load 2533 on the secondary in this case. 2535 What are the consequences of allowing a server to offer a DHCP client 2536 a lease term of the MCLT beyond the max( received-potential- 2537 expiration-time, acked-potential-expiration-time)? 
The consequences 2538 appear whenever a server enters PARTNER-DOWN state, and affect how 2539 long that server has to wait before reallocating expired leases. 2540 With this approach, when a server goes into PARTNER-DOWN state, it 2541 must wait the MCLT beyond the max( lease-expiration-time, sent- 2542 potential-expiration-time, acked-potential-expiration-time, 2543 received-potential-expiration-time ) for each IP address before it 2544 can reallocate that IP address to another DHCP client. One might 2545 normally think that it needed to wait only the MCLT beyond the max( 2546 lease-expiration-time, received-potential-expiration-time ), i.e., 2547 beyond what it has told the client and what it has explicitly acked 2548 to the other server. But with the optimization discussed above -- 2549 where either server can offer the DHCP client a lease term of the 2550 MCLT beyond the max( received-potential-expiration-time, acked- 2551 potential-expiration-time), then the additional times sent- 2552 potential-expiration-time and acked-potential-expiration-time must be 2553 added into the expression, since the partner could have used those 2554 times as part of its own lease time calculation. 2556 Thus this optimization may require a longer waiting time when enter- 2557 ing PARTNER-DOWN state, but will generally allow servers to operate 2558 considerably more effectively when running in COMMUNICATIONS- 2559 INTERRUPTED state. 2561 7.2. BNDACK message [4] 2563 A server sends a binding acknowledgement (BNDACK) message when it has 2564 processed a BNDUPD message and after it has successfully committed to 2565 stable storage any binding database changes made as a result of pro- 2566 cessing the BNDUPD message. A BNDACK message is used to both accept 2567 or reject a BNDUPD message. A BNDACK message which contains a 2568 reject-reason option is a rejection of the corresponding BNDUPD mes- 2569 sage. 
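How a receiving server might arrive at the accept-or-reject decision that the BNDACK then carries can be sketched from Figure 7.1.3-1. This is an illustrative sketch only: the function and parameter names are hypothetical, times are plain integers, and the grouping of the last two columns as FREE/BACKUP and RESET/ABANDONED is an assumed reading of the figure's stacked column headers.

```python
ACCEPT, REJECT = "accept", "reject"
TIME1, TIME2, TIME3 = "time(1)", "time(2)", "time(3)"

# Column index for the binding-status carried in the received BNDUPD.
# Assumption: FREE and BACKUP share a column, as do RESET and ABANDONED.
COLUMN = {"ACTIVE": 0, "EXPIRED": 1, "RELEASED": 2,
          "FREE": 3, "BACKUP": 3, "RESET": 4, "ABANDONED": 4}

# Rows are keyed by the binding-status held by the receiving server.
TABLE = {
    "ACTIVE":    (ACCEPT, TIME2,  TIME1,  TIME2,  ACCEPT),
    "EXPIRED":   (TIME1,  ACCEPT, ACCEPT, ACCEPT, ACCEPT),
    "RELEASED":  (TIME1,  TIME1,  ACCEPT, ACCEPT, ACCEPT),
    "FREE":      (ACCEPT, ACCEPT, ACCEPT, ACCEPT, ACCEPT),
    "BACKUP":    (ACCEPT, ACCEPT, ACCEPT, ACCEPT, ACCEPT),
    "RESET":     (TIME3,  ACCEPT, ACCEPT, ACCEPT, ACCEPT),
    "ABANDONED": (REJECT, REJECT, REJECT, REJECT, ACCEPT),
}


def accept_bndupd(local_status, received_status, *, now,
                  local_lease_expiration, local_cltt,
                  local_start_time_of_state, received_cltt):
    """Return True if the BNDUPD should be accepted, False if rejected."""
    rule = TABLE[local_status][COLUMN[received_status]]
    if rule == ACCEPT:
        return True
    if rule == REJECT:
        return False
    if rule == TIME1:
        # BNDUPD's client-last-transaction-time must be later than ours.
        return received_cltt > local_cltt
    if rule == TIME2:
        # Our own lease-expiration-time must already have passed.
        return now > local_lease_expiration
    # time(3): BNDUPD's client-last-transaction-time must be later than
    # the start-time-of-state of our binding.
    return received_cltt > local_start_time_of_state
```

A True result corresponds to a BNDACK without a reject-reason option, sent only after the change has been committed to stable storage; a False result corresponds to a BNDACK that contains a reject-reason option.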
2571 In order to reduce the complexity of the discussion, the rest of this 2572 section is written as though every BNDUPD message contains only a 2573 single binding update transaction and thus every corresponding BNDACK 2574 message would also contain reply information about only a single 2575 binding update transaction. See section 6.3 for information on how 2576 to create and process BNDUPD and BNDACK messages which contain multi- 2577 ple binding update transactions. 2579 Note that while a server MAY generate BNDUPD messages with multiple 2580 binding update transactions, every server MUST be able to process a 2581 BNDUPD message which contains multiple binding update transactions 2582 and generate the corresponding BNDACK messages with status for multi- 2583 ple binding update transactions. If a server does not ever create 2584 BNDUPD messages which contain multiple binding update transactions, 2585 then it does not need to be able to process a received BNDACK message 2586 with multiple binding update transactions. However, all servers MUST 2587 be able to create BNDACK messages which deal with multiple binding 2588 update transactions received in a BNDUPD message. 2590 Every BNDUPD message that is received by a server MUST be responded 2591 to with a corresponding BNDACK message. The receiving server SHOULD 2592 respond quickly to every BNDUPD message but it MAY choose to respond 2593 preferentially to DHCP client requests instead of BNDUPD messages, 2594 since there is no absolute time period within which a BNDACK must be 2595 sent in response to a BNDUPD message, while DHCP clients frequently 2596 have strict time constraints. 2598 A BNDACK message can only be sent in response to a BNDUPD message 2599 using the same TCP connection from which the BNDUPD message was 2600 received, since the XID's in BNDUPD messages are guaranteed unique 2601 only during the life of a single TCP connection. 
When a connection 2602 to a partner server goes down, a server with unprocessed BNDUPD mes- 2603 sages MAY simply drop all of those messages, since it can be sure 2604 that the partner will resend them when they are next in communica- 2605 tions (albeit with a different XID), or it MAY instead choose to pro- 2606 cess those BNDUPD messages, but it MUST NOT send any BNDACK messages 2607 in response. 2609 The following table summarizes the options for the BNDACK message.
2611                            binding-status               BACKUP
2612                                                         RESET
2613                                                         ABANDONED
2614 Option                     ACTIVE   EXPIRED   RELEASED  FREE
2615 ------                     ------   -------   --------  ----
2616 assigned-IP-address (3)    MUST     MUST      MUST      MUST
2617 binding-status             MUST     MUST      MUST      MUST
2618 client-identifier          MAY      MAY       MAY       MAY(2)
2619 client-hardware-address    MUST     MUST      MUST      MAY(2)
2620 reject-reason              MAY      MAY       MAY       MAY
2621 message                    MAY      MAY       MAY       MAY
2622 lease-expiration-time      MUST     MUST NOT  MUST NOT  MUST NOT
2623 potential-expiration-time  MUST     MUST NOT  MUST NOT  MUST NOT
2624 start-time-of-state        SHOULD   SHOULD    SHOULD    SHOULD
2625 client-last-trans.-time    SHOULD   SHOULD    SHOULD    MAY
2626 DDNS(1)                    SHOULD   SHOULD    SHOULD    SHOULD
2628 (1) MUST if server is performing dynamic DNS for this IP address, else 2629 MUST NOT.
2630 (2) MUST NOT if binding-status is ABANDONED.
2631 (3) assigned-IP-address MUST be the first option for an IP address
2633 Table 7.2-1: Options used in a BNDACK message
2635 7.2.1. Sending the BNDACK message 2637 The BNDACK message MUST contain the same xid as the corresponding 2638 BNDUPD message. 2640 The assigned-IP-address option from the BNDUPD message MUST be 2641 included in the BNDACK message. Any additional options from the 2642 BNDUPD message SHOULD NOT appear in the BNDACK message. 
Note that 2643 any information sent in options (e.g., a later lease-expiration-time) 2644 in the BNDACK message MUST NOT be assumed to necessarily be recorded 2645 in the stable storage of the server that receives the BNDACK message 2646 because there is no corresponding ACK of the BNDACK message. Any 2647 information that SHOULD be recorded in the partner server's stable 2648 storage MUST be transmitted in a subsequent BNDUPD. 2650 If the server is accepting the BNDUPD, the BNDACK message includes 2651 only the assigned-IP-address option. If the server is rejecting the 2652 BNDUPD, the additional option reject-reason MUST appear in the BNDACK 2653 message, and the message option SHOULD appear in this case containing 2654 a human-readable error message describing in some detail the reason 2655 for the rejection of the BNDUPD message. 2657 If the server rejects the BNDUPD message with a BNDACK and a reject- 2658 reason option, it may be because the server believes that it has 2659 binding information that the other server should know. A server 2660 which is rejecting a BNDUPD may initiate a BNDUPD of its own in order 2661 to update its partner with what it believes is better binding infor- 2662 mation, but it MUST ensure through some means that it will not end up 2663 in a situation where each server is sending BNDUPD messages as fast 2664 as possible because they can't agree on which server has better bind- 2665 ing data. Placing a considerable delay on the initiation of a BNDUPD 2666 message after sending a BNDACK with a reject-reason would be one way 2667 to ensure this situation doesn't occur. 2669 7.2.2. Receiving the BNDACK message 2671 When a server receives a BNDACK message, if it doesn't contain a 2672 reject-reason option that means that the BNDUPD message was accepted, 2673 and the server which sent the BNDUPD SHOULD update its stable storage 2674 with the potential-expiration-time value sent in the BNDUPD message 2675 and returned in the BNDACK message. 
Other values sent in the BNDUPD 2676 message MAY be used as desired. 2678 If the BNDACK message contains a reject-reason option, that means 2679 that the BNDUPD was rejected. There SHOULD be a message option in 2680 the BNDACK giving a text reason for the rejection, and the server 2681 SHOULD log the message in some way. The server MUST NOT immediately 2682 try to resend the BNDUPD message as there is no reason to believe the 2683 partner won't reject it a second time. However a server MAY choose 2684 to send another BNDUPD at some future time, for instance when the 2685 server next processes an update request from its partner. 2687 7.3. UPDREQ message [9] 2689 The update request (UPDREQ) message is used by one server to request 2690 that its partner send it all of the binding database information that 2691 it has not already seen. Since each server is required to keep 2692 track at all times of the binding information the other server has 2693 received and ACKed, one server can request transmission of all un- 2694 ACKed binding database information held by the other server by using 2695 the UPDREQ message. 2697 The UPDREQ message is used whenever the sending server cannot proceed 2698 before it has processed all previously un-ACKed binding update infor- 2699 mation, since the UPDREQ message should yield a corresponding UPDDONE 2700 message. The UPDDONE message is not sent until the server that sent 2701 the UPDREQ message has responded to all of the BNDUPD messages gen- 2702 erated by the UPDREQ message with BNDACK messages (they may either be 2703 accepted or rejected by the BNDACK messages, but they MUST have been 2704 responded to). Thus, the sender of the UPDREQ message can be sure 2705 upon receipt of an UPDDONE message that it has received and committed 2706 to stable storage all outstanding binding database updates. 2708 See section 9, Failover Endpoint States, for the details of when the 2709 UPDREQ message is sent. 2711 7.3.1. 
Sending the UPDREQ message 2713 The UPDREQ message has no message specific options. 2715 7.3.2. Receiving the UPDREQ message 2717 A server receiving an UPDREQ message MUST send all binding database 2718 changes that have not yet been ACKed by the sending server. These 2719 changes are sent as undistinguished BNDUPD messages. 2721 However, the server which received and is processing the UPDREQ mes- 2722 sage MUST track the BNDACK messages that correspond to the BNDUPD 2723 messages triggered by the UPDREQ message and, when they are all 2724 received, the server MUST send an UPDDONE message. 2726 The server processing the UPDREQ message and sending BNDUPD messages 2727 to its partner SHOULD only track the BNDUPD and BNDACK message pairs 2728 for unACKed binding database changes that were present upon the 2729 receipt of the UPDREQ message. A server which has received an UPDREQ 2730 message SHOULD send BNDUPD messages for binding database changes that 2731 occur after receipt of the UPDREQ message, but it SHOULD NOT include 2732 those additional BNDUPD messages and their corresponding BNDACK mes- 2733 sages in the accounting necessary to consider the UPDREQ complete and 2734 subsequently send the UPDDONE message. If some additional binding 2735 database changes end up becoming part of the set of BNDUPD messages 2736 considered as part of the UPDREQ (due to whatever algorithm the 2737 server uses to scan its bindings database for unacked changes) it 2738 will probably not cause any difficulty, but a server MUST NOT attempt 2739 to include all such later BNDUPD messages in the accounting for the 2740 UPDREQ in order to be able to transmit an UPDDONE message. 2742 When queuing up the BNDUPD messages for transmission to the sender of 2743 the UPDREQ message, the server processing the UPDREQ message MUST 2744 honor the value returned in the max-unacked-bndupd option in the CON- 2745 NECT or CONNECTACK message that set up the connection with the send- 2746 ing server. 
It MUST NOT send more BNDUPD messages without receiving 2747 corresponding BNDACKs than the value returned in max-unacked-bndupd. 2749 7.4. UPDREQALL message [7] 2751 The update request all (UPDREQALL) message is used by one server to 2752 request that its partner send it all of the binding database 2753 information. This message is used to allow one server to recover 2754 from a failure of stable storage and to restore its binding database 2755 in its entirety from the other server. 2757 A server which sends an UPDREQALL message cannot proceed until all of 2758 its binding update information is restored, and it knows that all of 2759 that information is restored when an UPDDONE message is received. 2761 See section 9, Protocol state transitions, for the details of when 2762 the UPDREQALL message is sent. 2764 The UPDREQALL message has no message specific options. 2766 7.4.1. Sending the UPDREQALL message 2768 The UPDREQALL is sent. 2770 7.4.2. Receiving the UPDREQALL message 2772 A server receiving an UPDREQALL message MUST send all binding data- 2773 base information to the sending server. These changes are sent as 2774 undistinguished BNDUPD messages. Otherwise the processing is the same 2775 as for the UPDREQ message. See section 7.3.2 for details. 2777 7.5. UPDDONE message [8] 2779 The update done (UPDDONE) message is used by a server receiving an 2780 UPDREQ or UPDREQALL message to signify that it has sent all of the 2781 BNDUPD messages requested by the UPDREQ or UPDREQALL request and that 2782 it has received a BNDACK for each of those messages. 2784 While a BNDACK message MUST have been received for each BNDUPD mes- 2785 sage prior to the transmission of the UPDDONE message, this doesn't 2786 necessarily mean that all of the BNDUPD messages were accepted, only 2787 that all of them were responded to with a BNDACK message. 
Thus, a 2788 NAK (comprised of a BNDACK message containing a reject-reason option) 2789 could be used to reject a BNDUPD, but for the purposes of the UPDDONE 2790 message, such NAK would count as a response to the associated BNDUPD 2791 message, and would not block the eventual transmission of the UPDDONE 2792 message. 2794 The xid in an UPDDONE message MUST be identical to the xid in the 2795 UPDREQ or UPDREQALL message that initiated the update process. 2797 The UPDDONE message has no message specific options. 2799 7.5.1. Sending the UPDDONE message 2801 The UPDDONE message SHOULD be sent as soon as the last BNDACK message 2802 corresponding to a BNDUPD message requested by the UPDREQ or 2803 UPDREQALL is received from the server which sent the UPDREQ or 2804 UPDREQALL. The XID of the UPDDONE message MUST be the same as the 2805 XID of the corresponding UPDREQ or UPDREQALL message. 2807 7.5.2. Receiving the UPDDONE message 2809 A server receiving the UPDDONE message knows that all of the informa- 2810 tion that it requested by sending an UPDREQ or UPDREQALL message has 2811 now been sent and that it has recorded this information in its stable 2812 storage. It typically uses the receipt of an UPDDONE message to move 2813 to a different failover state. See sections 9.5.2 and 9.8.3 for 2814 details. 2816 7.6. POOLREQ message [1] 2818 The pool request (POOLREQ) message is used by the secondary server to 2819 request an allocation of IP addresses from the primary server. It 2820 MUST be sent by a secondary server to a primary server to request IP 2821 address allocation by the primary. The IP addresses allocated are 2822 transmitted using normal BNDUPD messages from the primary to the 2823 secondary. 2825 The POOLREQ message SHOULD be sent from the secondary to the primary 2826 whenever the secondary transitions into NORMAL state. 
It SHOULD 2827 periodically be resent in order that any change in the number of 2828 available IP addresses on the primary be reflected in the pool on the 2829 secondary. The period may be influenced by the secondary server's 2830 leasing activity. 2832 The POOLREQ message has no message specific options. 2834 7.6.1. Sending the POOLREQ message 2836 The POOLREQ message is sent. 2838 7.6.2. Receiving the POOLREQ message 2840 When a primary server receives a POOLREQ message it SHOULD examine 2841 the binding database and determine how many IP addresses the secon- 2842 dary server should have, and set these IP addresses to BACKUP state. 2843 It SHOULD then send BNDUPD messages concerning all of these IP 2844 addresses to the secondary server. 2846 Servers frequently have several kinds of IP addresses available on a 2847 particular network segment. The failover protocol assumes that both 2848 primary and secondary servers are configured in such a way that each 2849 knows the type and number of IP addresses on every network segment 2850 participating in the failover protocol. The primary server is 2851 responsible for allocating the secondary server the correct propor- 2852 tion of available IP addresses of each kind, and the secondary server 2853 is responsible for being configured in such a way that it can tell 2854 the kind of every IP address based solely on the IP address itself. 2856 A primary server MUST keep track of how many IP addresses were allo- 2857 cated as a result of processing the POOLREQ message, and send that 2858 number in the POOLRESP message. 2860 A primary server MAY choose to defer processing a POOLREQ message 2861 until a more convenient time to process it, but it should not depend 2862 on the secondary server to resend the POOLREQ message in that case. 2864 If a secondary server receives a POOLREQ message it SHOULD report an 2865 error. 2867 7.7. 
POOLRESP message [2] 2869 A primary server sends a POOLRESP message to a secondary server after 2870 the allocation process for available addresses to the secondary 2871 server is complete. Typically this message will precede some of the 2872 BNDUPD messages that the primary uses to send the actual allocated IP 2873 addresses to the secondary. 2875 The xid in the POOLRESP message MUST be identical to the xid in the 2876 POOLREQ message for which this POOLRESP is a response. 2878 7.7.1. Sending the POOLRESP message 2880 The POOLRESP message MUST contain the same xid as the corresponding 2881 POOLREQ message. 2883 Exactly one option MUST appear in a POOLRESP message: 2885 o addresses-transferred 2887 The number of addresses allocated to the secondary server by the 2888 primary server as a result of a POOLREQ is contained in the 2889 addresses-transferred option in a POOLRESP message. Note this 2890 is the number of addresses that are transferred to the secondary 2891 in the primary's binding database as a result of the correspond- 2892 ing POOLREQ message, and that it may be some time before they 2893 can all be transmitted to the secondary server through the use 2894 of BNDUPD messages. 2896 7.7.2. Receiving the POOLRESP message 2898 When a secondary server receives a POOLRESP message, it SHOULD send 2899 another POOLREQ message if the value of the addresses-transferred 2900 option is non-zero. 2902 Typically, no other action is taken on the reception of a POOLRESP 2903 message. 2905 7.8. CONNECT message [5] 2907 The connect message is used to establish an applications level con- 2908 nection over a newly created TCP connection. It gives the source 2909 information for the connection, and critical configuration informa- 2910 tion. It MUST be sent only by the primary server. Either server can 2911 initiate a TCP connection, but the CONNECT message is only sent by 2912 the primary server. 
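The separation between who opens the TCP connection and who sends CONNECT can be sketched as follows. This is a small illustration under stated assumptions: the Role enumeration, the function name, and the returned strings are hypothetical and serve only to show that the decision depends on the configured role and not on which side initiated the connection.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


def first_action(role: Role, initiated_by_us: bool) -> str:
    """First step on a newly established failover TCP connection.

    initiated_by_us is accepted only to make the point that it does not
    influence the outcome: either server may open the connection.
    """
    if role is Role.PRIMARY:
        return "send CONNECT"
    return "wait for CONNECT"
```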
2914 The CONNECT message MUST be the first message sent down a newly esta- 2915 blished connection, and it MUST be sent only by the primary server. 2917 The following table summarizes the options that are associated with 2918 the CONNECT message:
2920 Option
2921 ------
2922 sending-server-IP-address   MUST
2923 max-unacked-bndupd          MUST
2924 receive-timer               MUST
2925 vendor-class-identifier     MUST
2926 protocol-version            MUST
2927 TLS-request                 MUST (1)
2928 MCLT                        MUST
2929 hash-bucket-assignment      MUST
2931 (1) MUST NOT if CONNECT is being sent over a TLS connection
2933 Table 7.8-1: Options used in a CONNECT message
2935 7.8.1. Sending the CONNECT message 2937 The CONNECT message MUST be the first message sent by the primary 2938 server after the establishment of a new TCP connection with a secon- 2939 dary server participating in the failover protocol. 2941 The xid of the CONNECT message must be unique. 2943 The IP address of the primary server MUST be placed in the sending- 2944 server-IP-address option. This information is placed in an option 2945 inside of the message in order to allow the identity of the sender to 2946 be covered by a shared secret. 2948 The number of BNDUPD messages the primary server can accept without 2949 blocking the TCP connection MUST be placed in the max-unacked-bndupd 2950 option. This MUST be a number equal to or greater than 1, SHOULD be 2951 a number greater than 10, and SHOULD be a number less than 100. 2953 The length of the receive timer (tReceive, see section 8.3) MUST be 2954 placed in the receive-timer option. 2956 The MCLT MUST be placed in the MCLT option. 2958 The hash-bucket-assignment option MUST be included in the CONNECT 2959 message. In the event that load balancing is not configured for this 2960 server, the hash-bucket-assignment option will indicate that. 
The 2961 value of the hash-bucket-assignment option is determined from the 2962 specific buckets that the primary server has determined that the 2963 secondary server MUST service as part of the load-balancing algo- 2964 rithm. The way in which the primary server determines this informa- 2965 tion is outside the scope of this protocol definition. The primary 2966 server SHOULD be configured with a percentage of clients that the 2967 secondary server will be instructed to service, and the primary 2968 server SHOULD use the algorithm in [LOADB] to generate a Hash Bucket 2969 Assignment which it sends to the secondary server. 2971 The vendor class identifier MUST be placed in the vendor-class- 2972 identifier option. 2974 The protocol-version option MUST be included in every CONNECT mes- 2975 sage. The current value of the protocol version is 1. 2977 The TLS-request option MUST be sent and contains the desired TLS con- 2978 nection request as well as information concerning whether TLS is sup- 2979 ported. If this CONNECT message is being sent over an already 2980 created TLS connection, the TLS-request MUST NOT appear. 2982 7.8.2. Receiving the CONNECT message 2984 When a server receives a TCP connection on the failover port, if it 2985 is a primary server it should send a CONNECT message, and if it is a 2986 secondary server it should wait for a CONNECT message before sending 2987 any messages. To avoid denial of service attacks, a secondary should 2988 only wait for a CONNECT message on a new connection for a limited 2989 amount of time and close the connection if none is received during 2990 that time. 2992 When a secondary server receives a CONNECT message it should: 2994 1. Record the time at which the message was received. 2996 2. Examine the protocol-version option, and decide if this server 2997 is capable of interoperating with another server running that 2998 protocol version. If not, send the CONNECTACK message with 2999 the appropriate reject-reason. 
The server MUST include its 3000 protocol-version in the CONNECTACK message. 3002 3. Examine the TLS-request option. Figure out the TLS-reply 3003 value based on the capabilities and configuration of this 3004 server. If the result for the TLS-reply value is a 1 and the 3005 connection is accepted, indicating use of TLS, then immedi- 3006 ately send the CONNECTACK message and go into TLS negotiation. 3007 If the TLS-reply value implies rejection of the connection, 3008 then immediately send the CONNECTACK message with the TLS- 3009 reply value and the appropriate reject-reason option value. 3010 In all other cases, save the TLS-reply option information for 3011 the eventual CONNECTACK message. 3013 The possibilities for TLS-request and TLS-reply are:
3015 CONNECT  CONNECTACK
3016 TLS      TLS
3017 request  reply
3018                      Reject
3019 t1       t1          Reason  Comments
3020 --       --          ------  --------
3021 0        0                   no TLS used
3022 0        1           11      primary won't use TLS, secondary requires TLS
3023 1        0                   primary desires TLS, secondary doesn't
3024 1        1                   primary desires TLS, secondary will use TLS
3025 2        0           9, 10   primary requires TLS and secondary won't
3026 2        1                   primary requires TLS and secondary will use TLS
3028 4. Check to see if there is a message-digest option in the CON- 3029 NECT message. If there was, and the server does not support 3030 message-digests, then reject the connection with the appropri- 3031 ate reject-reason in the CONNECTACK. If the server does sup- 3032 port message-digests, then check this message for validity 3033 based on the message-digest, and reject it if the digest indi- 3034 cates the message was altered. 3036 5. Determine if the sender (from the sending-server-IP-address 3037 option) and the implicit role of the sender (i.e., primary) 3038 represents a server with which the receiver was configured to 3039 engage in failover activity. 
This is performed after any TLS 3040 or message digest processing so that it occurs after a secure 3041 connection is created, to ensure that there is no tampering 3042 with the IP address of the partner. 3044 If not, then the receiving server should reject the CONNECT 3045 request by sending a CONNECTACK message with a reject-reason 3046 value of: 8, invalid failover partner. 3048 If it is, then the receiving failover endpoint should be 3049 determined. 3051 6. Decide if the time delta between the sending of the message, 3052 in the time field, and the receipt of the message, recorded in 3053 step 1 above, is acceptable. A server MAY require an arbi- 3054 trarily small delta in time values in order to set up a fail- 3055 over connection with another server. See section 5.9 for 3056 information on time synchronization. 3058 If the delta between the time values is too great, the server 3059 should reject the CONNECT request by sending a CONNECTACK mes- 3060 sage with a reject-reason of 4, time mismatch too great. 3062 If the time mismatch is not considered too great then the 3063 receiving server MUST record the delta between the servers. 3064 The receiving server MUST use this delta to correct all of the 3065 absolute times received from the other server in all time- 3066 valued options. Note that servers can participate in failover 3067 with arbitrarily great time mismatches, as long as it is more 3068 or less constant. 3070 7. Examine the MCLT option in the CONNECT request and use the 3071 value of the MCLT as the MCLT for this failover endpoint. 3073 The secondary server SHOULD be able to operate with any MCLT 3074 sent by the primary, but if it cannot, then it should send a 3075 CONNECTACK with a reject-reason of 5, MCLT mismatch. 3077 8. The server MUST store hash-bucket-assignment option for use 3078 during processing during NORMAL state. 
If this hash bucket 3079 assignment conflicts with the secondary server's configured 3080 hash bucket assignment for use in other than NORMAL state, the 3081 secondary server should send a CONNECTACK with a reject reason 3082 of 19, Hash bucket assignment conflict. 3084 9. The receiving server MAY use the vendor-class-identifier to do 3085 vendor specific processing. 3087 7.9. CONNECTACK message [6] 3089 The CONNECTACK message is sent to accept or reject a CONNECT message. 3090 It is sent by the secondary server which received a CONNECT message. 3092 Attempting immediately to reconnect after either receiving a CONNEC- 3093 TACK with a reject-reason or after sending a CONNECTACK with a 3094 reject-reason could yield unwanted looping behavior, since the reason 3095 that the connection was rejected may well not have changed since the 3096 last attempt. A simple suggested solution is to wait a minute or two 3097 after sending or receiving a CONNECTACK message with a reject-reason 3098 before attempting to reestablish communication. 3100 The following table summarizes the options associated with the CON- 3101 NECTACK message:
3103 Option
3104 ------
3105 sending-server-IP-address   MUST
3106 max-unacked-bndupd          MUST
3107 receive-timer               MUST
3108 vendor-class-identifier     MUST
3109 protocol-version            MUST
3110 TLS-reply                   MUST(1)
3111 reject-reason               MAY(2)
3112 message                     MAY
3113 MCLT                        MUST NOT
3114 hash-bucket-assignment      MUST NOT
3116 (1) MUST NOT if sending CONNECTACK after TLS negotiation
3117 (2) Indicates a rejection of the CONNECT message.
3119 Table 7.9-1: Options used in a CONNECTACK message
3121 7.9.1. Sending the CONNECTACK message 3123 The xid of the CONNECTACK message MUST be that of the corresponding 3124 CONNECT message. 3126 The IP address of the sending server MUST be placed in the sending- 3127 server-IP-address option. 
This information is placed in an option 3128 inside of the message in order to allow the identity of the sender to 3129 be covered by a shared secret. 3131 The protocol-version option MUST be included in every CONNECTACK mes- 3132 sage. The current value of the protocol version is 1. 3134 If the connection has been rejected, the reject-reason option MUST be 3135 placed in the CONNECTACK message with an appropriate reason, and a 3136 message option SHOULD be included with a human-readable error message 3137 describing the reason for the rejection in some detail. If the 3138 reject-reason option appears, then the remaining options listed below 3139 do not appear. The sending server should close the connection after 3140 sending the CONNECTACK if the connection was rejected. 3142 The results of the TLS negotiation MUST be placed in the TLS-reply 3143 option. If this CONNECTACK message is being sent over an already TLS 3144 secured connection, then there MUST NOT be a TLS-reply option. 3146 If there was a message-digest option in the CONNECT message, then 3147 there MUST be a message-digest in the CONNECTACK message and any sub- 3148 sequent messages if the CONNECTACK does not contain a reject-reason. 3150 The number of BNDUPD messages the server can accept without blocking 3151 the TCP connection MUST be placed in the max-unacked-bndupd option. 3152 This SHOULD be a number greater than 10, and SHOULD be a number less 3153 than 100. 3155 The length of the receive timer (tReceive, see section 8.3) MUST be 3156 placed in the receive-timer option. 3158 The vendor class identifier MUST be placed in the vendor-class- 3159 identifier option. 3161 After a connection is created (either by sending a CONNECTACK message 3162 to the first CONNECT message, or sending a CONNECTACK message to a 3163 CONNECT message received over a TLS connection), the server MUST send 3164 a STATE message. 
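The rules above for populating a CONNECTACK can be sketched as a small builder. This is an illustration only and not a wire encoding: the function name, the dictionary layout, and the close_after_send flag are hypothetical. It follows one literal reading of this section, in which a rejecting CONNECTACK carries the sender address, the protocol version, the reject-reason, and optionally a message, while the remaining options are left out.

```python
def build_connectack(xid, server_ip, receive_timer, max_unacked_bndupd,
                     vendor_class, tls_reply=None, reject_reason=None,
                     message=None):
    """Assemble CONNECTACK contents as a plain dict (hypothetical layout)."""
    options = {
        "sending-server-IP-address": server_ip,
        "protocol-version": 1,  # current protocol version per the text
    }
    if reject_reason is not None:
        # Rejection: reject-reason MUST appear, message SHOULD appear,
        # and the connection should be closed after sending.
        options["reject-reason"] = reject_reason
        if message is not None:
            options["message"] = message
        return {"xid": xid, "options": options, "close_after_send": True}

    if tls_reply is not None:
        # Omitted (None) when already running over a TLS-secured connection.
        options["TLS-reply"] = tls_reply
    options["max-unacked-bndupd"] = max_unacked_bndupd
    options["receive-timer"] = receive_timer
    options["vendor-class-identifier"] = vendor_class
    return {"xid": xid, "options": options, "close_after_send": False}


def within_recommended_range(max_unacked_bndupd):
    # SHOULD be greater than 10 and SHOULD be less than 100.
    return 10 < max_unacked_bndupd < 100
```

The xid passed in is expected to be the xid of the CONNECT being answered. A caller that accepts the connection would go on to send a STATE message, which this sketch does not model.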

   After a connection is created, the server MUST start two timers for
   the connection: tSend and tReceive. The tSend timer SHOULD be
   approximately 33 percent of the time in the receive-timer option in
   the corresponding CONNECT message. The tReceive timer SHOULD be the
   time sent in the receive-timer option in the CONNECTACK message.

   The tReceive timer is reset whenever a message is received from this
   TCP connection. If it ever expires, the TCP connection is dropped
   and communications with this partner is considered not ok.

   The tSend timer is reset whenever a message is sent over this
   connection. When it expires, a CONTACT message MUST be sent.

7.9.2. Receiving the CONNECTACK message

   If a CONNECTACK message is received with a different XID from the one
   in the CONNECT that was sent, it SHOULD be ignored.

   When a CONNECTACK message is received, the following actions should
   be taken:

      1. Record the time the message was received.

      2. Check to see if the xid on the CONNECTACK matches an
         outstanding CONNECT message on this TCP connection.

      3. Check to see if there is a reject-reason option in the
         CONNECTACK message. If not, continue with step 4. If there is
         a reject-reason option, the server SHOULD report the error
         code. If a message option appears, a server SHOULD display the
         string from the message option in a user visible way. The
         server MUST close the connection if a reject-reason option
         appears.

      4. Check the value of the TLS-reply option (if any, which there
         won't be if this CONNECT is taking place utilizing TLS), and
         if it was 1, then skip processing of the rest of the
         CONNECTACK message, and immediately enter into TLS connection
         setup.

         This step occurs prior to steps 5 and 6 in order to allow
         creation of a secure connection (if required) prior to
         processing the protocol version and IP address information.

      5. Examine the value of the protocol-version option. If this
         server is able to establish connections with another server
         running this protocol version, then continue, else close the
         connection.

      6. Decide if the time delta between the sending of the message,
         in the time field, and the receipt of the message, recorded in
         step 1 above, is acceptable. A server MAY require an
         arbitrarily small delta in time values in order to set up a
         failover connection with another server.

         If the delta between the time values is too great, the server
         should drop the TCP connection.

         If the time mismatch is not considered too great then the
         receiving server MUST record the delta between the servers.
         The receiving server MUST use this delta to correct all of the
         absolute times received from the other server in all
         time-valued options. Note that the failover protocol is
         constructed so that two servers can be failover partners with
         arbitrarily great time mismatches.

      7. The receiving server MAY use the vendor-class-identifier to do
         vendor specific processing.

      8. After accepting a CONNECTACK message, the server MUST send a
         STATE message.

         After receiving a CONNECTACK message, the server MUST start
         two timers for the connection: tSend and tReceive. The tSend
         timer SHOULD be approximately 20 percent of the time in the
         receive-timer option in the corresponding CONNECTACK message.
         The tReceive timer SHOULD be set to the time sent in the
         receive-timer option in the CONNECT message.

         The tReceive timer is reset whenever a message is received
         from this TCP connection. If it ever expires, the TCP
         connection is dropped and communications with this partner is
         considered not ok.

         The tSend timer is reset whenever a message is sent over this
         connection. When it expires, a CONTACT message MUST be sent.

7.10. STATE message [10]

   The state (STATE) message is used to communicate the current failover
   state to the partner server.

   The STATE message MUST be sent after sending a CONNECTACK message
   that didn't contain a reject-reason option, and MUST be sent after
   receiving a CONNECTACK message without a reject-reason option.

   A STATE message MUST be sent whenever the failover endpoint changes
   its failover state and a connection exists to the partner.

   The STATE message requires no response from the failover partner.

   The following table shows the options that MUST appear in a STATE
   message:

      Option
      ------
      server-state                     MUST
      server-flags                     MUST
      start-time-of-state              MUST

          Table 7.10-1: Options used in a STATE message

7.10.1. Sending the STATE message

   The current failover state is placed in the server-state option and
   the current state of the STARTUP flag is placed in the server-flags
   option.

   The message is sent with a unique xid.

   A server SHOULD only send the STATE message either when the
   connection is created (i.e., after sending or receiving a CONNECTACK
   message with no reject-reason option), or when there is a change from
   the values sent in a previous STATE message.

7.10.2. Receiving the STATE message

   Every STATE message SHOULD indicate a change in state or a change in
   the flags.

   When a STATE message is received, any state transitions specified in
   section 9 are taken.

   No response to a STATE message is required.

7.11. CONTACT message [11]

   The contact (CONTACT) message is sent to verify communications
   integrity with a failover partner. The CONTACT message is sent when
   no messages have been sent to the failover partner for a specified
   period of time. This is determined by the tSend timer expiring (see
   section 8.3).

   The CONTACT message has no message specific options.

7.11.1. Sending the CONTACT message

   The CONTACT message is sent.

7.11.2. Receiving the CONTACT message

   When a CONTACT message is received, the tReceive timer is reset (as
   it is with any message that is received).

   A server SHOULD use the time in the time field and the time the
   message was received to refine the delta time calculations between
   the servers.

7.12. DISCONNECT message [12]

   The DISCONNECT is the last message sent over a connection before
   dropping an established connection (note that an established
   connection is one where a CONNECTACK has been sent without a reject
   reason).

   After sending or receiving a DISCONNECT message, a server needs to
   have some mechanism to prevent an error loop. Simply reconnecting to
   the partner immediately is not the best option, especially after
   several consecutive attempts.

   A simple suggested solution is to wait a minute or two after sending
   or receiving a DISCONNECT before attempting to reestablish
   communication.

   The DISCONNECT message MUST be the last message sent down a
   connection before it is closed.

   The following table summarizes the options that are associated with
   the DISCONNECT message:

      Option
      ------
      reject-reason                    MUST
      message                          SHOULD

          Table 7.12-1: Options used in a DISCONNECT message

7.12.1. Sending the DISCONNECT message

   The DISCONNECT message MUST be the last message sent by a server
   which is dropping a TCP connection.

   The xid of the DISCONNECT message must be unique.

   The reject-reason option MUST appear giving a reason why the
   connection was dropped. A message option SHOULD appear giving a
   human readable error message with possibly more details.

7.12.2. Receiving the DISCONNECT message

   When a server receives a DISCONNECT message it should log the message
   if there was one and possibly raise an alarm of some sort if the
   reject reason was one that was sufficiently serious.

8. Connection Management

   Servers participating in the failover protocol communicate over TCP
   connections. These TCP connections are used both to transmit
   binding information from one server to another as well as to allow
   each server to determine whether communications is possible with the
   other server.

   Central to the operation of the failover protocol is a notion of
   "communications okay" or "communications failed". Failover state
   transitions are taken in many cases when the status of communications
   with the partner changes, and the existence or non-existence of a TCP
   connection between failover endpoints is used to determine if
   communications is "okay" or "failed".

   A single TCP connection exists which connects two failover endpoints.

8.1. Connection granularity

   There exists one TCP connection between each set of failover
   endpoints. See section 5.1.1 for an explanation of failover
   endpoints.

   There are a maximum of two TCP connections between any two servers
   implementing the failover protocol, one for each of the possible
   failover endpoints between these two servers. There is a minimum of
   one TCP connection between one server and every other failover server
   with which it implements the failover protocol.

8.2. Creating the TCP connection

   There are two ports used for initiating TCP connections,
   corresponding to the two roles that a server can fill with respect to
   another server. Every server implementing the failover protocol MUST
   listen on at least one of these ports.
   Port 647 is the port to which primary servers will attempt a
   connection, and port TBD is the port to which secondary servers will
   attempt a connection. When a connection attempt is received on port
   647 it is therefore from a primary server, and it is attempting to
   connect to this server to become a secondary server for it.
   Likewise, when an attempt to connect is received on port TBD the
   connection attempt is from a secondary server, and it is attempting
   to connect to this server to be a primary server. The source port of
   any TCP connection is unimportant. See the schematic representation
   below:

      Primary Server
      --------------
      Listens on port TBD for secondary server to connect to it
      Periodically connects on port 647 to contact secondary

      Secondary Server
      ----------------
      Listens on port 647 for primary server to connect to it
      Periodically connects on port TBD to contact primary

   Every server implementing the failover protocol SHOULD attempt to
   connect to all of its partners periodically, where the period is
   implementation dependent and SHOULD be configurable. In the event
   that a connection has been rejected by a CONNECTACK message with a
   reject-reason option contained in it or a DISCONNECT message, a
   server SHOULD reduce the frequency with which it attempts to connect
   to that server but it SHOULD continue to attempt to connect
   periodically.

   If a connection attempt has been received from another server in a
   particular role (i.e., from a specific failover endpoint) then the
   receiving server MUST NOT initiate a connection attempt to the
   partner server in that same role.

   If both servers happen to attempt to connect simultaneously, the
   secondary server MUST drop its attempt in favor of the primary's
   attempt. Thus, in the event that a secondary server receives a
   connection attempt to port 647 from a primary server when it has
   already initiated a connection attempt to port TBD on the same
   primary server, it MUST accept the connection to port 647 and it MUST
   drop the connection attempt to port TBD. In the event that a primary
   server receives a connection attempt to port TBD from a secondary
   server when it has already initiated a connection attempt to port 647
   on that same server, it MUST reject the connection attempt to port
   TBD and continue to pursue the connection attempt on port 647.

   Once a connection is established, the primary server MUST send a
   CONNECT message across the connection. A secondary server MUST wait
   for the CONNECT message from a primary server.

   Every CONNECT message includes a TLS-request option, and if the
   CONNECTACK message does not reject the CONNECT message and the
   TLS-reply option says TLS MUST be used, then the servers will
   immediately enter into TLS negotiation.

   Once TLS negotiation is complete, the primary server MUST resend the
   CONNECT message on the newly secured TLS connection and then wait for
   the CONNECTACK message in response. The TLS-request and TLS-reply
   options MUST NOT appear in either this second CONNECT or its
   associated CONNECTACK message as they had in the first messages.

   The second message sent over a new connection (either a bare TCP
   connection or a connection utilizing TLS) is a STATE message. Upon
   the receipt of this message, the receiver can consider communications
   up.

   It is entirely possible that two servers will attempt to make
   connections to each other essentially simultaneously, and in this
   case the secondary server will be waiting for a CONNECT message on
   each connection. The primary server MUST send a CONNECT message over
   one connection and it MUST close the other connection.

   A secondary server MUST NOT respond to the closing of a TCP
   connection with a blind attempt to reconnect -- there may be another
   TCP connection to the same failover partner already in use.

8.3. Using the TCP connection for determining communications status

   The TCP connection is used to determine the communications status of
   the other server, i.e., communications-ok, or
   communications-interrupted.

   Three things must happen for a server to consider that communications
   are ok with respect to another server:

      1. A TCP connection must be established to the other server.

      2. A CONNECT message must be received and a CONNECTACK message
         sent in response. The CONNECT message is used to determine
         the identity of the failover endpoint of the other end of the
         TCP connection -- without it, the failover endpoint cannot be
         uniquely determined. Without knowledge of the failover
         endpoint, the entity with which communications is ok is
         undetermined.

      3. A STATE message must be received from the other server over
         the connection. This STATE message initializes important
         information necessary to the operation of the state machine
         that governs the behavior of this failover endpoint.

   There are two ways that a server can determine that communications
   has failed:

      1. The TCP connection can go down, yielding an error when
         attempting to send or receive a message. This will happen at
         least as often as the period of the tSend timer.

      2. The tReceive timer can expire.

   In either of these cases, communications is considered interrupted.

   Several difficulties arise when trying to use one TCP connection for
   both bulk data transfer as well as to sense the communications status
   of the other server. One aspect of the problem stems from the
   different requirements of both uses. The bulk data transfer is of
   course critically important to the protocol, but the speed with which
   it is processed is not terribly significant. It might well be
   minutes before a BNDUPD message is processed, and while not optimal,
   such an occasional delay doesn't compromise the correctness of the
   protocol. However, the speed with which one server detects the other
   server is up (or, more importantly, down) is more highly constrained.
   Generally one server should be able to detect that the other server
   is not communicating within a minute or less.

   These differing time constraints make it difficult to use the same
   TCP connection for data transfer as well as to sense communications
   integrity. See section 3.5 for additional details on TCP.

   The solution to this problem is to require that some message be
   received by each end of the connection within a limited time or that
   the connection will be considered down. If no messages have been
   sent recently, then a CONTACT message is sent.

   In the case where there is no data queued to be sent, this is not a
   problem, but in the case where there is data queued to be sent to the
   partner, then the CONTACT message will not actually be transmitted
   until the queued data is sent. Section 3.5 explains why waiting for
   TCP to determine that the connection is down is not acceptable, and
   leads to a requirement that the receiving server never block the
   sending server from sending CONTACT messages.

   In order to meet this requirement, each server tells the other server
   the number of outstanding BNDUPD messages that it will accept.
   The receiving server is required to always be able to accept that
   many BNDUPD messages off of the connection's input queue even if it
   cannot process them immediately, and to accept all other messages
   immediately.

   Thus, the sending server's TCP is never blocked from sending a
   message except for very short periods, less than a few seconds unless
   the network connection itself has problems. In this case, if the
   CONTACT messages don't make it to the partner then the partner will
   close the connection.

   DISCUSSION:

      When implementing this capability, one needs to be careful when
      sending any message on the TCP connection as TCP can easily block
      the server if the local TCP send buffers are full. This can't be
      prevented because if the receiver is not reachable (via the
      network), the sending TCP can't send and thus it will be unable to
      empty the local TCP send buffers. So, all send operations either
      need to assume they may block for some time or non-blocking sends
      must be used.

8.4. Using the TCP connection for binding data

   Binding data, in the form of BNDUPD messages and BNDACK messages to
   respond to them, are sent across the TCP connection.

   In order to support timely detection of any failure in the partner
   server, the TCP connection MUST NOT block for more than a very short
   time, on the order of a few seconds. Therefore, a server that is
   sending BNDUPD messages MUST send only a restricted number before
   receiving BNDACK messages about previous messages sent.

   The number of outstanding BNDUPD messages that each server will
   accept without causing TCP to block transmission of additional data
   (i.e., CONTACT messages) is sent by each server in the CONNECT and
   CONNECTACK messages in the max-unacked-bndupd option.

8.5. Using the TCP connection for control messages

   The TCP connection is used for control messages: POOLREQ, UPDREQ,
   STATE, CONTACT, UPDREQALL and the corresponding reply messages:
   POOLRESP, UPDDONE. A server MUST immediately accept all of these
   messages from the TCP connection. A server MUST immediately accept
   any BNDACK which is received as well.

8.6. Losing the TCP connection

   When the TCP connection is lost, then communications is not ok with
   the other server. A server which has lost communications SHOULD
   immediately attempt to reconnect to the other server, and should
   retry these connection attempts periodically.

   An acknowledgement message (BNDACK, POOLRESP, UPDDONE) can only be
   sent in response to a request message (BNDUPD, POOLREQ, UPDREQ,
   UPDREQALL) on the same TCP connection from which the request was
   received, in part since the XIDs in the request messages are
   guaranteed unique only during the life of a single TCP connection.

   When a connection to a partner server goes down, a server with
   unprocessed request messages MAY simply drop all of those messages,
   since it can be sure that the partner will resend them when they are
   next in communications. A server with unprocessed BNDUPD messages
   when a TCP connection goes down MAY instead choose to process those
   BNDUPD messages, but it MUST NOT send any BNDACK messages in response
   (again because of the issues surrounding XID uniqueness).

   When the TCP connection is closed explicitly, the DISCONNECT message
   with a reject-reason option (and, ideally, a message option) MUST be
   sent over the TCP connection.

9. Failover Endpoint States

   This section discusses the various states that a failover endpoint
   may take, and the server actions required when entering the state,
   operating in the state, and leaving the state, as well as the events
   that cause transitions out of the state into another state.

   The state transition diagram in Figure 9.2-1 is relevant for this
   section. This is the common state transition diagram for both servers
   in a failover pair. In the event that the textual description of a
   state differs from the state transition diagram, the textual
   description is to be considered authoritative.

9.1. Server Initialization

   When a server starts it starts out in STARTUP state. See section 9.3
   below for details.

9.2. Server State Transitions

   Whenever a server transitions into a new state, it MUST record the
   state and the time at which it entered that state in stable storage.
   If communications is "ok", it MUST also send a STATE message to its
   failover partner.

   Figure 9.2-1 is the diagram of the server state transitions. The
   remainder of this section contains information important to the
   understanding of that diagram.

   The server stays in the current state until all of the actions
   specified on the state transition are complete. If communications
   fails during one of the actions, the server simply stays in the
   current state and attempts a transition whenever the conditions for a
   transition are later fulfilled.

   In the state transition diagram below, the "+" or "-" in the upper
   right corner of each state is a notation about whether communication
   is ongoing with the other server.

   The legend "responsive", "balanced", or "unresponsive" in each state
   indicates whether the server is responsive to all DHCP client
   requests, running in load balanced mode, or totally unresponsive in
   the respective state.
The terms "responsive" and "unresponsive" have 3658 the obvious meanings, while "balanced" means that a DHCP server may 3659 respond to all DHCPREQUEST messages that are RENEWAL or REBINDING, 3660 and to all other messages from clients for which the load balancing 3661 algorithm indicates that it MUST respond to. See sections 5.3 and 3662 9.6.2 for details on load balancing. 3664 In the state transition diagram below, when communication is reesta- 3665 blished between the two servers, each must record the state of the 3666 partner when communication was restored. State transitions on one 3667 server in some cases imply state transitions on the partner server, 3668 so a record of the current state of the partner server must be kept 3669 by each server. 3671 If the state of the partner changes while communicating a server 3672 moves through the communications-failed transition and into whatever 3673 state results. It then immediately moves through whatever state 3674 transition is appropriate given the current state of the partner 3675 server. A server performing this operation SHOULD NOT close the TCP 3676 connection to its partner. 3678 DISCUSSION: 3680 The point of this technique is simplicity, both in explanation of 3681 the protocol and in its implementation. The alternative to this 3682 technique of memory of partner state and automatic state transi- 3683 tion on change of partner state is to have every state in the fol- 3684 lowing diagram have a state transition for every possible state of 3685 the partner. With the approach adopted, only the states in which 3686 communications are reestablished require a state transition for 3687 each possible partner state. 3689 The current state of a server MUST be recorded in stable storage and 3690 thus be available to the server after a server restart. 3692 +---------------+ V +--------------+ 3693 | RECOVER - | | | STARTUP - | 3694 |(unresponsive) | +->|(unresponsive)| 3695 +---------------+ +--------------+ 3696 Comm. 
OK +-----------------+ 3697 Other State:-RECOVER | PARTNER DOWN - |<-----------------+ 3698 | | | (responsive) | | 3699 All POTENTIAL- +-----------------+ +--------------+ | 3700 Others CONFLICT------------ | --------+ | RESOLUTION -| | 3701 | Comm. OK | | INTERRUPTED | | 3702 UPDREQ(ALL) Other State: | +-| (responsive) | | 3703 Wait UPDDONE | | | | +--------------+ | 3704 Wait MCLT from fail RECOVER All Others| Comm. OK ^ | | 3705 +--------------+ | V V V | Ext. | 3706 |RECOVER-DONE +| +--+ +--------------+ Comm. Cmd. | 3707 |(unresponsive)| | | POTENTIAL + | Failed | | 3708 +--------------+ Wait for +>| CONFLICT |------+ +-->| 3709 Comm. OK Other | |(unresponsive)|<--------+ | 3710 +--Other State:-+ State: | +--------------+ | | 3711 | | | RECOVER | | | | 3712 | All POTENT. DONE | Resolve Conflict | | 3713 | Others: CONFLICT-- | ----+ (see 9.8) | | 3714 | Wait for V V | | 3715 | Other State: NORMAL +-----------------+ | | 3716 | V | NORMAL + | External | | 3717 | +--+----------+-->| (balanced) |-Command---+-- | -----+ 3718 | ^ ^ +-----------------+ | | 3719 | | | | | | 3720 | Wait for Comm. OK Comm. External | 3721 | Other Other Failed Command | 3722 | State: State: | or | | 3723 |RECOVER-DONE NORMAL Start Safe Safe | | 3724 | | COMM. INT. Period Timer Period | | 3725 | Comm. OK. | V expiration | 3726 | Other State: | +------------------+ | | 3727 | RECOVER +--| COMMUNICATIONS - |-----------+ | 3728 V +-------------| INTERRUPTED | Comm. OK | 3729 RECOVER | (responsive) |--Other State:-+ 3730 RECOVER-DONE--------->+------------------+ All Others 3732 Figure 9.2-1: Server state diagram. 3734 9.3. STARTUP state 3736 The STARTUP state affords an opportunity for a server to probe its 3737 partner server, before starting to service DHCP clients. 3739 DISCUSSION: 3741 Without the STARTUP state, a server would likely start in a state 3742 derived from its previously stored state (held in stable storage), 3743 if any. 
      However, this may be inconsistent with the current state of the
      partner. The STARTUP state affords the opportunity for a server
      to potentially learn the partner's state and determine if that
      state is consistent with its derived starting state or whether
      some significant state change has occurred at the partner that
      forces the server to start in another state. This is especially
      critical if significant time has elapsed while the server was
      down.

9.3.1. Operation while in STARTUP state

   Whenever a server is in STARTUP state, it MUST be unresponsive to
   DHCP client requests, and so the time spent in the STARTUP state is
   necessarily short, typically on the order of a few seconds to a few
   tens of seconds. The exact time spent in the STARTUP state is
   implementation dependent, and the primary and secondary server are
   not required to spend the same amount of time in the STARTUP state.

   Whenever a STATE message is sent to the partner while in STARTUP
   state the STARTUP bit MUST be set in the server-flags option and the
   previously recorded failover state MUST be placed in the server-state
   option.

9.3.2. Transition out of STARTUP state

   Each server starts out in STARTUP state every time it initializes
   itself, and performs the following algorithm as part of its
   initialization:

      1. Is there any record in stable storage of a previous failover
         state? If yes, set previous-state to the last recorded state
         in stable storage, and continue with step 2.

         Is there any configuration information that indicates that
         this server was previously running but lost its stable
         storage? Such information must typically come from some
         administrative intervention, since it is difficult for a
         server to distinguish first startup from a startup after it
         has lost its stable storage. If yes, then set the
         previous-state to RECOVER, and set the time-of-failure to
         whatever time was configured, and go on to step 2. This
         time-of-failure will be used in the transition out of the
         RECOVER state into the RECOVER-DONE state, below.

         If there is no record of any previous failover state in stable
         storage nor of any previous operational activity for this
         server, then set the previous-state to PARTNER-DOWN if this
         server is a primary and RECOVER if this server is a secondary,
         and set the time-of-failure to a time before the
         maximum-client-lead-time before now. If using standard Posix
         times, 0 would typically do quite well.

      2. Is the previous-state NORMAL? If yes, set the previous-state
         to COMMUNICATIONS-INTERRUPTED.

      3. Start the STARTUP state timer. The time that a server remains
         in the STARTUP state (absent any communications with its
         partner) is implementation dependent and SHOULD be
         configurable. It SHOULD be long enough for a TCP connection
         to be created to a heavily loaded partner across a slow
         network.

      4. Attempt to create a TCP connection to the failover partner.
         See section 8.2.

      5. Wait for "communications okay", i.e., the process discussed in
         section 8.2 "Creating the TCP Connection", to complete,
         including the receipt of a STATE message from the partner.

         When and if communications become "okay", clear the STARTUP
         flag, and set the current state to the previous-state.

         If the partner is in PARTNER-DOWN state, and if the time at
         which it entered PARTNER-DOWN state (as received in the
         start-time-of-state option in the STATE message) is later than
         the last recorded time of operation of this server, then set
         the current state to RECOVER. If the time at which it entered
         PARTNER-DOWN state is earlier than the last recorded time of
         operation of this server, then set the current state to
         POTENTIAL-CONFLICT.

         Then, transition to the current state and take the
         "communications okay" state transition based on the current
         state of this server and the partner.

      6. If the startup time expires, take an implementation dependent
         action: The server MAY go to the previous-state, or the
         server MAY wait.

         Reasons to go to previous-state and begin processing:

         If the current server is the only operational server, then if
         it waits, there will be no operational DHCP servers. This
         situation could occur very easily where one server fails and
         then the other crashes and reboots. If the rebooting server
         doesn't start processing DHCP client requests without first
         being in communication with the other server, then the level
         of DHCP redundancy is not particularly high. This is an
         appropriate approach if the possibility of partition is low,
         or if the safe period expiration time is well beyond the time
         at which an operator would notice and react to a partition
         situation. It is also quite appropriate if the safe period
         will never expire.

         Reasons to wait:

         If the current server has been down for longer than the
         maximum-client-lead-time, and it is partitioned from the other
         server, then when it returns it will attempt to use its own
         available addresses to allocate to new DHCP clients, and the
         other server may well be in PARTNER-DOWN state and may have
         already allocated some of those available addresses to DHCP
         clients. In cases where the possibility of partition is high,
         and the safe period expiration time is less than the likely
         operator reaction time, this is a good approach to use.

9.4. PARTNER-DOWN state

   PARTNER-DOWN state is a state either server can enter.
   When in this state, the server does not assume that the other
   server could still be operating and servicing a different set of
   clients, but instead assumes that it is the only server operating.
   If one server is in PARTNER-DOWN state, the other server MUST NOT
   be operating.

9.4.1.  Upon entry to PARTNER-DOWN state

   No special actions are required when entering PARTNER-DOWN state.

   The server should continue to attempt to connect to the partner
   periodically.

9.4.2.  Operation while in PARTNER-DOWN state

   A server in PARTNER-DOWN state MUST respond to DHCP client
   requests.  It will allow renewal of all outstanding leases on IP
   addresses, and will allocate IP addresses from its own pool, and
   after a fixed period of time (the MCLT interval) has elapsed from
   entry into PARTNER-DOWN state, it will allocate IP addresses from
   the set of all available IP addresses.

   Once a server has entered NORMAL state, the PARTNER-DOWN state is
   entered only on command of an external agency (typically an
   administrator of some sort) or after the expiration of an
   externally configured minimum safe-time after the beginning of
   COMMUNICATIONS-INTERRUPTED state.

   Any available IP address tagged as available for allocation by the
   other server (at entry to PARTNER-DOWN state) MUST NOT be
   allocated to a new client until the maximum-client-lead-time
   beyond the entry into PARTNER-DOWN state has elapsed.
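
   The hold-off on the partner's available addresses can be read as a
   single time comparison.  The following is a minimal sketch of that
   reading, not part of the protocol: the function name, the argument
   names, and the use of seconds since the POSIX epoch are all
   assumptions made for illustration.

```python
def may_allocate_partner_address(now, entered_partner_down, mclt):
    """Return True once an address that was available to the partner at
    entry to PARTNER-DOWN state may be allocated to a new client.

    now and entered_partner_down are timestamps in seconds; mclt is the
    maximum-client-lead-time in seconds.  All names are hypothetical.
    """
    # The address stays off limits until MCLT beyond the time this
    # server entered PARTNER-DOWN state has elapsed.
    return now >= entered_partner_down + mclt
```

   With an MCLT of 600 seconds and entry into PARTNER-DOWN at time
   400, the sketch starts returning True at time 1000.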

   A server in PARTNER-DOWN state MUST NOT allocate an IP address to
   a DHCP client different from that to which it was allocated at the
   entrance to PARTNER-DOWN state until the maximum-client-lead-time
   beyond the maximum of the following times: client expiration time,
   most recently transmitted potential-expiration-time, most recently
   received ack of potential-expiration-time from the partner, and
   most recently acked potential-expiration-time to the partner.  See
   section 7.1.5 for details.  If this time would be earlier than the
   current time plus the maximum-client-lead-time, then the time the
   server entered PARTNER-DOWN state plus the
   maximum-client-lead-time is used.

   Two options exist for lease times given out while in PARTNER-DOWN
   state, with different ramifications flowing from each.

   If the server wishes the Failover protocol to protect it from loss
   of stable storage in PARTNER-DOWN state, then it should ensure
   that the MCLT based lease time restrictions in Section 5.1 are
   maintained, even in PARTNER-DOWN state.

   If the server wishes to forgo the protection of the Failover
   protocol in the event of loss of stable storage, then it need
   recognize no restrictions on actual client lease times while in
   PARTNER-DOWN state.

   A server in PARTNER-DOWN state MUST continue to attempt to
   establish communications and synchronization with its partner.

9.4.3.  Transitions out of PARTNER-DOWN state

   When a server in PARTNER-DOWN state succeeds in establishing a
   connection to its partner, its actions are conditional on the
   state and flags received in the STATE message from the other
   server as part of the process of establishing the connection.

   If the STARTUP bit is set in the server-flags option of a received
   STATE message, a server in PARTNER-DOWN state MUST NOT take any
   state transitions based on reestablishing communications.
   Essentially, if a server is in PARTNER-DOWN state, it ignores all
   STATE messages from its partner that have the STARTUP bit set in
   the server-flags option of the STATE message.

   If the STARTUP bit is not set in the server-flags option of a
   STATE message received from its partner, then a server in
   PARTNER-DOWN state takes the following actions based on the value
   of the server-state option in the received STATE message:

   o  partner in NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN or
      POTENTIAL-CONFLICT state

         transition to POTENTIAL-CONFLICT state

   o  partner in RECOVER state

         stay in PARTNER-DOWN state

   o  partner in RECOVER-DONE state

         transition into NORMAL state

9.5.  RECOVER state

   This state indicates that the server has no information in its
   stable storage or that it is re-integrating with a server in
   PARTNER-DOWN state after it has been down.  A server in this state
   MUST attempt to refresh its stable storage from the other server.

9.5.1.  Operation in RECOVER state

   A server in RECOVER MUST NOT respond to DHCP client requests.

   A server in RECOVER state will attempt to reestablish
   communications with the other server.

9.5.2.  Transitions out of RECOVER state

   If the other server is in POTENTIAL-CONFLICT state when
   communications are reestablished, then the server in RECOVER state
   will move to POTENTIAL-CONFLICT state itself.

   If the other server is in RECOVER state, then this server SHOULD
   signal an error and halt processing.

   If the other server is in any other state, then the server in
   RECOVER state will request an update of missing binding
   information by sending an UPDREQ message.
   If the server has been instructed (through configuration or other
   external agency) that it has lost its stable storage, or if it has
   deduced that from the fact that it has no record of ever having
   talked to its partner, while its partner does have a record of
   communicating with it, it MUST send an UPDREQALL message;
   otherwise it MUST send an UPDREQ message.

   It will wait for an UPDDONE message, and upon receipt of that
   message it will start a timer whose expiration is set to a time
   equal to the time the server went down (if known) or the current
   time (if the down-time is unknown) plus the
   maximum-client-lead-time.  When this timer goes off, the server
   will transition into RECOVER-DONE state.  This is to allow any IP
   addresses that were allocated by this server prior to loss of its
   client binding information in stable storage to contact the other
   server or to time out.

   See Figure 9.5.2-1.

   DISCUSSION:

      The actual requirement on this wait period in RECOVER is that
      it start not before the recovering server went down, not
      necessarily when it came back up.  If the time when the
      recovering server failed is known, it could be communicated to
      the recovering server (perhaps through actions of the network
      administrator), and the wait period could be reduced to the
      maximum-client-lead-time less the difference between the
      current time and the time the server failed.  In this way, the
      waiting period could be minimized.  Various heuristics could be
      used to estimate this time; for example, if the recovering
      server periodically updates stable storage with a time stamp,
      the wait period could be calculated to start at the time of the
      last update of stable storage plus the time required for the
      next update (which never occurred).  This estimate is later
      than the time the server went down, but probably not too much
      later.
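
   The timer arithmetic described above is small enough to write
   down.  The following sketch assumes times expressed in seconds
   since the POSIX epoch; the function and argument names are
   hypothetical and are not defined by this protocol.

```python
def recover_done_time(updd_received_at, mclt, went_down_at=None):
    """Time at which a server in RECOVER may move to RECOVER-DONE.

    updd_received_at: current time when the UPDDONE message arrives.
    mclt: maximum-client-lead-time, in seconds.
    went_down_at: time the server went down, or None if unknown.
    """
    # Start from the down-time when it is known, else from "now".
    base = went_down_at if went_down_at is not None else updd_received_at
    return base + mclt


def remaining_wait(now, mclt, went_down_at=None):
    """Seconds still to wait, never negative.

    When the down-time is known this equals MCLT less the difference
    between the current time and the time the server failed.
    """
    return max(0, recover_done_time(now, mclt, went_down_at) - now)
```

   For example, with an MCLT of 3600 seconds, a server that went down
   at time 400 and receives UPDDONE at time 1000 would wait a further
   3000 seconds rather than the full 3600.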

   If an UPDDONE message isn't received within an implementation
   dependent amount of time, and no BNDUPD messages are being
   received, the connection SHOULD be dropped.

               A                                B
            Server                           Server

               |                                |
            RECOVER                       PARTNER-DOWN
               |                                |
               | >--UPDREQ--------------------> |
               |                                |
               | <--------------------BNDUPD--< |
               | >--BNDACK--------------------> |
              ...                              ...
               |                                |
               | <--------------------BNDUPD--< |
               | >--BNDACK--------------------> |
               |                                |
               | <-------------------UPDDONE--< |
               |                                |
   Wait MCLT from last known                    |
       time of operation                        |
               |                                |
         RECOVER-DONE                           |
               |                                |
               | >--STATE-(RECOVER-DONE)------> |
               |                             NORMAL
               | <------------(NORMAL)-STATE--< |
            NORMAL                              |
               | >--STATE-(NORMAL)------------> |
               |                                |
               |                                |

        Figure 9.5.2-1: Transition out of RECOVER state

9.6.  NORMAL state

   NORMAL state is the state used by a server when it is
   communicating with the other server, and any required
   resynchronization has been performed.  While some bindings
   database synchronization is performed in NORMAL state, potential
   conflicts are resolved prior to entry into NORMAL state as is
   binding database data loss.

9.6.1.  Upon entry to NORMAL state

   When entering NORMAL state, a server will send to the other server
   all currently unacknowledged binding updates as BNDUPD messages.

   When the above process is complete, if the server entering NORMAL
   state is a secondary server, then it will request IP addresses for
   allocation using the POOLREQ message.

9.6.2.  Processing DHCP client requests and load balancing

   In NORMAL state, a server MUST process every DHCPREQUEST/RENEWAL
   or DHCPREQUEST/REBINDING request it receives.  And, it processes
   other requests only for those clients as dictated by the load
   balancing algorithm specified in [LOADB].
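
   The request filter in the paragraph above can be sketched as
   follows.  This is an illustration only: the string labels and the
   function name are assumptions, and the hash of [LOADB] is not
   implemented here; its outcome is passed in as a boolean.

```python
# Request kinds that a server in NORMAL state always processes.
ALWAYS_PROCESSED = {"DHCPREQUEST/RENEWAL", "DHCPREQUEST/REBINDING"}


def should_process(request_kind, assigned_to_this_server):
    """Decide whether a server in NORMAL state handles a request.

    request_kind: a label for the kind of client request (hypothetical
        string labels, chosen for this sketch).
    assigned_to_this_server: result of applying the load balancing
        algorithm of [LOADB] to the request, computed elsewhere.
    """
    if request_kind in ALWAYS_PROCESSED:
        return True
    # Everything else is handled only by the server that the load
    # balancing algorithm selects for this client.
    return bool(assigned_to_this_server)
```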

   As discussed in section 5.3, each server will take the
   client-identifier from each DHCP client request (or the
   client-hardware-address, i.e., the htype concatenated to the front
   of the chaddr if no client-identifier is present in the request)
   and use it as the 'Request ID' specified in [LOADB].  After
   applying the algorithm specified in [LOADB] and comparing the
   result with the hash bucket assignment (performed during connect
   processing between failover servers), each failover server will be
   able to unambiguously determine if it should process the DHCP
   client request.

9.6.3.  Operation in NORMAL state

   When in NORMAL state, for every DHCP client request that it
   processes, as determined by the algorithm described in section
   9.6.2, above, a server will operate in the following manner:

   o  Lease time calculations

      As discussed in section 5.2.1, "Control of lease time", the
      lease interval given to a DHCP client can never be more than
      the MCLT greater than the most recently received
      potential-expiration-time from the failover partner or the
      current time, whichever is later.

      As long as a server adheres to this constraint, the specifics
      of the lease interval that it gives to a DHCP client or the
      value of the potential-expiration-time sent to its failover
      partner are implementation dependent.  One possible approach is
      discussed in section 5.2.1, but that particular approach is in
      no way required by this protocol.

      See section 7.1.5 for details concerning the storage of times
      associated with IP addresses and how to use these times when
      calculating lease times for DHCP clients.

   o  Lazy update of partner server

      After an ACK of an IP address binding, the server servicing a
      DHCP client request attempts to update its partner with the new
      binding information.
      The lease time used in the update of the secondary MUST be at
      least that given to the DHCP client in the DHCPACK, and the
      potential-expiration-time MUST be at least the lease time, and
      SHOULD be considerably longer.

   o  Reallocation of IP addresses between clients

      Whenever a client binding is released or expires, a BNDUPD
      message must be sent to the partner, setting the binding state
      to RELEASED or EXPIRED.  However, until a BNDACK is received
      for this message, the IP address cannot be allocated to another
      client.  It can be allocated to the same client again.

   In NORMAL state, each server receives binding updates from its
   partner server in BNDUPD messages.  It records these in its client
   binding database in stable storage and then sends a corresponding
   BNDACK message to its partner.  It MUST ensure that the
   information is recorded in stable storage prior to sending the
   BNDACK message back to its partner.

9.6.4.  Transitions out of NORMAL state

   If an external command is received by a server in NORMAL state
   informing it that its partner is down, then transition into
   PARTNER-DOWN state.  Generally, this would be an unusual
   situation, where some external agency knew the partner server was
   down.  Using the command in this case would be appropriate if the
   polling interval and timeout were long.

   If a server in NORMAL state fails to receive acks to messages sent
   to its partner for an implementation dependent period of time, it
   MAY move into COMMUNICATIONS-INTERRUPTED state.  This situation
   might occur if the partner server was capable of maintaining the
   TCP connection between the servers and also capable of sending a
   CONTACT message every tSend seconds, but was (for some reason)
   incapable of processing BNDUPD messages.

   If communications is determined to not be "okay" (as defined in
   section 8), then transition into COMMUNICATIONS-INTERRUPTED state.

   If a server in NORMAL state receives any messages from its partner
   where the partner has changed state from that expected by the
   server in NORMAL state, then the server should transition into
   COMMUNICATIONS-INTERRUPTED state and take the appropriate state
   transition from there.  For example, it would be expected for the
   partner to transition from POTENTIAL-CONFLICT into NORMAL state,
   but not for the partner to transition from NORMAL into
   POTENTIAL-CONFLICT state.

9.7.  COMMUNICATIONS-INTERRUPTED State

   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
   unable to communicate with the other server.  Primary and
   secondary servers cycle automatically (without administrative
   intervention) between NORMAL and COMMUNICATIONS-INTERRUPTED state
   as the network connection between them fails and recovers, or as
   the partner server cycles between operational and non-operational.
   No duplicate IP address allocation can occur while the servers
   cycle between these states.

9.7.1.  Upon entry to COMMUNICATIONS-INTERRUPTED state

   When a server enters COMMUNICATIONS-INTERRUPTED state, if it has
   been configured to support an automatic transition out of
   COMMUNICATIONS-INTERRUPTED state and into PARTNER-DOWN state
   (i.e., a "safe period" has been configured, see section 10), then
   a timer MUST be started for the length of the configured safe
   period.

   A server transitioning into the COMMUNICATIONS-INTERRUPTED state
   from the NORMAL state SHOULD raise some alarm condition to alert
   administrative staff to a potential problem in the DHCP subsystem.

9.7.2.  Operation in COMMUNICATIONS-INTERRUPTED State

   In this state a server MUST respond to all DHCP client requests,
   and the algorithm for load balancing described in section 5.3 MUST
   NOT be used.  When allocating new IP addresses, each server
   allocates from its own IP address pool, where the primary MUST
   allocate only FREE IP addresses, and the secondary MUST allocate
   only BACKUP IP addresses.  When responding to renewal requests,
   each server will allow continued renewal of a DHCP client's
   current lease on an IP address irrespective of whether that lease
   was given out by the receiving server or not, although the renewal
   period MUST NOT exceed the maximum client lead time (MCLT) beyond
   the latest of: 1) the potential-expiration-time already
   acknowledged by the other server, or 2) the lease-expiration-time,
   or 3) the potential-expiration-time received from the partner
   server.

   However, since the server cannot communicate with its partner in
   this state, the acknowledged-potential-expiration time will not be
   updated in any new bindings.  This is likely to eventually cause
   the actual-client-lease-times to be the current time plus the
   maximum-client-lead-time (unless this is greater than the
   desired-client-lease-time).

9.7.3.  Transition out of COMMUNICATIONS-INTERRUPTED State

   If the safe period timer expires while a server is in the
   COMMUNICATIONS-INTERRUPTED state, it will transition immediately
   into PARTNER-DOWN state.

   If an external command is received by a server in
   COMMUNICATIONS-INTERRUPTED state informing it that its partner is
   down, it will transition immediately into PARTNER-DOWN state.
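
   The two transitions above, together with the optional safe period
   timer started on entry (section 9.7.1), can be sketched as a
   single check.  This is an illustrative reading, not normative
   text: the function name, the argument names, and the use of
   seconds are assumptions.

```python
PARTNER_DOWN = "PARTNER-DOWN"
COMMUNICATIONS_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"


def state_after_check(now, entered_at, safe_period=None,
                      partner_down_command=False):
    """State of a server that is in COMMUNICATIONS-INTERRUPTED and has
    not regained contact with its partner.

    entered_at: time COMMUNICATIONS-INTERRUPTED state was entered.
    safe_period: configured safe period in seconds, or None when no
        automatic transition has been configured.
    partner_down_command: True if an external command has informed
        this server that its partner is down.
    """
    if partner_down_command:
        return PARTNER_DOWN
    # The timer exists only if a safe period has been configured.
    if safe_period is not None and now >= entered_at + safe_period:
        return PARTNER_DOWN
    return COMMUNICATIONS_INTERRUPTED
```

   Leaving safe_period as None models an installation that relies on
   operator action alone to enter PARTNER-DOWN state.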

   If communications is restored with the other server, then the
   server in COMMUNICATIONS-INTERRUPTED state will transition into
   another state based on the state of the partner:

   o  partner in NORMAL or COMMUNICATIONS-INTERRUPTED

         The partner SHOULD NOT be in NORMAL state here, since upon
         restoration of communications it MUST have created a new TCP
         connection which would have forced it into
         COMMUNICATIONS-INTERRUPTED state.  Still, we should account
         for every state just in case.

         Transition into the NORMAL state.

   o  partner in RECOVER

         Stay in COMMUNICATIONS-INTERRUPTED state.

   o  partner in RECOVER-DONE

         Transition into NORMAL state.

   o  partner in PARTNER-DOWN or POTENTIAL-CONFLICT

         Transition into POTENTIAL-CONFLICT state.

   o  partner in PAUSED

         Stay in COMMUNICATIONS-INTERRUPTED state.

   o  partner in SHUTDOWN

         Transition into PARTNER-DOWN state.

   The following figure illustrates the transition from NORMAL to
   COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state
   again.

            Primary                         Secondary
            Server                           Server

            NORMAL                           NORMAL
               | >--CONTACT-------------------> |
               | <-------------------CONTACT--< |
               |    [TCP connection broken]     |
        COMMUNICATIONS         :         COMMUNICATIONS
          INTERRUPTED          :           INTERRUPTED
               |  [attempt new TCP connection]  |
               |     [connection succeeds]      |
               |                                |
               | >--CONNECT-------------------> |
               | <----------------CONNECTACK--< |
               | <------------------STATE-----< |
               |                             NORMAL
               | >--STATE---------------------> |
            NORMAL                              |
               | >--BNDUPD--------------------> |
               | <--------------------BNDACK--< |
               |                                |
               | <--------------------BNDUPD--< |
               | >--BNDACK--------------------> |
              ...                              ...
               |                                |
               | <-------------------POOLREQ--< |
               | >--POOLRESP-(2)--------------> |
               |                                |
               | >--BNDUPD-(#1)---------------> |
               | <--------------------BNDACK--< |
               |                                |
               | <-------------------POOLREQ--< |
               | >--POOLRESP-(0)--------------> |
               |                                |
               | >--BNDUPD-(#2)---------------> |
               | <--------------------BNDACK--< |
               |                                |

      Figure 9.7.3-1: Transition from NORMAL to COMMUNICATIONS-
                      INTERRUPTED and back (example with 2
                      addresses allocated to secondary)

9.8.  POTENTIAL-CONFLICT state

   This state indicates that the two servers are attempting to
   re-integrate with each other, but at least one of them was running
   in a state that did not guarantee automatic reintegration would be
   possible.  In POTENTIAL-CONFLICT state the servers may determine
   that the same IP address has been offered and accepted by two
   different DHCP clients.

   It is a goal of this protocol to minimize the possibility that
   POTENTIAL-CONFLICT state is ever entered.

9.8.1.  Upon entry to POTENTIAL-CONFLICT state

   When a primary server enters POTENTIAL-CONFLICT state it should
   request that the secondary send it all updates of which it is
   currently unaware by sending an UPDREQ message to the secondary
   server.

   A secondary server entering POTENTIAL-CONFLICT state will wait for
   the primary to send it an UPDREQ message.

9.8.2.  Operation in POTENTIAL-CONFLICT state

   Any server in POTENTIAL-CONFLICT state MUST NOT process any
   incoming DHCP requests.

9.8.3.  Transitions out of POTENTIAL-CONFLICT state

   If communications fails with the partner while in
   POTENTIAL-CONFLICT state, then the server will transition to
   RESOLUTION-INTERRUPTED state.

   Whenever either server receives an UPDDONE message from its
   partner while in POTENTIAL-CONFLICT state, it MUST transition to
   NORMAL state.
   This will cause the primary server to leave POTENTIAL-CONFLICT
   state prior to the secondary, since the primary sends an UPDREQ
   message and receives an UPDDONE before the secondary sends an
   UPDREQ message and receives its UPDDONE message.

   When a secondary server receives an indication that the primary
   server has transitioned from POTENTIAL-CONFLICT to NORMAL state,
   it SHOULD send an UPDREQ message to the primary server.

            Primary                         Secondary
            Server                           Server

               |                                |
      POTENTIAL-CONFLICT               POTENTIAL-CONFLICT
               |                                |
               | >--UPDREQ--------------------> |
               |                                |
               | <--------------------BNDUPD--< |
               | >--BNDACK--------------------> |
              ...                              ...
               |                                |
               | <--------------------BNDUPD--< |
               | >--BNDACK--------------------> |
               |                                |
               | <-------------------UPDDONE--< |
            NORMAL                              |
               | >--STATE--(NORMAL)-----------> |
               | <--------------------UPDREQ--< |
               |                                |
               | >--BNDUPD--------------------> |
               | <--------------------BNDACK--< |
              ...                              ...
               | >--BNDUPD--------------------> |
               | <--------------------BNDACK--< |
               |                                |
               | >--UPDDONE-------------------> |
               |                             NORMAL
               |                                |
               | <-------------------POOLREQ--< |
               | >--POOLRESP-(n)--------------> |
               |    addresses                   |

        Figure 9.8.3-1: Transition out of POTENTIAL-CONFLICT

9.9.  RESOLUTION-INTERRUPTED state

   This state indicates that the two servers were attempting to
   re-integrate with each other in POTENTIAL-CONFLICT state, but
   communications failed prior to completion of re-integration.

   If the servers remained in POTENTIAL-CONFLICT while communications
   was interrupted, neither server would be responsive to DHCP client
   requests, and if one server had crashed, then there might be no
   server able to process DHCP requests.

9.9.1.  Upon entry to RESOLUTION-INTERRUPTED state

   When a server enters RESOLUTION-INTERRUPTED state it SHOULD raise
   an alarm condition to alert administrative staff of a problem in
   the DHCP subsystem.

9.9.2.  Operation in RESOLUTION-INTERRUPTED state

   In this state a server MUST respond to all DHCP client requests,
   and any load balancing (described in section 5.3) MUST NOT be
   used.  When allocating new IP addresses, each server SHOULD
   allocate from its own IP address pool (if that can be determined),
   where the primary SHOULD allocate only FREE IP addresses, and the
   secondary SHOULD allocate only BACKUP IP addresses.  When
   responding to renewal requests, each server will allow continued
   renewal of a DHCP client's current lease on an IP address
   irrespective of whether that lease was given out by the receiving
   server or not, although the renewal period MUST NOT exceed the
   maximum client lead time (MCLT) beyond the latest of: 1) the
   potential-expiration-time already acknowledged by the other
   server, or 2) the lease-expiration-time, or 3) the
   potential-expiration-time received from the partner server.

   However, since the server cannot communicate with its partner in
   this state, the acknowledged-potential-expiration time will not be
   updated in any new bindings.

9.9.3.  Transitions out of RESOLUTION-INTERRUPTED state

   If an external command is received by a server in
   RESOLUTION-INTERRUPTED state informing it that its partner is
   down, it will transition immediately into PARTNER-DOWN state.

   If communications is restored with the other server, then the
   server in RESOLUTION-INTERRUPTED state will transition into
   POTENTIAL-CONFLICT state.

9.10.  RECOVER-DONE state

   This state exists to allow an interlocked transition for one
   server from RECOVER state and another server from PARTNER-DOWN or
   COMMUNICATIONS-INTERRUPTED state into NORMAL state.

9.10.1.  Operation in RECOVER-DONE state

   A server in RECOVER-DONE state MUST respond only to
   DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING DHCP messages.

9.10.2.  Transitions out of RECOVER-DONE state

   When a server in RECOVER-DONE state determines that its partner
   server has entered NORMAL state, then it will transition into
   NORMAL state as well.

   If communications fails while in RECOVER-DONE state, a server will
   stay in RECOVER-DONE state.

9.11.  PAUSED state

   This state exists to allow one server to inform another that it
   will be out of service for what is predicted to be a relatively
   short time, and to allow the other server to transition to
   COMMUNICATIONS-INTERRUPTED state immediately and to begin
   servicing all DHCP clients with no interruption in service to new
   DHCP clients.

   A server which is aware that it is shutting down temporarily
   SHOULD send a STATE message with the server-state option
   containing PAUSED state and close the TCP connection.

   While a server may or may not transition internally into PAUSED
   state, the 'previous' state determined when it is restarted MUST
   be the state the server was in prior to receiving the command to
   shut down and restart and which precedes its entry into the PAUSED
   state.  See section 9.3.2 concerning the use of the previous state
   upon server restart.

9.11.1.  Upon entry to PAUSED state

   When entering PAUSED state, the server MUST store the previous
   state in stable storage, and use that state as the previous state
   when it is restarted.

9.11.2.  Transitions out of PAUSED state

   A server transitions out of PAUSED state by being restarted.  At
   that time, the previous state MUST be the state the server was in
   prior to entering the PAUSED state.

9.12.  SHUTDOWN state

   This state exists to allow one server to inform another that it
   will be out of service for what is predicted to be a relatively
   long time, and to allow the other server to transition immediately
   to PARTNER-DOWN state, and take over completely for the server
   going down.

   A server which is aware that it is shutting down SHOULD send a
   STATE message with the server-state field containing SHUTDOWN.

   While a server may or may not transition internally into SHUTDOWN
   state, the 'previous' state determined when it is restarted MUST
   be the state active prior to the command to shut down.  See
   section 9.3.2 concerning the use of the previous state upon server
   restart.

9.12.1.  Upon entry to SHUTDOWN state

   When entering SHUTDOWN state, the server MUST record the previous
   state in stable storage for use when the server is restarted.  It
   also MUST record the current time as the last time operational.

   A server which is aware that it is shutting down SHOULD send a
   STATE message with the server-state field containing SHUTDOWN.

9.12.2.  Operation in SHUTDOWN state

   A server in SHUTDOWN state MUST NOT respond to any DHCP client
   input.

   If a server receives any message indicating that the partner has
   moved to PARTNER-DOWN state while it is in SHUTDOWN state then it
   MUST record RECOVER state as the previous state to be used when it
   is restarted.

   A server SHOULD wait for a few seconds after informing the partner
   of entry into SHUTDOWN state (if communications are okay) to
   determine if the partner entered PARTNER-DOWN state.

9.12.3.  Transitions out of SHUTDOWN state

   A server transitions out of SHUTDOWN state by being restarted.

10.  Safe Period

   Due to the restrictions imposed on each server while in
   COMMUNICATIONS-INTERRUPTED state, long-term operation in this
   state is not feasible for either server.
   One reason that these states exist at all is to allow the servers
   to easily survive transient network communications failures of a
   few minutes to a few days (although the actual time periods will
   depend a great deal on the DHCP activity of the network in terms
   of arrival and departure of DHCP clients on the network).

   Eventually, when the servers are unable to communicate, they will
   have to move into a state where they no longer can re-integrate
   without some possibility of a duplicate IP address allocation.
   There are two ways that they can move into this state (known as
   PARTNER-DOWN).

   They can be informed by external command that, indeed, the partner
   server is down.  In this case, there is no difficulty in moving
   into the PARTNER-DOWN state since it is an accurate reflection of
   reality and the protocol has been designed to operate correctly
   (even during reintegration) as long as, when in PARTNER-DOWN
   state, the partner is, indeed, down.

   The more difficult scenario is when the servers are running
   unattended for extended periods, and in this case an option is
   provided to configure something called a "safe-period" into each
   server.  This OPTIONAL safe-period is the period after which
   either the primary or secondary server will automatically
   transition to PARTNER-DOWN from COMMUNICATIONS-INTERRUPTED state.
   If this transition is completed and the partner is not down, then
   the possibility of duplicate IP address allocations will exist.

   The goal of the "safe-period" is to allow network operations staff
   some time to react to a server moving into
   COMMUNICATIONS-INTERRUPTED state.
   During the safe-period the only requirement is that the network
   operations staff determine if both servers are still running --
   and if they are, to either fix the network communications failure
   between them, or to take one of the servers down before the
   expiration of the safe-period.

   The length of the safe-period is installation dependent, and
   depends in large part on the number of unallocated IP addresses
   within the subnet address pool and the expected frequency of
   arrival of previously unknown DHCP clients requiring IP addresses.
   Many environments should be able to support safe-periods of
   several days.

   During this safe period, either server will allow renewals from
   any existing client.  The only limitation concerns the need for IP
   addresses for the DHCP server to hand out to new DHCP clients and
   the need to re-allocate IP addresses to different DHCP clients.

   The number of "extra" IP addresses required is equal to the
   expected total number of new DHCP clients encountered during the
   safe period.  This is dependent only on the arrival rate of new
   DHCP clients, not the total number of outstanding leases on IP
   addresses.

   In the unlikely event that a relatively short safe period of an
   hour is all that can be used (given a dearth of IP addresses or a
   very high arrival rate of new DHCP clients), even that can provide
   substantial benefits in allowing the DHCP subsystem to ride
   through minor problems that could occur and be fixed within that
   hour.  In these cases, no possibility of duplicate IP address
   allocation exists, and re-integration after the failure is solved
   will be automatic and require no operator intervention.

11.  Security

   The Failover protocol communicates DHCP lease activity and this
   data is generally easily discovered via other means, such as by
   pinging addresses and doing DNS lookups.
   Therefore, the need to encrypt the data over the wire is likely
   not great (though some sites may feel differently).

   However, it is very desirable to assure the integrity of failover
   partners and to thus ensure proper operation of the servers.  For
   example, denial of service attacks are possible by the
   communication of invalid state information to one or both servers.

   Therefore, the Failover protocol MUST be capable of being secured
   by using a simple shared secret message digest which covers each
   message.  This provides authentication of the servers, but does
   not provide encryption of the data exchange.

   The Failover protocol MAY also be secured by using TLS [RFC 2246]
   (Transport Layer Security) if encryption of the data exchange is
   desired.  The use of the shared secret or TLS will not protect
   against TCP or IP layer attacks (such as someone sending fake TCP
   RST segments).  IPsec SHOULD be used to protect against most (if
   not all) of these kinds of attacks.

11.1.  Simple shared secret

   Messages between the failover partners are authenticated through
   the use of a shared secret, which is never sent over the network
   and must be known by each server.  How each server is told about
   this shared secret and secures its storage of the shared secret is
   outside the scope of this document.  If a server is configured
   with a shared secret for a partner, it MUST send the
   message-digest option in ALL messages to that partner and it MUST
   treat any messages received from that partner without a
   message-digest option as failing authentication.

   If a server is not configured with a shared secret for a partner,
   it MUST NOT send the message-digest option in any message to that
   partner and it MUST treat any messages received from that partner
   with a message-digest option as failing authentication.
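
   The two configuration rules above are symmetric: the presence of
   the message-digest option in a received message has to match
   whether a shared secret is configured for that partner.  A minimal
   sketch of this presence check follows; it does not verify the
   digest value itself, and the function and argument names are
   hypothetical.

```python
def digest_presence_ok(secret_configured, message_has_digest):
    """Presence check applied to a message received from a partner.

    secret_configured: True if a shared secret is configured for the
        partner that sent the message.
    message_has_digest: True if the received message carries a
        message-digest option.

    Returns False when the message must be treated as failing
    authentication on presence alone.
    """
    # Secret configured  -> the option is required.
    # No secret          -> the option is forbidden.
    return bool(secret_configured) == bool(message_has_digest)
```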

   The shared secret is used to calculate a 16 octet message-digest
   which is sent in every failover message as the message-digest option.
   See section 12.16. The message-digest contains a one-way 16 octet MD5
   [RFC 1321] hash calculated over a stream of octets consisting of the
   entire message concatenated with the shared secret.

   For calculation, the message includes the message-digest option with
   the message-digest data zeroed (16 octets of zero). Once the
   calculation is complete, these 16 octets of zero are replaced by the
   16-octet MD5 hash and the message is sent.

   For verification, the 16-octet message-digest is saved and replaced
   with 16 octets of zero, and the digest is recalculated as described
   above. The resulting MD5 hash is compared to the received hash and,
   if they match, the message is assumed authenticated.

   A failover partner that fails to authenticate a received message or
   receives a message without a message-digest option when configured
   with a shared secret MUST close the connection immediately and take
   steps to notify operators.

   This use of the shared secret is very similar to that used for RADIUS
   Accounting [RFC 2139].

11.2. TLS

   TLS, Transport Layer Security, as specified in [RFC 2246] MAY be
   used. The use of TLS would be similar to the way it is used with
   SMTP [RFC 2487] and IMAP/POP3/ACAP [RFC 2595].

   To request the use of TLS, the server that successfully opened a
   connection to its peer MUST send the TLS-request option as part of
   the CONNECT message. The server receiving the TLS-request option
   MUST respond with a TLS-reply option, carried in its CONNECTACK
   message, indicating its acceptance or rejection of the TLS-request
   in the CONNECT message.

   If the CONNECTACK message contained a TLS-reply of 1, then both
   servers begin TLS negotiation.
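
   Returning to the shared-secret digest of section 11.1, the
   calculation and verification steps can be sketched as follows. This
   is an illustration under stated assumptions, not a reference
   implementation: it assumes the message-digest option is the last
   option in the message (section 12.16) and carries exactly 16 octets
   of data, so the digest occupies the final 16 octets of the message.
   The function names are hypothetical.

```python
import hashlib
import hmac

DIGEST_LEN = 16  # MD5 output size in octets


def _md5_over(message_with_zeroed_digest: bytes, secret: bytes) -> bytes:
    # Hash the entire message concatenated with the shared secret.
    return hashlib.md5(message_with_zeroed_digest + secret).digest()


def fill_digest(message: bytes, secret: bytes) -> bytes:
    """Replace the 16 trailing zero octets with the computed digest."""
    if message[-DIGEST_LEN:] != bytes(DIGEST_LEN):
        raise ValueError("digest data must be zeroed before calculation")
    return message[:-DIGEST_LEN] + _md5_over(message, secret)


def verify_digest(message: bytes, secret: bytes) -> bool:
    """Save the received digest, zero it, recalculate, and compare."""
    if len(message) < DIGEST_LEN:
        return False
    received = message[-DIGEST_LEN:]
    zeroed = message[:-DIGEST_LEN] + bytes(DIGEST_LEN)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(_md5_over(zeroed, secret), received)
```

   A receiver for which verify_digest returns False would close the
   connection and notify operators, as required in section 11.1.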

   Upon completion of this negotiation, the server which originally sent
   the CONNECT message MUST resend its CONNECT message without any
   TLS-request, and must wait for a corresponding CONNECTACK.

   Implementation of the TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA [RFC 2246]
   cipher suite is REQUIRED in Failover servers supporting TLS. This is
   important as it assures that any two compliant implementations can be
   configured to interoperate.

12. Failover Options

   This section lists all of the options that are currently defined to
   be used with the failover protocol. See section 6.2 for details
   concerning time values.

12.1. addresses-transferred

   A 32 bit unsigned long in network byte order. Reports the number of
   addresses transferred by the primary to the secondary server
   (addresses to be used for the secondary server's private address
   pool).

          Code        Len      Number of Addresses
      +-----+-----+-----+-----+----+-----+-----+-----+
      |  0  |  1  |  0  |  4  | n1 | n2  | n3  | n4  |
      +-----+-----+-----+-----+----+-----+-----+-----+

12.2. assigned-IP-address

   The DHCP managed IP address to which this message refers.

          Code        Len      Address
      +-----+-----+-----+-----+----+-----+-----+-----+
      |  0  |  2  |  0  |  4  | a1 | a2  | a3  | a4  |
      +-----+-----+-----+-----+----+-----+-----+-----+

12.3. binding-status

   This option is used to convey the current state of a binding.

          Code        Len     Type
      +-----+-----+-----+-----+-----+
      |  0  |  3  |  0  |  1  | 1-8 |
      +-----+-----+-----+-----+-----+

   Legal values for this option are:

      Value  Binding Status
      -----  ------------------------------------------------
        1    FREE             Lease is currently available
        2    ACTIVE           Lease is assigned to a client
        3    EXPIRED          Lease has expired
        4    RELEASED         Lease has been released by client
        5    ABANDONED        A server or client flagged the address
                              as unusable
        6    RESET            Lease was freed by some external agent
        7    BACKUP           Lease belongs to secondary's private
                              address pool
        8    BACKUP-RESERVED  Lease belongs to secondary's private
                              address pool as well as primary's since
                              it is reserved on primary.

12.4. client-identifier

   This is the client-identifier for the client associated with a
   binding. The client-identifier data is subject to the same
   conventions as DHCP option 61 [RFC 2132].

          Code        Len      Client Identifier
      +-----+-----+-----+-----+----+-----+---
      |  0  |  4  |  0  |  n  | i1 | i2  | ...
      +-----+-----+-----+-----+----+-----+---

12.5. client-hardware-address

   This is the hardware address for the client associated with a
   binding. Byte t1 (type) MUST be set to the proper ARP hardware
   address code, as defined in the ARP section of RFC 1700 (it MUST NOT
   be zero!).

          Code        Len    htype  chaddr
      +-----+-----+-----+-----+----+-----+-----+---
      |  0  |  5  |  0  |  n  | t1 | c1  | c2  | ...
      +-----+-----+-----+-----+----+-----+-----+---

12.6. client-last-transaction-time

   The time at which this server last received a DHCP request from a
   particular client expressed as an absolute time (see section 6.2).

          Code        Len      client last transaction time
      +-----+-----+-----+-----+----+-----+-----+-----+
      |  0  |  6  |  0  |  4  | t1 | t2  | t3  | t4  |
      +-----+-----+-----+-----+----+-----+-----+-----+

12.7. client-reply-options

   This option contains options from a DHCP server's reply to a DHCP
   client request. It is sent in a BNDUPD message. The first 4 bytes
   of the option contain the "magic number" of the option area from
   which the DHCP reply options were taken and serve to define the
   format of the rest of the sub-options contained in this option.
   After the magic number, the options included are in the normal
   options format appropriate for that magic number.

   A server SHOULD NOT include all of the options in a DHCP server's
   reply to a client's request in this option, but rather a server
   SHOULD include only those options which are of likely interest to its
   partner server. See section 7.1 for details.

          Code        Len         Magic Number     Embedded options
      +-----+-----+-----+-----+----+----+----+----+----+----+--
      |  0  |  7  |  0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 | ...
      +-----+-----+-----+-----+----+----+----+----+----+----+--

12.8. client-request-options

   This option contains options from a DHCP client's request. It is
   sent in a BNDUPD message. The first 4 bytes of the option contain
   the "magic number" of the option area from which the DHCP client's
   request options were taken and serve to define the format of the
   rest of the sub-options contained in this option. After the magic
   number, the options included are in the normal options format
   appropriate for that magic number.

   A server SHOULD NOT include all of the options in a DHCP client
   request in this option, but rather a server SHOULD include only those
   options which are of likely interest to its partner server. See
   section 7.1 for details.

          Code        Len         Magic Number     Embedded options
      +-----+-----+-----+-----+----+----+----+----+----+----+--
      |  0  |  8  |  0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 | ...
      +-----+-----+-----+-----+----+----+----+----+----+----+--

12.9. DDNS

   If an implementation supports Dynamic DNS updates, this option is
   used to communicate the status of the DDNS update associated with a
   particular lease binding. The Flags field conveys the types of DNS
   RRs that are to be updated by the DHCP server, and the status of the
   DDNS update. The Domain Name field conveys the DNS FQDN that the
   DHCP server is using to refer to the client, in DNS encoding as
   specified in [RFC 1035].

          Code        Len         Flags       Domain Name
      +-----+-----+-----+-----+-----+------+------+-----+------
      |  0  |  9  |  0  |  n  |   flags    |  d1  | d2  | ...
      +-----+-----+-----+-----+-----+------+------+-----+------

   The Flags field is a 16-bit field; several bit positions are
   specified here.

                           1 1 1 1 1 1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |C|A|D|P|         MBZ           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The bits (numbered from the least-significant bit in network
   byte-order) are used as follows:

      0 (C): A RR update successfully completed
      1 (A): Server is controlling A RR on behalf of the client
      2 (D): PTR RR update successfully completed (Done)
      3 (P): Server is controlling PTR RR on behalf of the client
      4-15 : Must be zero

   All of the unspecified bit positions SHOULD be set to 0 by servers
   sending the Failover-DDNS option, and they MUST be ignored by servers
   receiving the option.

12.10. delayed-service-parameter

   The delayed-service-parameter is an optional load balancing tuning
   parameter, defined in [LOADB]. If it is used, it MUST be sent in the
   same message as the hash-bucket-assignment option (see section
   12.11). Format:

          Code        Len     Seconds
      +-----+-----+-----+-----+----+
      |  0  | 10  |  0  |  1  | S  |
      +-----+-----+-----+-----+----+

   S is a one byte value, 1..255.

12.11. hash-bucket-assignment

   A set of load balancing hash values for the secondary server.
   See section 5.3 for more information on how this option is used.

   The format and usage of the data in this option is defined in
   [LOADB].

          Code        Len      Hash Buckets
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |  0  | 11  |  0  | 32  | b1  | b2  | ... | b32 |
      +-----+-----+-----+-----+-----+-----+-----+-----+

12.12. lease-expiration-time

   The lease expiration time is the lease interval that a DHCP server
   has ACKed to a DHCP client added to the time at which that ACK was
   transmitted, expressed as an absolute time (see section 6.2).

          Code        Len      Time
      +-----+-----+-----+-----+----+-----+-----+-----+
      |  0  | 12  |  0  |  4  | t1 | t2  | t3  | t4  |
      +-----+-----+-----+-----+----+-----+-----+-----+

12.13. max-unacked-bndupd

   The maximum number of BNDUPD messages that this server is prepared to
   accept over the TCP connection without causing the TCP connection to
   block. A 32 bit unsigned integer value, in network byte order.

          Code        Len      Maximum Unacked BNDUPD
      +-----+-----+-----+-----+----+-----+-----+-----+
      |  0  | 13  |  0  |  4  | n1 | n2  | n3  | n4  |
      +-----+-----+-----+-----+----+-----+-----+-----+

12.14. MCLT

   Maximum Client Lead Time, an interval, in seconds. A 32 bit unsigned
   integer value, in network byte order.

          Code        Len      Time
      +-----+-----+-----+-----+----+-----+-----+-----+
      |  0  | 14  |  0  |  4  | t1 | t2  | t3  | t4  |
      +-----+-----+-----+-----+----+-----+-----+-----+

12.15. message

   This option is used to supply a human readable message text. It may
   be used in association with the Reject Reason Code to provide a human
   readable error message for the reject.

          Code        Len      Text
      +-----+-----+-----+-----+------+-----+--
      |  0  | 15  |  0  |  n  |  c1  | c2  | ...
      +-----+-----+-----+-----+------+-----+--

12.16. message-digest

   The message digest for this message.

   This option consists of a variable number of bytes which contain the
   message digest of the message prior to the inclusion of this option.

   When this option appears in a message, it MUST appear as the last
   option in the message. It MUST appear in every message if message
   digests are required.

          Code        Len      Message Digest
      +-----+-----+-----+-----+----+-----+-----
      |  0  | 16  |  0  |  n  | d1 | d2  | ...
      +-----+-----+-----+-----+----+-----+-----

12.17. potential-expiration-time

   The potential expiration time is the time that one server tells
   another server that it may wish to grant in a lease to a DHCP client.
   It is an absolute time. See section 6.2.

          Code        Len      Time
      +-----+-----+-----+-----+----+-----+-----+-----+
      |  0  | 17  |  0  |  4  | t1 | t2  | t3  | t4  |
      +-----+-----+-----+-----+----+-----+-----+-----+

12.18. receive-timer

   The number of seconds (an interval) within which the server must
   receive a message from its partner, or it will assume that
   communication with the partner is not OK. An unsigned 32 bit
   integer in network byte order.

          Code        Len      Receive Timer
      +-----+-----+-----+-----+----+-----+-----+-----+
      |  0  | 18  |  0  |  4  | s1 | s2  | s3  | s4  |
      +-----+-----+-----+-----+----+-----+-----+-----+

12.19. protocol-version

   The protocol version being used by the server. It is only sent in the
   CONNECT and CONNECTACK messages. The current value for the version
   is 1.

          Code        Len     Version
      +-----+-----+-----+-----+-----+
      |  0  | 19  |  0  |  1  |  1  |
      +-----+-----+-----+-----+-----+

12.20. reject-reason

   This option is used to selectively reject binding updates. It MAY be
   used in a BNDACK message or a CONNECTACK message, always associated
   with an assigned-IP-address option, which contains the IP address of
   the update being rejected.

          Code        Len     Reason Code
      +-----+-----+-----+-----+-----+
      |  0  | 20  |  0  |  1  | R1  |
      +-----+-----+-----+-----+-----+

   Reason codes:

        0  Reserved
        1  Illegal IP address (not part of any address pool).
        2  Fatal conflict exists: address in use by other client.
        3  Missing binding information.
        4  Connection rejected, time mismatch too great.
        5  Connection rejected, invalid MCLT.
        6  Connection rejected, unknown reason.
        7  Connection rejected, duplicate connection.
        8  Connection rejected, invalid failover partner.
        9  TLS not supported.
       10  TLS supported but not configured.
       11  TLS required but not supported by partner.
       12  Message digest not supported.
       13  Message digest not configured.
       14  Protocol version mismatch.
       15  Outdated binding information.
       16  Less critical binding information.
       17  No traffic within sufficient time.
       18  Hash bucket assignment conflict.
   19-253  Reserved.
      254  Unknown: Error occurred but does not match any reason code.
      255  Reserved for code expansion.

12.21. sending-server-IP-address

   The IP address of the server sending this message. This option is
   required for all messages if the message digest option is used.

          Code        Len      Address
      +-----+-----+-----+-----+----+-----+-----+-----+
      |  0  | 21  |  0  |  4  | a1 | a2  | a3  | a4  |
      +-----+-----+-----+-----+----+-----+-----+-----+

12.22. server-flags

   This option is used to convey the current flags of the failover
   endpoint in the sending server.

          Code        Len     Server Flags
      +-----+-----+-----+-----+-------+
      |  0  | 22  |  0  |  1  | flags |
      +-----+-----+-----+-----+-------+

   The flags field is an 8-bit field; one bit position is
   specified here.

       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |S|     MBZ     |
      +-+-+-+-+-+-+-+-+

   The bits (numbered from the least-significant bit in network
   byte-order) are used as follows:

      0 (S): STARTUP,
         Bit 0 MUST be set to 1 whenever the server is in STARTUP state,
         and set to 0 otherwise. (Note that when in STARTUP state, the
         state transmitted in the server-state option is usually the
         last recorded state from stable storage, but see section 9.3
         for details.)
      1-7 : Must be zero

12.23. server-state

   This option is used to convey the current state of the failover
   endpoint in the sending server.

          Code        Len     Server State
      +-----+-----+-----+-----+-----+
      |  0  | 23  |  0  |  1  |1-10 |
      +-----+-----+-----+-----+-----+

   Legal values for this option are:

      Value  Server State
      -----  -------------------------------------------------------------
        0    reserved
        1    STARTUP                     Startup state (1)
        2    NORMAL                      Normal state
        3    COMMUNICATIONS-INTERRUPTED  Communication interrupted (safe)
        4    PARTNER-DOWN                Partner down (unsafe mode)
        5    POTENTIAL-CONFLICT          Synchronizing
        6    RECOVER                     Recovering bindings from partner
        7    PAUSED                      Shutting down for a short period.
        8    SHUTDOWN                    Shutting down for an extended
                                         period.
        9    RECOVER-DONE                Interlock state prior to NORMAL
        10   RESOLUTION-INTERRUPTED      Comm. failed during resolution

   (1) The STARTUP state is never sent to the partner server; it is
   indicated by the STARTUP bit in the server-flags option (see section
   12.22).

12.24. start-time-of-state

   This option is used for different states in different messages. In a
   BNDUPD message it represents the start time of the state of the lease
   in the BNDUPD message. In a STATE message, it represents the start
   time of the partner server's failover state. In all cases it is an
   absolute time.

          Code        Len      Start Time of State
      +-----+-----+-----+-----+----+-----+-----+-----+
      |  0  | 24  |  0  |  4  | t1 | t2  | t3  | t4  |
      +-----+-----+-----+-----+----+-----+-----+-----+

12.25. TLS-reply

   This option contains information relating to TLS security
   negotiation. It is sent in a CONNECTACK message.

   A t1 value of 0 indicates no TLS operation, a value of 1 indicates
   that TLS operation is required.

          Code        Len     TLS
      +-----+-----+-----+-----+-----+
      |  0  | 25  |  0  |  1  | t1  |
      +-----+-----+-----+-----+-----+

12.26. TLS-request

   This option contains information relating to TLS security
   negotiation. It is sent in a CONNECT message.

   The t1 byte is the TLS request from this server. A value of 0
   indicates no TLS operation (to communicate, the other server MUST NOT
   require TLS), a value of 1 indicates that TLS operation is desired
   but not required (to communicate, the other server MAY utilize TLS),
   and a value of 2 indicates that TLS operation is required (to
   communicate, the other server MUST utilize TLS) to establish
   communications with this server.

          Code        Len     TLS
      +-----+-----+-----+-----+-----+
      |  0  | 26  |  0  |  1  | t1  |
      +-----+-----+-----+-----+-----+

12.27. vendor-class-identifier

   A string which identifies the vendor of the failover protocol
   implementation.

          Code        Len      vendor class string
      +-----+-----+-----+-----+----+-----+---
      |  0  | 27  |  0  |  n  | c1 | c2  | ...
      +-----+-----+-----+-----+----+-----+---

12.28. vendor-specific-options

   This option is used to convey options specific to a particular
   vendor's implementation. The vendor class identifier is used to
   specify which option space the embedded options are drawn from.

   It functions similarly to the vendor class identifier and vendor
   specific options in the DHCP protocol.
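
   Every option in section 12, including the options embedded in
   vendor-specific-options, is drawn with a two byte code followed by a
   two byte length and then the data. A minimal sketch of encoding and
   walking such a sequence follows. The function names are hypothetical,
   and reading the code and length in network byte order is an
   assumption inferred from the diagrams, which show the high-order
   byte first.

```python
import struct
from typing import List, Tuple

HEADER = struct.Struct("!HH")  # two byte code, two byte length, big-endian


def encode_option(code: int, data: bytes) -> bytes:
    """Prefix the option data with its code and length."""
    return HEADER.pack(code, len(data)) + data


def parse_options(buf: bytes) -> List[Tuple[int, bytes]]:
    """Split a buffer into (code, data) pairs, rejecting truncated input."""
    options = []
    offset = 0
    while offset < len(buf):
        if offset + HEADER.size > len(buf):
            raise ValueError("truncated option header")
        code, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        if offset + length > len(buf):
            raise ValueError("truncated option data")
        options.append((code, buf[offset:offset + length]))
        offset += length
    return options


# Example: an MCLT option (code 14) of 3600 seconds followed by a
# TLS-request option (code 26) asking for "desired but not required".
example = encode_option(14, struct.pack("!I", 3600)) + encode_option(26, b"\x01")
```

   Calling parse_options(example) yields the two options in order; the
   same routine could be applied to the data of a vendor-specific-options
   option to recover its embedded options.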

   This option contains other options in the same two byte code, two
   byte length format. If this option appears in a message without a
   corresponding vendor class identifier, it MUST be ignored.

          Code        Len      Embedded options
      +-----+-----+-----+-----+----+-----+---
      |  0  | 28  |  0  |  n  | c1 | c2  | ...
      +-----+-----+-----+-----+----+-----+---

13. IANA Considerations

   This document defines several number spaces (failover options,
   failover message types, and failover reject reason codes). For all of
   these number spaces, certain values are defined in this
   specification. New values may only be defined by IETF Consensus, as
   described in [RFC 2434]. Basically, this means that they are defined
   by RFCs approved by the IESG.

14. Acknowledgments

   Ralph Droms started it all, by sketching out an initial interserver
   draft that embodied ideas from several past IETF meetings. In that
   draft, he acknowledged contributions by Jeff Mogul, Greg Minshall,
   Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.

   Kim Kinnear and Bob Cole each extended that draft, separately and
   then together, until they created an interserver draft that supported
   any number of servers. The complexity of that approach was just too
   great, and that draft wasn't greeted with enthusiasm by many,
   including its authors.

   It did however lead to a much simpler approach embodied in the first
   Failover draft by Greg Rabil, Mike Dooley, Arun Kapur and Ralph
   Droms. This draft posited only two servers, a primary and a
   secondary.

   Kim Kinnear then wrote the Safe Failover draft to layer on top of the
   Failover Draft and increase its robustness in the face of certain
   rare network failures.

   At the spring 1998 IETF meeting in LA, the DHC working group said
   that they wanted a merged Failover and Safe Failover draft.
   Steve Gonczi and Bernie Volz stepped up and produced the raw material
   for such a merged draft, along with a new message format designed
   around DHCP options and other extensions and clarifications. Kim
   Kinnear edited their work into draft format and made other changes in
   time for the Summer Chicago IETF meeting.

   During the summer and fall of 1998, two groups worked on separate
   implementations of the UDP failover draft. Bernie Volz and Steve
   Gonczi constituted one group, and Kim Kinnear, Mark Stapp and Paul
   Fox made up the other. These two groups worked together to produce
   considerable changes and simplifications of the protocol during that
   period, and Steve Gonczi and Kim Kinnear edited those changes into
   the -03 draft in time for submission to the December 1998 Orlando
   IETF meeting.

   In February of 1999 Kim Kinnear and Mark Stapp hosted a meeting of
   people interested in the failover draft. During that meeting a
   general agreement was reached to recast the failover protocol to use
   TCP instead of UDP. In addition, the group together brainstormed a
   workable load-balancing technique. Kim Kinnear rewrote the entire
   draft to include the changes made at that meeting as well as to
   restructure the draft along guidelines suggested by Thomas Narten.
   The result was the -04 draft, submitted prior to the Oslo IETF
   meeting.

   The initial idea for a hash-based load balancing approach was offered
   by Ted Lemon, and the determination of an algorithm and its
   integration into the draft was done by Steve Gonczi. The security
   section was spearheaded by Bernie Volz. Both contributed considerably
   to the ideas and text in the rest of the draft with several reviews.

   In early October of 1999, three conference calls were held to discuss
   the -04 draft.
   The -05 includes changes as a result of those calls, perhaps the
   largest of which was to move the load balancing approach into a
   separate draft. Thanks to all of the many people who participated in
   the conference calls. Changes were made because of contributions by:
   Ted Lemon, David Erdmann, Richard Jones, Rob Stevens, Thomas Narten,
   Diana Lane, and Andre Kostur.

   Another conference call was held in mid-January of 2000, and the -06
   draft was produced to tighten up the -05 draft both technically and
   editorially.

   This, the -07 draft, was edited by Kim Kinnear and was based in part
   on reviews by Richard Jones, Bernie Volz, and Steve Gonczi. It
   embodies several technical updates as well as numerous editorial
   revisions that enhance both correctness and clarity.

   These most recent changes have not been widely circulated among the
   other authors prior to submission to the IETF.

   Many people have reviewed the various earlier drafts that went into
   this result. At American Internet, ideas were contributed by Brad
   Parker. At Cisco Systems Paul Fox and Ellen Garvey contributed to
   the design of the protocol.

   Glenn Waters of Nortel Networks contributed ideas and enthusiasm to
   make a Failover protocol that was both "safe" and "lazy".

15. References

   [AGENTINFO] Patrick, M., "draft-ietf-dhc-agent-options-11.txt", July,
      2000.

   [DDNS] Rekhter, Y., Stapp, M., "draft-ietf-dhc-dhcp-dns-12.txt",
      March, 2000.

   [LOADB] Volz, B., Gonczi, S., Lemon, T., Stevens, R.,
      "draft-ietf-dhc-loadb-02.txt", July, 1999.

   [RFC 1035] Mockapetris, P., "Domain Names - Implementation and
      Specification", November, 1987.

   [RFC 1321] Rivest, R., and Dusse, S., "The MD5 Message-Digest
      Algorithm", RFC 1321, MIT Laboratory for Computer Science, RSA
      Data Security Inc., April 1992.

   [RFC 1534] Droms, R., "Interoperation between DHCP and BOOTP", RFC
      1534, October 1993.

   [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
      Requirement Levels", RFC 2119.

   [RFC 2131] Droms, R., "Dynamic Host Configuration Protocol", RFC
      2131, March 1997.

   [RFC 2132] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor
      Extensions", RFC 2132, March 1997.

   [RFC 2136] Vixie, P., Thomson, S., Rekhter, Y., Bound, J., "Dynamic
      Updates in the Domain Name System (DNS UPDATE)", RFC 2136, April
      1997.

   [RFC 2139] Rigney, C., "Radius Accounting", RFC 2139, Livingston
      Enterprises, April 1997.

   [RFC 2246] Dierks, T., "The TLS Protocol, Version 1.0", RFC 2246,
      January 1999.

   [RFC 2434] Alvestrand, H. and T. Narten, "Guidelines for Writing an
      IANA Considerations Section in RFCs", BCP 26, RFC 2434, October
      1998.

   [RFC 2487] Hoffman, P., "SMTP Service Extension for Secure SMTP over
      TLS", RFC 2487, January 1999.

   [RFC 2595] Newman, C., "Using TLS with IMAP, POP3, and ACAP", RFC
      2595, June 1999.

   [USERCLASS] Droms, R., Demirtjis A., Stump, G., Gu, Y., Vyaghrapuri,
      R., Beser, B., Privat, J., "draft-ietf-dhc-userclass-08.txt",
      July, 2000.

16. Author's information

   Ralph Droms
   323 Dana Engineering
   Bucknell University
   Lewisburg, PA 17837

   Phone: (717) 524-1145
   EMail: droms@bucknell.edu

   Kim Kinnear
   Mark Stapp
   Cisco Systems
   250 Apollo Drive
   Chelmsford, MA 01824

   Phone: (978) 244-8000

   EMail: kkinnear@cisco.com
          mjs@cisco.com

   Bernie Volz
   IPWorks, Inc.
   959 Concord St.
   Framingham, MA 01701

   Phone: (508) 879-1809

   EMail: volz@ipworks.com

   Steve Gonczi
   Network Engines, Inc.
   25 Dan Road
   Canton, MA 02021-2817

   Phone: (781) 332-1165

   EMail: steve.gonczi@networkengines.com

   Greg Rabil, Mike Dooley, Arun Kapur
   Lucent Technologies
   400 Lapp Road
   Malvern, PA 19355

   Phone: (800) 208-2747

   EMail: grabil@lucent.com
          mdooley@lucent.com
          akapur@lucent.com

17. Full Copyright Statement

   Copyright (C) The Internet Society (1999). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Open Issues

   These issues need to be resolved:

   1. Get another port number for connections.

   2. Resolve how to handle secondary IP address allocation.

   3. Figure out a better way to identify vendors. How about an
      SNMP Enterprise MIB value?

   4. Need to tie reject-reasons to text of draft, remove obsolete
      reject-reasons.