idnits 2.17.1 draft-ietf-dhc-failover-09.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Authors' Addresses Section. ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 112 instances of too long lines in the document, the longest one being 7 characters in excess of 72. ** The abstract seems to contain references ([RFC2131]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 1584 has weird spacing: '...od ends addre...' == Line 2140 has weird spacing: '...eserved not...' == Line 2671 has weird spacing: '... accept acc...' == Line 2672 has weird spacing: '... accept acc...' == Line 2673 has weird spacing: '... accept acc...' == (7 more instances...) 
== Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: In this state a server MUST respond to all DHCP client requests, and any load balancing (described in section 5.3) MUST NOT be used. When allocating new IP addresses, each server SHOULD allocate from its own IP address pool (if that can be determined), where the primary SHOULD allocate only FREE IP addresses, and the secondary SHOULD allocate only BACKUP IP addresses. When responding to renewal requests, each server will allow continued renewal of a DHCP client's current lease on an IP address irrespective of whether that lease was given out by the receiving server or not, although the renewal period MUST not exceed the maximum client lead time (MCLT) beyond the latest of: 1) the potential-expiration-time already acknowledged by the other server or 2) the lease-expiration-time or 3) `potential-expiration-time received from the partner server. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (January 2002) is 8130 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '3' on line 2325 -- Looks like a reference, but probably isn't: '4' on line 2836 -- Looks like a reference, but probably isn't: '9' on line 2955 -- Looks like a reference, but probably isn't: '7' on line 3018 -- Looks like a reference, but probably isn't: '8' on line 3063 -- Looks like a reference, but probably isn't: '1' on line 3102 -- Looks like a reference, but probably isn't: '2' on line 3153 -- Looks like a reference, but probably isn't: '5' on line 3191 -- Looks like a reference, but probably isn't: '6' on line 3386 -- Looks like a reference, but probably isn't: '10' on line 3558 -- Looks like a reference, but probably isn't: '11' on line 3606 -- Looks like a reference, but probably isn't: '12' on line 3629 == Unused Reference: 'RFC 2139' is defined on line 5780, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'DHCID' -- Possible downref: Non-RFC (?) normative reference: ref. 'DNSRES' -- Possible downref: Non-RFC (?) normative reference: ref. 'FQDN' ** Downref: Normative reference to an Informational RFC: RFC 2104 ** Obsolete normative reference: RFC 2139 (Obsoleted by RFC 2866) ** Obsolete normative reference: RFC 2246 (Obsoleted by RFC 4346) ** Obsolete normative reference: RFC 2401 (Obsoleted by RFC 4301) ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226) ** Obsolete normative reference: RFC 2487 (Obsoleted by RFC 3207) Summary: 12 errors (**), 0 flaws (~~), 10 warnings (==), 17 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 Network Working Group Ralph Droms 3 INTERNET DRAFT Kim Kinnear 4 Mark Stapp 5 Cisco Systems 7 Bernie Volz 8 Ericsson 10 Steve Gonczi 11 Network Engines 13 Greg Rabil 14 Mike Dooley 15 Arun Kapur 16 Lucent Technologies 18 July 2001 19 Expires January 2002 21 DHCP Failover Protocol 22 24 Status of this Memo 26 This document is an Internet-Draft and is in full conformance with 27 all provisions of Section 10 of RFC2026. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF), its areas, and its working groups. Note that 31 other groups may also distribute working documents as Internet- 32 Drafts. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 The list of current Internet-Drafts can be accessed at 40 http://www.ietf.org/ietf/1id-abstracts.txt 42 The list of Internet-Draft Shadow Directories can be accessed at 43 http://www.ietf.org/shadow.html. 45 Copyright Notice 47 Copyright (C) The Internet Society (2001). All Rights Reserved. 49 Abstract 51 DHCP [RFC 2131] allows for multiple servers to be operating on a 52 single network. Some sites are interested in running multiple 53 servers in such a way so as to provide redundancy in case of server 54 failure. In order for this to work reliably, the cooperating primary 55 and secondary servers must maintain a consistent database of the 56 lease information. This implies that servers will need to coordinate 57 any and all lease activity so that this information is synchronized 58 in case of failover. 60 This document defines a protocol to provide such synchronization 61 between two servers. One server is designated the "primary" server, 62 the other is the "secondary" server. 
This document also describes a 63 way to integrate the failover protocol with the DHCP load balancing 64 approach. 66 Table of Contents 68 1. Introduction................................................. 4 69 2. Terminology.................................................. 5 70 2.1. Requirements terminology................................... 5 71 2.2. DHCP and failover terminology.............................. 5 72 3. Background and External Requirements......................... 9 73 3.1. Key aspects of the DHCP protocol........................... 9 74 3.2. BOOTP relay agent implementation........................... 11 75 3.3. What does it mean if a server can't communicate with its partner? 12 76 3.4. Challenging scenarios for a Failover protocol.............. 13 77 3.5. Using TCP to detect partner server failure................. 14 78 4. Design Goals................................................. 15 79 4.1. Design goals for this protocol............................. 15 80 4.2. Limitations of this protocol............................... 17 81 5. Protocol Overview............................................ 17 82 5.1. Messages and States........................................ 17 83 5.2. Fundamental guarantees..................................... 20 84 5.3. Load balancing............................................. 27 85 5.4. IP address allocations between servers..................... 28 86 5.5. Operating in NORMAL state.................................. 30 87 5.6. Operating in COMMUNICATIONS-INTERRUPTED state.............. 31 88 5.7. Operating in PARTNER-DOWN state............................ 31 89 5.8. Operating in RECOVER state................................. 31 90 5.9. Operating in STARTUP state................................. 31 91 5.10. Time synchronization between servers...................... 32 92 5.11. IP address binding-status................................. 32 93 5.12. DNS dynamic update considerations......................... 36 94 5.13. 
Reservations and failover................................. 41 95 5.14. Dynamic BOOTP and failover................................ 42 96 5.15. Guidelines for selecting MCLT............................. 43 97 5.16. What is sent in response to an UPDREQ or UPDREQALL message? 43 98 5.17. How do you determine that your partner is "up to date" for 45 99 6. Common Message Format........................................ 45 100 6.1. Message header format...................................... 46 101 6.2. Common option format....................................... 48 102 6.3. Batching multiple binding update transactions in one BNDUPD mes- 49 103 7. Protocol Messages............................................ 51 104 7.1. BNDUPD message [3]......................................... 51 105 7.2. BNDACK message [4]......................................... 62 106 7.3. UPDREQ message [9]......................................... 65 107 7.4. UPDREQALL message [7]...................................... 66 108 7.5. UPDDONE message [8]........................................ 67 109 7.6. POOLREQ message [1]........................................ 68 110 7.7. POOLRESP message [2]....................................... 69 111 7.8. CONNECT message [5]........................................ 70 112 7.9. CONNECTACK message [6]..................................... 74 113 7.10. STATE message [10]........................................ 78 114 7.11. CONTACT message [11]...................................... 79 115 7.12. DISCONNECT message [12]................................... 80 116 8. Connection Management........................................ 81 117 8.1. Connection granularity..................................... 81 118 8.2. Creating the TCP connection................................ 81 119 8.3. Using the TCP connection for determining communications status 83 120 8.4. Using the TCP connection for binding data.................. 85 121 8.5. 
Using the TCP connection for control messages.............. 85 122 8.6. Losing the TCP connection.................................. 85 123 9. Failover Endpoint States..................................... 86 124 9.1. Server Initialization...................................... 86 125 9.2. Server State Transitions................................... 86 126 9.3. STARTUP state.............................................. 90 127 9.4. PARTNER-DOWN state......................................... 93 128 9.5. RECOVER state.............................................. 95 129 9.6. RECOVER-WAIT state......................................... 97 130 9.7. RECOVER-DONE state......................................... 98 131 9.9. COMMUNICATIONS-INTERRUPTED State........................... 101 132 9.10. POTENTIAL-CONFLICT state.................................. 105 133 9.11. RESOLUTION-INTERRUPTED state.............................. 107 134 9.12. CONFLICT-DONE state....................................... 108 135 9.13. PAUSED state.............................................. 108 136 9.14. SHUTDOWN state............................................ 109 137 10. Safe Period................................................. 110 138 11. Security.................................................... 111 139 11.1. Simple shared secret...................................... 112 140 11.2. TLS....................................................... 113 141 12. Failover Options............................................ 113 142 12.1. addresses-transferred..................................... 114 143 12.2. assigned-IP-address....................................... 114 144 12.3. binding-status............................................ 114 145 12.4. client-identifier......................................... 115 146 12.5. client-hardware-address................................... 115 147 12.6. client-last-transaction-time.............................. 115 148 12.7. 
client-reply-options...................................... 116 149 12.8. client-request-options.................................... 116 150 12.9. DDNS...................................................... 117 151 12.10. delayed-service-parameter................................ 118 152 12.11. hash-bucket-assignment................................... 118 153 12.12. IP-flags................................................. 119 154 12.13. lease-expiration-time.................................... 120 155 12.14. max-unacked-bndupd....................................... 120 156 12.15. MCLT..................................................... 120 157 12.16. message.................................................. 121 158 12.17. message-digest........................................... 121 159 12.18. potential-expiration-time................................ 122 160 12.19. receive-timer............................................ 122 161 12.20. protocol-version......................................... 122 162 12.21. reject-reason............................................ 123 163 12.22. relationship-name........................................ 124 164 12.23. server-flags............................................. 124 165 12.24. server-state............................................. 125 166 12.25. start-time-of-state...................................... 125 167 12.26. TLS-reply................................................ 126 168 12.27. TLS-request.............................................. 126 169 12.28. vendor-class-identifier.................................. 126 170 12.29. vendor-specific-options.................................. 127 171 13. IANA Considerations......................................... 127 172 14. Acknowledgments............................................. 127 173 15. References.................................................. 129 174 16. Author's information........................................ 131 175 17. 
Full Copyright Statement.................................... 132 177 1. Introduction 179 DHCP [RFC 2131] allows for multiple servers to be operating on a sin- 180 gle network. Some sites are interested in running multiple servers 181 in such a way so as to provide redundancy in case of server failure 182 since the DHCP subsystem is in many cases a critical part of the net- 183 work infrastructure. 185 This document defines a protocol to provide synchronization between 186 two servers in order that each can take over for the other should 187 either one fail or become unreachable. 189 One server is designated the "primary" server, the other is the 190 "secondary" server, and most DHCP client requests are sent to each 191 server (see section 3.1.1 for details). 193 In order to provide a high availability DHCP service, these 194 cooperating primary and secondary servers must maintain a consistent 195 database of lease information. This implies that servers will need 196 to coordinate all lease activity so that this information is syn- 197 chronized in case failover is required. The protocol messages and 198 processing techniques required to maintain a consistent database are 199 specified in the protocol described here. 201 The failover protocol also contains a way to integrate the DHCP load- 202 balancing algorithm described in [RFC 3074] with the failover proto- 203 col. 205 2. Terminology 207 This section discusses both the generic requirements terminology com- 208 mon to many IETF protocol specifications as well as specialized DHCP 209 and failover protocol specific terminology. 211 2.1. Requirements terminology 213 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 214 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 215 document are to be interpreted as described in RFC 2119 [RFC 2119]. 217 2.2. 
DHCP and failover terminology 219 This document uses the following terms: 221 o "available IP address" 223 An IP address is "available" if it may be allocated by a 224 specific DHCP server. An IP address is considered (for the 225 purposes of this document) to be available to a single server 226 for allocation unless otherwise noted. An IP address available 227 for allocation on a primary server has state FREE, and an IP 228 address available for allocation on a secondary server has 229 state BACKUP. 231 o "binding" 232 A binding is a collection of configuration parameters, includ- 233 ing at least an IP address, associated with or "bound to" a 234 DHCP client. Bindings are managed by DHCP servers. 236 o "binding database" 238 The collection of bindings managed by a primary and secondary. 240 o "binding update transaction" 242 A binding update transaction refers to the set of information 243 (contained in options) necessary to perform a binding update 244 for a single IP address. It will be comprised of the 245 assigned-IP-address option, the binding-status option, along 246 with other options as appropriate. 248 o "binding-status" 250 The binding-status is the status of an IP address with respect 251 to its association with a client. There are specific binding- 252 status values defined for use by the failover protocol, e.g., 253 ACTIVE, FREE, RELEASED, ABANDONED, etc. These are designed to 254 map more or less directly onto the binding-status values used 255 internally in most DHCP server implementations. The term 256 binding-status refers to the concept also sometimes known as 257 "lease state" or "IP address state", but in this document the 258 term "state" is reserved for the failover state of a failover 259 endpoint, and binding-status is always used to refer to the 260 state associated with an IP address or lease. 
262 o "DHCP client" or "client" 264 A DHCP client is an Internet host using DHCP to obtain confi- 265 guration parameters such as a network address. The term 266 "client" used within this document always means a DHCP client, 267 and never one of the two failover servers. 269 o "DHCP server" or "server" 271 A DHCP server is an Internet host that returns configuration 272 parameters to DHCP clients. 274 o "DDNS" 276 An abbreviation for "Dynamic DNS", which refers to the capabil- 277 ity to update a DNS server's name (actually resource record) 278 database using an on-the-wire protocol defined in [RFC 2136]. 280 o "DNS" 282 An abbreviation for "Domain Name System", a scheme where a cen- 283 tral name repository is used to map names to IP addresses and IP 284 addresses to names. 286 o "failover endpoint" 288 The failover protocol allows for there to be a unique failover 289 endpoint per partner per role (where role is primary or secon- 290 dary). This failover endpoint can take actions and hold unique 291 states. There are thus a maximum of two failover endpoints per 292 server per partner (one for each partner as a primary and one 293 for that same partner as a secondary.) 295 o "FQDN" 297 An FQDN is a "fully qualified domain name". A fully qualified 298 domain name generally is a host name with at least one zone 299 name, for example "www.dhcp.org" is a fully qualified domain 300 name. 302 o "lazy update" 304 Lazy update refers to the requirement placed on a server imple- 305 menting a failover protocol to update its failover partner when- 306 ever the binding database changes. A failover protocol which 307 didn't support lazy update would require the failover partner 308 update to be complete before a DHCP server could respond to a 309 DHCP client request with a DHCPACK. 
A failover protocol which 310 does support lazy update places no such restriction on the 311 update of the failover partner server, and so a server can allo- 312 cate an IP address or extend a lease on an IP address and then 313 update its failover partner as time permits. A failover proto- 314 col which supports lazy update not only removes the requirement 315 to update the failover partner prior to responding to a DHCP 316 client with a DHCPACK, but also allows gathering up batches of 317 updates from one failover server to its partner. 319 o "MCLT" 321 The MCLT refers to maximum client lead time. This time is con- 322 figured on the primary server and transmitted from the primary 323 to the secondary server in the CONNECT message. It is the max- 324 imum amount of time that one server can extend a lease for a 325 client's binding beyond the time known by the partner server. 326 See section 5.2.1 for details. 328 o "partner" 330 A "partner", for the purposes of this document, refers to a 331 failover server, typically the other failover server. In many 332 (if not most) cases, the failover protocol is symmetric with 333 respect to the primary or secondary nature of the servers, and 334 so it is often appropriate to discuss "updating the partner 335 server", since it could be a primary server updating a secondary 336 server or a secondary server updating a primary server. 338 o "Primary server" or "Primary" 340 A DHCP server configured to provide primary service to a set of 341 DHCP clients for a particular set of subnet address pools. 343 o "RR" 345 "RR" is an abbreviation for "resource record". All records in 346 the DNS are resource records. 
The resource records of most 347 relevance to this document are the "A" resource record, which 348 maps a DNS name to a particular IP address, the "PTR" resource 349 record, which allows a "reverse map", from the IP address back 350 to a DNS name, and the "KEY" resource record, which is used in 351 ways defined in [FQDN] to tag a DNS name with the identity of 352 the DHCP client with which it is associated. 354 o "Secondary server" or "Secondary" 356 A DHCP server configured to act as backup to a primary server 357 for a particular set of subnet address pools. 359 o "stable storage" 361 Every DHCP server is assumed to have some form of what is called 362 "stable storage". Stable storage is used to hold information 363 concerning IP address bindings (among other things) so that this 364 information is not lost in the event of a server failure which 365 requires restart of the server. 367 o "state" 369 In this document, the term "state" refers exclusively to the 370 state of a failover endpoint, for example: NORMAL, 371 COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN. It is not used to 372 refer to any attributes of an IP address or a binding of an IP 373 address. See "binding-status". 375 o "subnet address pool" 376 A subnet address pool is the set of IP addresses which is asso- 377 ciated with a particular network number and subnet mask. In the 378 simple case, there is a single network number and subnet mask 379 and a set of IP addresses. In the more complex case (sometimes 380 called "secondary subnets", sometimes "superscopes"), several 381 (apparently unrelated) network number and subnet mask combina- 382 tions with their associated IP addresses may all be configured 383 together into one subnet address pool. 385 3. Background and External Requirements 387 This section highlights key aspects of the DHCP protocol on which the 388 failover protocol depends. 
It also discusses the requirements that 389 the failover protocol places on other aspects of the network infras- 390 tructure, and some general issues surrounding server failure detec- 391 tion. Some failure scenarios that provide particular challenges to a 392 failover protocol are discussed. Finally, the challenges inherent in 393 using a TCP connection as a means to detect failure of a partner 394 server are elaborated. 396 3.1. Key aspects of the DHCP protocol 398 The failover protocol is designed to augment the DHCP protocol as 399 described in RFC 2131 [RFC 2131]. There are several key aspects of 400 the DHCP protocol which are required by the failover protocol in 401 order to successfully meet its design goals. 403 3.1.1. Broadcast behavior 405 There are two aspects of the broadcast behavior of the DHCP protocol 406 which are key to making the failover protocol operate successfully. 407 The first is simply that the DHCP protocol requires a DHCP client to 408 broadcast all DHCPDISCOVER and DHCPREQUEST/INIT-REBOOT messages. 409 Because of this requirement, a DHCP client who was communicating with 410 one server will automatically be able to communicate with another 411 server if one is available. 413 The second aspect of broadcast behavior is similar to the first, but 414 involves the distinction between a DHCPREQUEST/RENEW and 415 DHCPREQUEST/REBINDING. A DHCPREQUEST/RENEW is the message that a 416 DHCP client uses to extend its lease. It is unicast to the DHCP 417 server from which it acquired the lease. However, the DHCP protocol 418 (in a farsighted move), was explicitly designed so that in the event 419 that a DHCP client cannot contact the server from which it received a 420 lease on an IP address using a DHCPREQUEST/RENEW, the client is 421 required to broadcast its renewal using a DHCPREQUEST/REBINDING to 422 any available DHCP server. 
Since all DHCP clients were required to 423 implement this algorithm, the failover protocol can have a different 424 server from the one that initially granted a lease be the server to 425 renew a lease. Thus, one server can take over for another with no 426 interruption in the service as experienced by the DHCP client or its 427 associated applications software. 429 3.1.2. Client responsibility 431 In the DHCP protocol the DHCP clients are entrusted with a consider- 432 able responsibility. In particular, after they are granted a lease 433 on an IP address, they are enjoined to only use that IP address while 434 their lease is valid. Every DHCP client is expected to stop using an 435 IP address if the expiration time on the lease has passed and if it 436 cannot get an extension on the lease for that IP address from some 437 DHCP server. Thus, the correct behavior of every DHCP client in this 438 regard is required to ensure the integrity of the DHCP service. On 439 the other hand, incorrect behavior by a client in this area will tend 440 to adversely affect at most one other DHCP client. 442 Furthermore, any DHCP client which sends in a DHCPREQUEST/RENEW or 443 DHCPREQUEST/REBINDING to a DHCP server (either unicast for a RENEW or 444 broadcast for a REBINDING) MUST still have time to run on the lease 445 for that IP address. The DHCP server sends the DHCPACK back unicast 446 to the IP address from which the RENEW or REBINDING originated. 448 Given the existing responsibility placed on the client to only use an 449 IP address when the lease is valid, and to only send in a RENEW or 450 REBINDING if the lease is valid, the failover protocol relies on DHCP 451 clients to perform responsibly and will, in the absence of conflict- 452 ing information, believe a DHCP client that is attempting to RENEW or 453 REBIND a lease on an IP address is the legitimate owner of that IP 454 address. 
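
The unicast and broadcast rules of sections 3.1.1 and 3.1.2 can be sketched roughly as follows. This is a minimal illustration, not part of the protocol: the names `Msg`, `BROADCAST`, and `destination_for` are hypothetical, and the limited broadcast address is the usual IPv4 one rather than something this document specifies. The only behavior taken from the text is that a RENEW is unicast to the leasing server, that the other messages are broadcast, and that a RENEW or REBINDING requires time left on the lease.

```python
from enum import Enum


class Msg(Enum):
    DISCOVER = "DHCPDISCOVER"
    INIT_REBOOT = "DHCPREQUEST/INIT-REBOOT"
    RENEW = "DHCPREQUEST/RENEW"
    REBINDING = "DHCPREQUEST/REBINDING"


# IPv4 limited broadcast address (assumed here for illustration).
BROADCAST = "255.255.255.255"


def destination_for(msg, leasing_server, lease_seconds_left):
    """Return the address a client would send `msg` to (hypothetical helper)."""
    # A client may only RENEW or REBIND while its lease still has time to run.
    if msg in (Msg.RENEW, Msg.REBINDING) and lease_seconds_left <= 0:
        raise ValueError("RENEW/REBINDING requires time left on the lease")
    # RENEW is unicast to the server that granted the lease.
    if msg is Msg.RENEW:
        return leasing_server
    # DISCOVER, INIT-REBOOT and REBINDING are broadcast, so any available
    # server (including a failover partner) can receive them.
    return BROADCAST
```

Because a REBINDING is broadcast, either failover server can answer it, which is what lets one server take over renewals from the other.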
456 If clients do not follow these rules, it is possible for an address 457 to be in use by more than one client. For a single server, this hap- 458 pens because the server has leased the expired address to another 459 client and the original client is also attempting to use the address. 460 The server would NAK the renewal request. This is made slightly worse 461 in the failover protocol if the two servers are unable to communicate 462 with each other and one server leases an available address to a new 463 client while the other server receives a renewal from a different 464 client. In this case, both servers lease the same address to dif- 465 ferent clients for the MCLT time. 467 One troublesome issue is that of the DHCP client responsibility when 468 sending in DHCPREQUEST/INIT-REBOOT requests. While the original DHCP 469 RFC was written to require a DHCP client to have time left to run on 470 the lease for an IP address if the client is sending an INIT-REBOOT 471 request, it was sufficiently unclear that some client vendors didn't 472 realize this until recently. Since the INIT-REBOOT request was sent 473 with the IP address in the dhcp-requested-address option and not in 474 the ciaddr (for perfectly good reasons), the similarity to the RENEW 475 and REBINDING case was lost on many people. 477 At present, the failover protocol does not assume that a client send- 478 ing in an INIT-REBOOT request necessarily has a valid lease on the IP 479 address appearing in the dhcp-requested-address option in the INIT- 480 REBOOT request. 482 The implications of this are as follows: Assume that there is a DHCP 483 client that gets a lease from one server while that server is unable 484 to communicate with its failover partner. Then, assume that after 485 that client reboots it is able only to communicate with the other 486 failover server. 
If the failover servers have not been able to com- 487 municate with each other during this process, then the DHCP client 488 will get a new IP address instead of being able to continue to use 489 its existing IP address. This will affect no applications on the DHCP 490 client, since it is rebooting. However, it will use up an additional 491 IP address in this marginal case. 493 3.1.3. Stable storage update before DHCPACK 495 The DHCP protocol allocates resources, and in order to operate 496 correctly it requires that a DHCP server update some form of stable 497 storage prior to sending a DHCPACK to a DHCP client in order to grant 498 that client a lease on an IP address. 500 One of the goals of the failover protocol is that it not add signifi- 501 cant additional time to this already time consuming requirement to 502 update stable storage prior to a DHCPACK. In particular, adding a 503 requirement to communicate with another server prior to sending a 504 DHCPACK would greatly simplify the failover protocol, but it would 505 unacceptably limit the potential scalability of any DHCP server which 506 employed the failover protocol. 508 3.2. BOOTP relay agent implementation 510 Many DHCP clients are not resident on the same network segment as a 511 DHCP server. In order to support this form of network architecture, 512 most contemporary routers implement something known as a BOOTP Relay 513 Agent. This capability inside of a router listens for all broadcasts 514 at the DHCP port, port 67, and will relay any broadcasts that it 515 receives on to a DHCP server. The IP address of the DHCP server must 516 have been previously configured into the router. As part of the 517 relay process, the relay agent will place the address of the inter- 518 face on which it received the broadcast into the giaddr field of the 519 DHCP packet. 
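
The relay step described above might look roughly like the following sketch. The function name `relay` and its parameters are hypothetical. The byte offset of giaddr follows the fixed BOOTP header layout, and leaving a non-zero giaddr untouched is behavior taken from RFC 1542 rather than from this document. No packets are sent; the function only returns what would be forwarded.

```python
import socket

# giaddr follows op/htype/hlen/hops (4 bytes), xid (4), secs+flags (4),
# and ciaddr/yiaddr/siaddr (12) in the fixed BOOTP header.
GIADDR_OFFSET = 24
GIADDR_END = GIADDR_OFFSET + 4


def relay(packet, receiving_iface_addr, configured_servers):
    """Return (server, packet) pairs a relay agent would forward (sketch)."""
    if len(packet) < GIADDR_END:
        raise ValueError("packet too short to hold a giaddr field")
    buf = bytearray(packet)
    # Fill in giaddr only when it is still zero (assumption from RFC 1542).
    if bytes(buf[GIADDR_OFFSET:GIADDR_END]) == b"\x00\x00\x00\x00":
        buf[GIADDR_OFFSET:GIADDR_END] = socket.inet_aton(receiving_iface_addr)
    # One copy per configured DHCP server address.
    return [(server, bytes(buf)) for server in configured_servers]
```

For example, calling `relay(bytes(236), "192.0.2.1", ["198.51.100.10"])` yields one copy of the packet whose giaddr bytes encode 192.0.2.1.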
521 Since the failover protocol requires two DHCP servers to receive any 522 broadcast DHCP messages, in order to work with DHCP clients which are 523 not local to the DHCP server, the BOOTP relay agent on the router 524 closest to the DHCP client must be configured to point at more than 525 one DHCP server. 527 Most BOOTP relay agent implementations allow this duplication of 528 packets. 530 If this is not possible, an administrator might be able to configure 531 the relay agent with a subnet broadcast address, but in this case the 532 primary and secondary DHCP servers in a failover pair must both 533 reside on the same subnet. 535 3.3. What does it mean if a server can't communicate with its partner? 537 In any protocol designed to allow one server to take over some 538 responsibilities from a partner server in the event of "failure" of 539 that partner server, there is an inherent difficulty in determining 540 when that partner server has failed. 542 In fact, it is fundamentally impossible for one server to distinguish 543 a network communications failure from the outright failure of the 544 server to which it is trying to communicate. In the case where each 545 server is handing out resources (in this case IP addresses) to a 546 client community, mistaking an inability to communicate with a 547 partner server for failure of that partner server could easily cause 548 both servers to be handing out the same IP addresses to different 549 clients. 551 One way that this is sometimes handled is for there to be more than 552 two servers. In the case of an odd number of servers, the servers 553 that can still communicate with a majority of other servers will con- 554 sider themselves operational, and any server which can't communicate 555 to a majority of other servers must immediately cease operations. 

   While this technique works in some domains, having the only server to
   which a DHCP client can communicate voluntarily shut itself down
   seems like something worth avoiding.

   The failover protocol will operate correctly while both servers are
   unable to communicate, whether they are both running or not.  At some
   point there may be resource contention, and if one of the servers is
   actually down, then the operator can inform the operational server
   and the operational server will be able to use all of the failed
   server's resources.

   The protocol also allows detection of an orderly shutdown of a
   participating server.

3.4.  Challenging scenarios for a Failover protocol

   There exist two failure scenarios which provide particular challenges
   to the correctness guarantees of a failover protocol.

3.4.1.  Primary Server crash before "lazy" update:

   In the case where the primary server sends a DHCPACK to a client for
   a newly allocated IP address and then crashes prior to sending the
   corresponding update to the secondary server, the secondary server
   will have no record of the IP address allocation.  When the secondary
   server takes over, it may well try to allocate that IP address to a
   different client.  In the case where the first client to receive the
   IP address is not on the net at the time (yet while there was still
   time to run on its lease), an ICMP echo (i.e., ping) will not prevent
   the secondary server from allocating that IP address to a different
   client.

   The failover protocol deals with this situation by having the primary
   and secondary servers allocate addresses for new clients from
   disjoint address pools.  See section 5.5 for details.
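
   The effect of disjoint pools in this scenario can be shown with a
   small model.  It is a sketch under stated assumptions, not an
   implementation of the protocol: the Server class, the client names,
   and the addresses are hypothetical, and the lazy update is
   represented simply by never copying the primary's lease to the
   secondary.

```python
class Server:
    """Toy DHCP server that serves new clients only from its own pool."""

    def __init__(self, name, pool):
        self.name = name
        self.pool = list(pool)  # addresses this server may give to new clients
        self.leases = {}        # address -> client, as known to this server

    def allocate(self, client):
        address = self.pool.pop(0)
        self.leases[address] = client
        return address


# The two pools share no address.
primary = Server("primary", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
secondary = Server("secondary", ["10.0.0.4", "10.0.0.5"])

# The primary ACKs a new lease and then crashes before the lazy update,
# so the secondary never learns about this binding.
address_a = primary.allocate("client-a")

# The secondary takes over and serves a different new client.
address_b = secondary.allocate("client-b")

print(address_a, address_b)
```

   Because the secondary draws only on its own pool, its lack of a
   record for client-a cannot lead it to hand out client-a's address.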

   A more likely (in that DHCPREQUEST/RENEWs are presumably more common
   than DHCPDISCOVERs) and more subtle version of this problem is where
   the primary server crashes after extending a client's lease time, and
   before updating the secondary with a new time using a lazy update.
   After the secondary takes over, if the client is not connected to the
   network the secondary will believe the client's lease has expired
   when, in fact, it has not.  In this case as well, the IP address
   might be reallocated to a different client while the first client is
   still using it.

   This scenario is handled by the failover protocol through control of
   the lease time and the use of the maximum client lead time (MCLT).
   See section 5.2.1 for details.

3.4.2.  Network partition where DHCP servers can't communicate but each
        can talk to clients:

   Several conditions are required for this situation to occur.  First,
   due to a network failure, the primary and secondary servers cannot
   communicate.  As well, some of the DHCP clients must be able to
   communicate with the primary server, and some of the clients must now
   only be able to communicate with the secondary server.  When this
   condition occurs, both primary and secondary servers could attempt to
   allocate IP addresses for new clients from the same pool of available
   addresses.  At some point, then, two clients will end up being
   allocated the same IP address.  This will cause problems when the
   network failure that created this situation is corrected.

   The failover protocol deals with this situation by having the primary
   and secondary servers allocate addresses for new clients from
   disjoint address pools.  See section 5.5 for details.

3.5.  Using TCP to detect partner server failure

   There are several characteristics of TCP that are important to the
   functioning of the failover protocol, which uses one TCP connection
   both for bulk data transfer and to assess communications integrity
   with the other server.  Reliable and ordered message delivery are
   chief among these important characteristics.

   It would be nice to use the capabilities built in to TCP to allow it
   to determine if communications integrity exists to the failover
   partner but this strategy contains some problems which require
   analysis.  There exist three fundamental cases for an open TCP
   connection that must be examined.

   1.  When no data is being sent on a TCP connection, the TCP layer
       also does not exchange any signaling messages to assure that
       the peer is still up.

   2.  When data is queued to be sent, and the receiver has not
       blocked the sending of additional data, then messages are
       flowing across the TCP connection containing the application's
       data.

   3.  When data is queued to be sent, and the receiver has blocked
       the transmission of additional data, then persist messages are
       flowing from the receiver to the sender to ensure that the
       sender doesn't miss the receiver opening the window for
       further transmissions.

   The first case can be turned into the second case by sending
   application-level keep-alive messages periodically when there is no
   other data queued to be sent.  Note TCP keep-alive messages might be
   used as well, but they present additional problems.

   Thus, we can ensure that the TCP connection has messages flowing
   periodically across the connection fairly easily.  The question
   remains as to what TCP will do if the other end of the connection
   fails to respond (either because of network partition or because the
   receiving server crashes).
   TCP will attempt to retransmit a message with an exponential backoff,
   and will eventually time out that retransmission.  However, the
   length of that timeout cannot, in general, be set on a
   per-connection basis, and is frequently as long as nine minutes,
   though in some cases it may be as short as two minutes.  On some
   systems it can be set system-wide, while on other systems it cannot
   be changed at all.

   A value for this timeout that would be appropriate for the failover
   protocol, say less than 1 minute, could have unpleasant side-effects
   on other applications running on the same server, assuming that it
   could be changed at all on the host operating system.

   Nine minutes is a long time for the DHCP service to be unavailable to
   any new clients that were being served by the server which has
   crashed, when there is another server running that could respond to
   them as soon as it determines that its partner is not operational.

   The conclusion drawn from this analysis is that TCP provides very
   useful support for the failover protocol in the areas of reliable and
   ordered message delivery, but cannot by itself be relied upon to
   detect partner server failure in a fashion acceptable to the needs of
   the failover protocol.  Additional failover protocol capabilities
   have been created to support timely detection of partner server
   failure.  See section 8.3 for details on this mechanism.

4.  Design Goals

   This section lists the design goals and the limitations of the
   failover protocol.

4.1.  Design goals for this protocol

   The following is a list of goals that are met by this protocol.  They
   are listed in priority order.

   1.  Implementations of this protocol must work with existing DHCP
       client implementations based on the DHCP protocol [RFC 2131].

   2.  Implementations of the protocol must work with existing BOOTP
       relay agent implementations.

   3.  The protocol must provide failover redundancy between servers
       that are not located on the same subnet.

   4.  Provide for continued service to DHCP clients through an
       automated mechanism in the event of failure of the primary
       server.

   5.  Avoid binding an IP address to a client while that binding is
       currently valid for another client.  In other words, do not
       allocate the same IP address to two clients.

   6.  Minimize any need for manual administrative intervention.

   7.  Introduce no additional delays in server response time as a
       result of the network communications required to implement the
       failover protocol, i.e., don't require communications with the
       partner between the receipt of a DHCPREQUEST and the
       corresponding DHCPACK.

   8.  Share IP address ranges between primary and secondary servers;
       i.e., impose no requirement that the pool of available
       addresses be manually or permanently divided between servers.

   9.  Continue to meet the goals and objectives of this protocol in
       the event of server failure or network partition.

   10. Provide graceful reintegration of full protocol service after
       server failure or network partition.

   11. Allow for one computer to act as a secondary server for
       multiple primary servers.  The protocol must allow failover
       primary and secondary configuration choices to be made at a
       granularity smaller than "all of the subnets served by a single
       server", though individual implementations may choose not to
       allow such flexibility.

   12. Ensure that an existing client can keep its existing IP
       address binding if it can communicate with either the primary
       or secondary DHCP server implementing this protocol, not just
       the server that originally offered it the binding.

   13. Ensure that a new client can get an IP address from some
       server.
       Ensure that in the face of partition, where servers
       continue to run but cannot communicate with each other, the
       above goals and requirements may be met.  In addition, when
       the partition condition is removed, allow graceful automatic
       re-integration without requiring human intervention.

   14. If either primary or secondary server loses all of the
       information that it has stored in stable storage, ensure that
       it be able to refresh its stable storage from the other server.

   15. Support load balancing between the primary and secondary
       servers, and allow configuration of the percentage of the
       client population served by each with a moderately fine
       granularity.

4.2.  Limitations of this protocol

   The following are explicit limitations of this protocol.

   1.  This protocol provides only one level of redundancy through a
       single secondary server for each primary server.

   2.  A subset of the address pool is reserved for secondary server
       use.  In order to handle the failure case where both servers
       are able to communicate with DHCP clients, but unable to
       communicate with each other, a subset of the IP address pool
       must be set aside as a private address pool for the secondary
       server.  The secondary can use these to service newly arrived
       DHCP clients during such a period.  The required size of this
       private pool is based only on the arrival rate of new DHCP
       clients and the length of expected downtime, and is not
       influenced in any way by the total number of DHCP clients
       supported by the server pair.

       The failover protocol can be used in a mode where both the
       primary and secondary servers can share the load between them
       when both are operating.
       In this load balancing mode, the
       addresses allocated by the primary server to the secondary
       server are not unused, but are used instead to service the
       portion of the client base to which the secondary server is
       required to respond.  See section 5.3 for more information on
       load balancing.

   3.  The primary and secondary servers do not respond to client
       requests at all while recovering from a failure that could
       have resulted in duplicate IP assignments (that is, when
       synchronizing in POTENTIAL-CONFLICT state).

5.  Protocol Overview

   This section will discuss the failover protocol at a relatively high
   level.  In the event that a description in this section conflicts (or
   appears to conflict due to the overview nature of this section) with
   information in later sections of this draft, the information in the
   later sections should be considered authoritative.

5.1.  Messages and States

   This protocol is centered around the message exchange used by one
   server to update the other server of binding database changes
   resulting from DHCP client activity:

   o Communication of binding database changes

     The binding update (BNDUPD) message is used to send the binding
     database changes to the partner server, and the partner server
     responds with a binding acknowledgement (BNDACK) message when it
     has successfully committed those changes to its own stable
     storage.

   All of the other messages involve ancillary issues:

   o Management of available IP addresses

     The pool request (POOLREQ) message is used by the secondary
     server to request an allocation of IP addresses from the primary
     server.  The pool response (POOLRESP) message is used by the
     primary server to inform the secondary server how many IP
     addresses were allocated to the secondary server as the result
     of the pool request.

   o Synchronization of the binding databases between the servers
     after they've been out of communications

     The update request (UPDREQ) message is used by one server to
     request that its partner send it all binding database
     information that it has not already seen.  The update request all
     (UPDREQALL) message is used by one server to request that all
     binding database information be sent in order to recover from a
     total loss of its binding database by the requesting server.
     The update done (UPDDONE) message is used by the responding
     server to indicate that all requested updates have been sent by
     the responding server and acked by the requesting server.

   o Connection establishment

     The connect (CONNECT) message is used by the primary server to
     establish a high-level connection with the other server, and to
     transmit several important configuration data items between the
     servers.  The connect acknowledgement message (CONNECTACK) is
     used by the secondary server to respond to a CONNECT message
     from the primary server.  The disconnect (DISCONNECT) message is
     used by either server when closing a connection.

   o Server synchronization

     The state change (STATE) message is used by either server to
     inform the other server of a change of failover state.

   o Connection integrity management

     The contact (CONTACT) message is used by either server to ensure
     that the other server continues to see the connection as
     operational.  It MUST be transmitted periodically over every
     established connection if other message traffic is not flowing,
     and it MAY be sent at any time.

5.1.1.  Failover endpoints

   The proper operation of the failover protocol requires more than the
   transmission of messages between one server and the other.
   Each endpoint might seem to be a single DHCP server, but in fact
   there are many situations where additional flexibility in
   configuration is useful.

   For instance, there might be several servers which are each primary
   for a distinct set of address pools, and one server which is
   secondary for all of those address pools.  The situation with the
   primaries is straightforward, but the secondary will need to maintain
   a separate failover state, partner state, and communications up/down
   status for each of the separate primary servers for which it is
   acting as a secondary.

   The failover protocol calls for there to be a unique failover
   endpoint per partner per role (where role is primary or secondary).
   This failover endpoint can take actions and hold unique states.
   There are thus a maximum of two failover endpoints per partner (one
   for the partner as a primary and one for that same partner as a
   secondary).

   Thus, in the case where there are two primary servers A and B each
   backed up by a single common secondary server C, there is one
   failover endpoint on each of A and B, and two different failover
   endpoints on C.  The two different failover endpoints on C each have
   unique states and independent TCP connections.

   This document frequently describes the behavior of the protocol in
   terms of primary and secondary servers, not primary and secondary
   failover endpoints.  However, it is important to remember that every
   'server' described in this document is in reality a failover endpoint
   that resides in a particular process, and that many failover
   endpoints may reside in the same process.

   It is not the case that there is a unique failover endpoint for each
   subnet address pool that participates in a failover relationship.
   On one server, there is one failover endpoint per partner per role,
   regardless of how many subnet address pools are managed by that
   combination of partner and role.  Conversely, on a particular server,
   any given subnet address pool will be associated with exactly one
   failover endpoint.

   When a connection is received from the partner, the unique failover
   endpoint to which the message is directed is determined solely by the
   IP address of the partner and the port to which the connection is
   directed by the partner.  See section 8.2.

5.2.  Fundamental guarantees

   There are several fundamental restrictions this protocol places on
   what one server can do in the absence of knowledge of the other
   server.  Operating within these restrictions allows certain
   guarantees to be made to the partner server, and these are key to the
   correct operation of the protocol.

5.2.1.  Control of lease time

   The key problem with lazy update is that when a server fails after
   updating a client with a particular lease time and before updating
   its partner, the partner will believe that a lease has expired even
   though the client still retains a valid lease on that IP address.

   In order to handle this problem, a period of time known as the
   "Maximum Client Lead Time" (MCLT) is defined and must be known to
   both the primary and secondary servers.  Proper use of this time
   interval places an upper bound on the difference allowed between the
   lease time provided to a DHCP client by a server and the lease time
   known by that server's partner.  However, the MCLT is typically much
   less than the lease time that a server has been configured to offer a
   client, and so some strategy must exist to allow a server to offer
   the configured lease time to a client.
   During a lazy update the updating server typically updates its
   partner with a potential expiration time which is longer than the
   lease time previously given to the client and which is longer than
   the lease time that the server has been configured to give a client.
   This allows that server to give a longer lease time to the client the
   next time the client renews its lease, since the time that it will
   give to the client will not exceed the MCLT beyond the potential
   expiration time acknowledged by its partner.

   The PARTNER-DOWN state exists so that a server can be sure that its
   partner is, indeed, down.  Correct operation while in that state
   requires (generally) that the server wait the MCLT after anything
   that happened prior to its transition into PARTNER-DOWN state (or,
   more accurately, when the other server went down if that is known).
   Thus, the server MUST wait the MCLT after the partner server went
   down before allocating any of the partner's addresses which were
   available for allocation.  In the event the partner was not in
   communication prior to going down, it might have allocated one or
   more of its FREE addresses to a DHCP client and been unable to inform
   the server entering PARTNER-DOWN prior to going down itself.  By
   waiting the MCLT after the time the partner went down, the server in
   PARTNER-DOWN state ensures that any clients which have a lease on one
   of the partner's FREE addresses will either time out or contact the
   server in PARTNER-DOWN by the time that period ends.

   In addition, once a server has made a transition to PARTNER-DOWN
   state, it MUST NOT reallocate an IP address from one client to
   another client until the later of the following two times:

   o The MCLT after the time the partner server went down (see
     above).

   o An additional MCLT interval after the lease by the original
     client expires.
     (Actually, until the maximum client lead time
     after what it believes to be the lease expiration time of the
     client.)

   Some optimizations exist for this restriction, in that it only
   applies to leases that were issued BEFORE entering PARTNER-DOWN.
   Once a server has entered PARTNER-DOWN and it leases out an address,
   it need not wait this time as long as it has never communicated with
   the partner since the lease was given out.

   The fundamental relationship on which much of the correctness of this
   protocol depends is that the lease expiration time known to a DHCP
   client MUST NOT be more than the maximum client lead time greater
   than the potential expiration time known to a server's partner.

   The remainder of this section makes the above fundamental
   relationship more explicit.

   This protocol requires a DHCP server to deal with several different
   lease intervals and places specific restrictions on their
   relationships.  The purpose of these restrictions is to allow the
   other server in the pair to be able to make certain assumptions in
   the absence of an ability to communicate between servers.

   The different lease times are:

   o desired lease interval

     The desired lease interval is the lease interval that a DHCP server
     would like to give to a DHCP client in the absence of any
     restrictions imposed by the Failover protocol.  Its determination
     is outside of the scope of this protocol.  Typically this is the
     result of external configuration of a DHCP server.

   o actual lease interval

     The actual lease interval is the lease interval that a DHCP server
     gives out to a DHCP client in the dhcp-lease-time option of a
     DHCPACK packet.  It may be shorter than the desired client lease
     interval (as explained below).

   o potential lease interval

     The potential lease interval is the lease expiration interval the
     local server tells to its partner in the potential-expiration-time
     option of a BNDUPD message.

   o acknowledged potential lease interval

     The acknowledged potential lease interval is the potential lease
     interval the partner server has most recently acknowledged in the
     potential-expiration-time option of a BNDACK message.

   The key restriction (and guarantee) that any server makes with
   respect to lease intervals is that the actual client lease interval
   never exceeds the acknowledged potential lease interval (if any) by
   more than a fixed amount.  This fixed amount is called the "Maximum
   Client Lead Time" (MCLT).

   The MCLT MAY be configurable on the primary server, but for correct
   server operation it MUST be the same and known to both the primary
   and secondary servers.  The secondary server determines the MCLT from
   the MCLT option sent from the primary server to the secondary server
   in the CONNECT message.

   A server MUST record in its stable storage both the actual lease
   interval and the most recently acknowledged potential lease interval
   for each IP address binding.  It is assumed that the desired client
   lease interval can be determined through techniques outside of the
   scope of this protocol.  See section 7.1.5 for more details
   concerning the times that the server MUST record in its stable
   storage and the way that they interact with the lease time that may
   be offered to a DHCP client.

   Again, the fundamental relationship among these times which MUST be
   maintained is:

      actual lease interval <
                  ( acknowledged potential lease interval + MCLT )

   Figure 5.2.1-1 illustrates an initial lease to a client using the
   rules discussed in the example which follows it.
   Note that this is only one example -- as long as the fundamental
   relationship is preserved, the actual times used could be quite
   different.

            DHCP                     Primary                Secondary
   time     Client                   Server                 Server

            |  (time in intervals)   |   (absolute time)    |
            |                        |                      |
            | >-DHCPDISCOVER->       |                      |
            | <---DHCPOFFER-<        |                      |
            |   lease-time=MCLT      |                      |
            |                        |                      |
            | >-DHCPREQUEST->        |                      |
            |   (selecting)          |                      |
            |                        |                      |
   t        | <--------DHCPACK-<     |                      |
            |   lease-time=MCLT      |                      |
            |                        | >-BNDUPD-->          |
            |                        |   lease-expiration=t+MCLT
            |                        |   potential-expiration=t+(MCLT/2)+X
            |                        |                      |
            |                        | <-BNDACK-<           |
            |                        |   potential-expiration=t+(MCLT/2)+X
           ...                      ...                    ...
            |                        |                      |
   t+MCLT/2 | >-DHCPREQUEST->        |                      |
            |   (renew)              |                      |
            |                        |                      |
   t1       | <--------DHCPACK-<     |                      |
            |   lease-time=X         |                      |
            |                        | >-BNDUPD-->          |
            |                        |   lease-expiration=t1+X
            |                        |   potential-expiration=t1+(X/2)+X
            |                        |                      |
            |                        | <-BNDACK-<           |
            |                        |   potential-expiration=t1+(X/2)+X
           ...                      ...                    ...

              Figure 5.2.1-1: Lazy Update Message Traffic
                      X = Desired Lease Interval
             Assumes renewal interval = lease interval / 2

   DISCUSSION:

      This protocol mandates only that the above fundamental
      relationship concerning lease intervals is preserved.

      In the interests of clarity, however, let's examine a specific
      example.  The MCLT in this case is 1 hour.  The desired lease
      interval is 3 days, and its renewal time is half the lease
      interval.

      The rules for this example are:

      o What to tell the client:

        Take the remainder of the acknowledged potential lease interval.
        If this is a new lease, then this value will be zero.  If this
        remainder plus the MCLT is greater than the desired lease
        interval, give the client the desired lease interval else give
        the client the remainder plus the MCLT.

      o What to tell the failover partner server:

        Take the renewal interval (typically half of the actual client
        lease interval), add to it the desired lease interval, and add
        it to the current time to yield the value that goes into the
        potential-expiration-time option.

        Also tell the failover partner the actual lease interval by
        adding it to the current time to yield the value that goes into
        the lease-expiration option.

      In operation this might work as follows:

      When a server makes an offer for a new lease on an IP address to a
      DHCP client, it determines the desired lease interval (in this
      case, 3 days).  It then examines the acknowledged potential lease
      interval (which in this case is zero) and determines the remainder
      of the time left to run, which is also zero.  To this it adds the
      MCLT.  Since the actual lease interval cannot be allowed to exceed
      the remainder of the current acknowledged potential lease interval
      plus the MCLT, the offer made to the client is for the remainder
      of the current acknowledged potential lease interval (i.e., zero)
      plus the MCLT.  Thus, the actual lease interval is 1 hour.

      Once the server has performed the DHCPACK to the DHCP client, it
      will update the secondary server with the lease information.
      However, the desired potential lease interval will be composed of
      one half of the current actual lease interval added to the desired
      lease interval.  Thus, the secondary server is updated with a
      BNDUPD with a lease interval of 3 days + 1/2 hour specified in the
      potential-expiration-time option.

      When the primary server receives a BNDACK to its update of the
      secondary server's (partner's) potential lease interval, it
      records that as the acknowledged potential lease interval.
      A server MUST NOT send a BNDACK in response to a BNDUPD message
      until it is sure that the information in the BNDUPD message
      resides in its stable storage.  Thus, the primary server in this
      case can be sure that the secondary server has recorded the
      potential lease interval in its stable storage when the primary
      server receives a BNDACK message from the secondary server.

      When the DHCP client attempts to renew at T1 (approximately half
      an hour from the start of the lease), the primary server again
      determines the desired lease interval, which is still 3 days.  It
      then compares this with the remaining acknowledged potential lease
      interval (3 days + 1/2 hour) and adjusts for the time passed since
      the secondary was last updated (1/2 hour).  Thus the time
      remaining of the acknowledged potential lease interval is 3 days.
      Adding the MCLT to this yields 3 days plus 1 hour, which is more
      than the desired lease interval of 3 days.  So the client is
      renewed for the desired lease interval -- 3 days.

      When the primary DHCP server updates the secondary DHCP server
      after the DHCP client's renewal ACK is complete, it will calculate
      the desired potential lease interval as the T1 fraction of the
      actual client lease interval (1/2 of 3 days this time = 1.5 days).
      To this it will add the desired client lease interval of 3 days,
      yielding a total desired partner server lease interval of 4.5
      days.  In this way, the primary attempts to have the secondary
      always "lead" the client in its understanding of the client's
      lease interval so as to be able to always offer the client the
      desired client lease interval.

      Once the initial actual client lease interval of the MCLT is past,
      the protocol operates effectively like the DHCP protocol does
      today in its behavior concerning lease intervals.
      However, the guarantee that the actual client lease interval will
      never exceed the remaining acknowledged partner server lease
      interval by more than the MCLT allows full recovery from a variety
      of failures.

5.2.2.  Controlled re-allocation of IP addresses

   When in PARTNER-DOWN state there is a waiting period after which an
   IP address can be re-allocated to another client.  For IP addresses
   which are available when the server enters PARTNER-DOWN state, the
   period is the MCLT from entry into PARTNER-DOWN state.  For IP
   addresses which are not available when the server enters PARTNER-DOWN
   state, the period is the MCLT after the IP address becomes available.
   See section 9.4.2 for more details.

   In any other state, a server cannot reallocate an address from one
   client to another without first notifying its partner (through a
   BNDUPD message) and receiving acknowledgement (through a BNDACK
   message) that its partner is aware that that first client is not
   using the address.

   This could be modeled in the following way.  Though this specific
   implementation is in no way required, it may serve to better
   illustrate the concept.

   An "available" IP address on a server may be allocated to any client.
   An IP address which was leased to a client and which expired or was
   released by that client would take on a new state, EXPIRED or
   RELEASED respectively.  The partner server would then be notified
   that this IP address was EXPIRED or RELEASED through a BNDUPD.  When
   the sending server received the BNDACK for that IP address showing it
   was FREE, it would move the IP address from EXPIRED or RELEASED to
   FREE, and it would be available for allocation by the primary server
   to any clients.
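
   The model just described can be written down as a small state
   machine.  This is a sketch only, and like the model itself it is in
   no way required.  The AddressRecord class, its method names, and the
   LEASED state name are hypothetical; EXPIRED, RELEASED, and FREE are
   taken from the text above.  Re-leasing an address to the same client
   is not modelled.

```python
from enum import Enum


class State(Enum):
    FREE = "FREE"
    LEASED = "LEASED"      # illustrative name for "currently leased"
    EXPIRED = "EXPIRED"
    RELEASED = "RELEASED"


class AddressRecord:
    """One IP address as tracked by the server that leased it."""

    def __init__(self):
        self.state = State.FREE
        self.client = None

    def lease_to(self, client):
        # Only an available address may go to an arbitrary client.
        if self.state is not State.FREE:
            raise RuntimeError("address is not available")
        self.state, self.client = State.LEASED, client

    def end_lease(self, released):
        if self.state is not State.LEASED:
            raise RuntimeError("address is not leased")
        self.state = State.RELEASED if released else State.EXPIRED

    def binding_update(self):
        """Information carried to the partner in a BNDUPD."""
        return {"state": self.state.value, "client": self.client}

    def binding_ack(self, acked_state):
        """Apply a BNDACK; only FREE from the partner frees the address."""
        if self.state in (State.EXPIRED, State.RELEASED) and acked_state == "FREE":
            self.state, self.client = State.FREE, None


record = AddressRecord()
record.lease_to("client-a")
record.end_lease(released=False)     # lease runs out: EXPIRED
update = record.binding_update()     # partner is told via BNDUPD
record.binding_ack("FREE")           # partner acknowledges: FREE again
record.lease_to("client-b")          # now a different client may have it
```

   The point of the intermediate states is that lease_to refuses a new
   client until the acknowledgement has arrived.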

   A server MAY reallocate an IP address in the EXPIRED or RELEASED
   state to the same client with no restrictions provided it has not
   sent a BNDUPD message to its partner.  This situation would exist if
   the lease expired or was released after the transition into
   PARTNER-DOWN state, for instance.

5.3.  Load balancing

   In order to implement load balancing between a primary and secondary
   server pair, each server must respond to DHCPDISCOVER requests from
   some clients and not from other clients.  In order to do this
   successfully, each server must be able to determine immediately upon
   receipt of a DHCP client request whether it is to service this
   request or to ignore it in order to allow the other server to service
   the request.

   In addition, it should be possible to configure the percentage of
   clients which will be serviced by either the primary or secondary
   server.  This configuration should be more or less continuous, from
   all clients serviced by the primary through an even split with half
   serviced by each, to all clients serviced by the secondary.

   The technique chosen to support these goals is described in [RFC
   3074].

   A bitmap-style Hash Bucket Assignment (as described in [RFC 3074]) is
   used to determine which DHCP clients can be processed.  There are two
   potential HBAs in a failover server -- a server HBA and a failover
   HBA.  The way that a server acquires a server HBA is outside of the
   scope of the failover protocol, but both servers in a failover pair
   MUST have the same server HBA.  The failover HBA (which specifies the
   clients that the secondary is supposed to process) is sent by the
   primary server to the secondary server whenever a connection is
   established, using the hash-bucket-assignment option defined in
   section 12.11.
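
   A sketch of how a failover HBA might drive the decision follows.  It
   rests on several assumptions that are not taken from this document:
   the hash below is a stand-in built on SHA-256 and is NOT the hash
   function specified in [RFC 3074]; the bitmap is held as a 256-bit
   integer; the primary is assumed to serve exactly the buckets the
   secondary does not; and the server HBA is ignored.  All function
   names are hypothetical.

```python
import hashlib

BUCKETS = 256  # one bit per hash bucket


def bucket_for(client_id: bytes) -> int:
    """Map a client identifier to a bucket (stand-in hash, see above)."""
    return hashlib.sha256(client_id).digest()[0]


def make_failover_hba(secondary_percent: int) -> int:
    """Build a bitmap giving roughly this share of buckets to the secondary."""
    count = BUCKETS * secondary_percent // 100
    return (1 << count) - 1  # lowest `count` bits set


def secondary_should_process(client_id: bytes, failover_hba: int) -> bool:
    return bool((failover_hba >> bucket_for(client_id)) & 1)


def primary_should_process(client_id: bytes, failover_hba: int) -> bool:
    # Assumption: the primary takes every bucket the secondary does not.
    return not secondary_should_process(client_id, failover_hba)


hba = make_failover_hba(50)
client = bytes.fromhex("001122334455")
print("secondary" if secondary_should_process(client, hba) else "primary")
```

   Each server can make the decision locally on receipt of a request,
   without consulting its partner, because both evaluate the same hash
   against the same bitmap.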
1250 When using the server HBA (if any) and the failover HBA (if any), to 1251 decide whether to process a DHCP request, the server HBA always 1252 applies in every failover state, and the failover HBA (which MUST be 1253 a subset of the server HBA) is used by the secondary server to decide 1254 which packets to process when in NORMAL state. 1256 5.4. IP address allocations between servers 1258 The failover protocol allows a DHCP server which implements it to 1259 operate correctly in spite of the uncertainty over whether its 1260 partner has failed or whether the communications link to its partner 1261 has failed. This is made possible in part by the existence of 1262 separate address pools on each server for allocation to newly arrived 1263 DHCP clients. 1265 Thus, each server has its own pool of available IP addresses. Note 1266 that an IP address is not "owned" by a particular server throughout 1267 its entire lifetime. Only an IP address which is available is 1268 "owned" by a particular server -- once it has been leased to a DHCP 1269 client, it is not owned by either failover partner. When it finally 1270 becomes available again, it will be owned initially by the primary 1271 server, and it may or may not be allocated to the secondary server by 1272 the primary server. 1274 So, the flow of IP address ownership is as follows: initially an IP 1275 address is owned by the primary server. It may be allocated to the 1276 secondary server if it is available, and then it is owned by the 1277 secondary server. Either server can allocate available IP addresses 1278 which they own to DHCP clients, in which case they cease to own them. 1279 When the DHCP client releases the address or the lease on it expires, 1280 it will again become available and will be owned by the primary. 
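The ownership flow just described can be written down as a very small state machine. The Python sketch below is illustrative only; the Owner and Address names are hypothetical, and a real server would track ownership as part of its binding database.

```python
from enum import Enum


class Owner(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    NONE = "none"   # leased to a client: owned by neither partner


class Address:
    """Hypothetical record tracking which partner owns an available address."""

    def __init__(self) -> None:
        self.owner = Owner.PRIMARY          # initially owned by the primary

    def give_to_secondary(self) -> None:
        if self.owner is not Owner.PRIMARY:
            raise ValueError("only a primary-owned available address can move")
        self.owner = Owner.SECONDARY

    def lease(self, server: Owner) -> None:
        if server is Owner.NONE or self.owner is not server:
            raise ValueError("a server may only lease an address it owns")
        self.owner = Owner.NONE

    def lease_ended(self) -> None:
        if self.owner is not Owner.NONE:
            raise ValueError("address is not leased")
        self.owner = Owner.PRIMARY          # always returns to the primary
```

Note that lease_ended returns the address to the primary even when the secondary granted the lease, which is the behavior the text above describes.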
1282 An IP address will not become owned by the server which allocated it 1283 initially when it is released or the lease expires because, in gen- 1284 eral, that server will have had to replenish its pool of available 1285 addresses well in advance of any likely lease expirations. Thus, 1286 having a particular IP address cycle back to the secondary might well 1287 put the secondary more out of balance with respect to the primary 1288 instead of enhancing the balance of available addresses between them. 1290 These address pools are used when in COMMUNICATIONS-INTERRUPTED state 1291 and while waiting for the MCLT expiration in PARTNER-DOWN state. In 1292 addition, when using load balancing, these pools are used when in 1293 NORMAL state as well. 1295 The allocation and maintenance of these address pools is an area of 1296 some sensitivity, since the goal is to maintain a more or less con- 1297 stant ratio of available addresses between the two servers. 1299 The initial allocation when the servers first integrate is triggered 1300 by the POOLREQ message from the secondary to the primary. This is 1301 followed by the POOLRESP message where the primary tells the secon- 1302 dary how many IP addresses it allocated to the secondary. Then, the 1303 primary sends the allocated IP addresses to the secondary via BNDUPD 1304 messages. The POOLREQ/POOLRESP message exchange is a trigger to the primary 1305 to perform a scan of its database and to ensure that the secondary 1306 has enough IP addresses (based on some configured ratio). 1308 The actual IP addresses are sent to the secondary using the BNDUPD 1309 message with a state of BACKUP, which indicates the IP address is now 1310 available for allocation by the secondary. Once the message is sent, 1311 the primary MUST NOT use these addresses for allocation to DHCP 1312 clients.
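The primary's handling of a POOLREQ might be sketched as follows. This is a minimal Python illustration, assuming the configured ratio is expressed as the fraction of available addresses the secondary should hold. The function name handle_poolreq and the parameter secondary_share are hypothetical, and the sending of the BNDUPD and POOLRESP messages is left out.

```python
def handle_poolreq(free: list, backup: list, secondary_share: float) -> int:
    """Primary-side sketch: top up the secondary's pool after a POOLREQ.

    'free' holds addresses available to the primary, 'backup' those already
    given to the secondary. Returns the count to report in the POOLRESP.
    """
    if not 0.0 <= secondary_share <= 1.0:
        raise ValueError("secondary_share must be between 0 and 1")
    total = len(free) + len(backup)
    target = int(total * secondary_share)      # addresses the secondary should hold
    deficit = max(0, target - len(backup))
    for _ in range(deficit):
        # Each moved address would be sent in a BNDUPD with state BACKUP;
        # removing it from 'free' means the primary no longer allocates it.
        backup.append(free.pop())
    return deficit
```

With ten available addresses, an empty secondary pool and a share of 0.5, the call moves five addresses and returns 5. A second POOLREQ with the pools unchanged returns 0.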
1314 The POOLREQ/POOLRESP message exchange initiated by the secondary is 1315 valid at any time, and the primary server SHOULD, whenever it 1316 receives the POOLREQ message, scan its database of address pools and 1317 determine if the secondary needs more IP addresses from any of the IP 1318 address pools. 1320 However, in order to support a reasonably dynamic balance of the IP 1321 addresses between the failover partners, the primary server needs to 1322 do additional work to ensure that the secondary server has as many IP 1323 addresses as it needs (but that it doesn't have *more* than it needs 1324 either). 1326 The primary server SHOULD examine the balance of available addresses 1327 between the primary and secondary for a particular address pool when- 1328 ever the number of available addresses for either the primary or 1329 secondary changes. The primary server SHOULD adjust the available 1330 address balance as required to ensure the configured address balance, 1331 except that the primary server SHOULD apply some threshold 1332 mechanism to such a balance adjustment in order to minimize the over- 1333 head of maintaining this balance. 1335 An example of a threshold approach is: do not attempt to re-balance 1336 the available pools on the primary and secondary until the out of 1337 balance value exceeds a configured value. 1339 The primary server can, at any time, send an available IP address to 1340 the secondary using a BNDUPD with the state BACKUP. The primary 1341 server can attempt to take an available IP address away from the 1342 secondary by sending a BNDUPD with the state FREE. If the secondary 1343 accepts the BNDUPD, then the IP address is now available to the primary and not 1344 available to the secondary. Of course, the secondary MUST reject 1345 that BNDUPD if it has already used that IP address for a DHCP client.
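One possible threshold mechanism, and the secondary's check on a BNDUPD with state FREE, are sketched below in Python. The names rebalance_delta and secondary_accepts_free are hypothetical, and the out-of-balance value is assumed here to be the difference between the secondary's configured share and what it currently holds. Other definitions are equally valid.

```python
def rebalance_delta(primary_free: int, secondary_backup: int,
                    secondary_share: float, threshold: int) -> int:
    """Signed number of addresses the primary should move.

    Positive: send that many to the secondary (BNDUPD, state BACKUP).
    Negative: try to reclaim that many (BNDUPD, state FREE).
    Zero: the imbalance does not exceed the configured threshold.
    """
    total = primary_free + secondary_backup
    target = int(total * secondary_share)
    imbalance = target - secondary_backup
    if abs(imbalance) <= threshold:
        return 0
    return imbalance


def secondary_accepts_free(address: str, leased_by_secondary: set) -> bool:
    """Secondary's answer to BNDUPD(FREE): reject if it already leased it."""
    return address not in leased_by_secondary
```

The threshold keeps the primary from generating BNDUPD traffic for every small change in either pool, at the cost of tolerating a bounded imbalance.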
1347 Whenever the primary server examines the possible available IP 1348 addresses which it could send to the secondary server, the primary 1349 server SHOULD take into account whether load balancing is in use, and 1350 it SHOULD attempt to send to the secondary any IP addresses whose 1351 most recent client would be processed by the secondary under the 1352 current load balancing regime in use. Likewise, when removing avail- 1353 able IP addresses from the secondary server when load balancing is in 1354 use, the primary server SHOULD first remove those IP addresses whose 1355 most recent client would be processed by the primary server under the 1356 current load balancing regime in use. 1358 5.5. Operating in NORMAL state 1360 When in NORMAL state, each server services DHCPDISCOVERs and all 1361 other DHCP requests other than DHCPREQUEST/RENEWAL or 1362 DHCPREQUEST/REBINDING from the client set defined by the load balanc- 1363 ing algorithm [RFC 3074]. Each server services DHCPREQUEST/RENEWAL 1364 or DHCPREQUEST/REBINDING requests from any client. 1366 In general, whenever the binding database is changed in stable 1367 storage (other than a change resulting from receiving a BNDUPD from 1368 the failover partner), then a BNDUPD message is sent with the con- 1369 tents of that change to the partner server. The partner server then 1370 writes the information about that binding in its bindings database in 1371 stable storage and replies with a BNDACK message. 1373 The binding database in a DHCP server would normally be changed as a 1374 result of DHCP protocol activity with a DHCP client (e.g., granting 1375 a lease to a DHCP client through the familiar 1376 DISCOVER/OFFER/REQUEST/ACK cycle or extending a lease due to a 1377 renewal from a DHCP client) or possibly (on some servers) because a 1378 lease has expired or undergone another state change that must be 1379 recorded in the DHCP binding database.
These are the state changes 1380 that would be communicated to the partner server using a BNDUPD mes- 1381 sage. Of course, receipt of a BNDUPD message itself will normally 1382 cause an update of the binding database for all of the IP addresses 1383 contained in the BNDUPD, and a binding database change such as this 1384 MUST NOT trigger a corresponding BNDUPD message to the partner. 1386 5.6. Operating in COMMUNICATIONS-INTERRUPTED state 1388 When operating in COMMUNICATIONS-INTERRUPTED state, each server is 1389 operating independently, but does not assume that its partner is not 1390 operating. The partner server might be operating and simply unable 1391 to communicate with this server, or might not be operating. 1393 Each server responds to the full range of DHCP client messages that 1394 it receives (subject to server load balancing [RFC 3074]), but in 1395 such a way that graceful reintegration is always possible when its 1396 partner comes back into contact with it. 1398 5.7. Operating in PARTNER-DOWN state 1400 When operating in PARTNER-DOWN state, a server assumes that its 1401 partner is not currently operating, but does make allowances for the 1402 possibility that that server was operating in the past, though possi- 1403 bly out of communications with this server. It responds to all DHCP 1404 client requests in PARTNER-DOWN state (subject to server load balanc- 1405 ing [RFC 3074]). 1407 5.8. Operating in RECOVER state 1409 A server operating in RECOVER state assumes that it is reintegrating 1410 with a server that has been operating in PARTNER-DOWN state, and that 1411 it needs to update its bindings database before it services DHCP 1412 client requests. 1414 A server may also operate in RECOVER state in order to fully recover 1415 its bindings database from its partner server. 1417 5.9. 
Operating in STARTUP state 1419 A server operating in STARTUP state assumes that failover is opera- 1420 tional, and it spends a short time whenever it comes up attempting to 1421 contact the partner. During this short time, the server is unrespon- 1422 sive to DHCP client requests. This period exists in order to give a 1423 server a chance to determine that its partner has changed state since 1424 it was last in communications, and to react to that changed state (if 1425 any) prior to responding to DHCP client requests. 1427 The startup period SHOULD be conditioned on the length of time the 1428 server has been down (if that can be determined). If the server has 1429 been down less than the MCLT then it can wait only a few (say 5 or 1430 10) seconds. If it has been down a longer time (such that the 1431 partner may well have moved to PARTNER-DOWN state), a considerably 1432 longer startup period of 30 to 60 seconds may be warranted, since the 1433 consequences of running while the partner is in PARTNER-DOWN state 1434 are unpleasant. 1436 The period of time a server remains in STARTUP state SHOULD be long 1437 enough to ensure that it will connect to the other server if that 1438 server is available for connections. 1440 5.10. Time synchronization between servers 1442 The failover protocol is designed to operate between two servers 1443 which have time values which differ by an arbitrarily large amount. 1444 A particular implementation MAY choose to only support servers whose 1445 time values differ by an arbitrarily small amount. 1447 In any event, whether large or only small differences in time values 1448 are supported, every message that is received MUST be tagged with a 1449 time value as soon as possible after receipt. This time value is 1450 used along with the time value that is sent in every message between 1451 the failover partners to develop a delta time between the servers. 
1452 This delta time is used during the connection process to establish a 1453 baseline delta time between the servers, and upon receipt of each 1454 message, the delta time for that message is used to refine the delta 1455 time for the server pair. 1457 While the algorithm for this refinement of delta time is not speci- 1458 fied as part of this protocol, a server SHOULD allow the delta time 1459 value for a pair of failover servers to be periodically updated to 1460 account for time drift. In addition, the delta time value between 1461 servers SHOULD be smoothed in some fashion, so that transient network 1462 delays will not cause it to vary wildly. 1464 A server SHOULD recognize a drastic change in the delta time value as 1465 an event to be signaled to a network administrator, as well as reset- 1466 ting the time delta between the failover partners. 1468 The specific definitions of a minor or drastic change in delta time 1469 as well as the algorithm used to smooth minor changes into the run- 1470 ning delta time are implementation issues and are not further 1471 addressed in this document. 1473 5.11. IP address binding-status 1475 In most DHCP servers an IP address can take on several different 1476 binding-status values, sometimes also called states. While no two 1477 DHCP servers probably have exactly the same possible binding-status 1478 values, the DHCP RFC enforces some commonality among the general 1479 semantics of the binding-status values used by various DHCP server 1480 implementations. 1482 In order to transmit binding database updates between one server and 1483 another using the failover protocol, some common denominator 1484 binding-status values must be defined. 
It is not expected that these 1485 binding-status values correspond with any actual implementation of 1486 the DHCP protocol in a DHCP server, but rather that the binding- 1487 status values defined in this document should be a common denominator 1488 of those in use by many DHCP server implementations. It is a goal of 1489 this protocol that any DHCP server can map the various IP address 1490 binding-status values that it uses internally into these failover IP 1491 address binding-status values on transmission of binding database 1492 updates to its partner, and likewise that it can map any failover IP 1493 address binding-status values it receives in a binding update into 1494 its internal IP address binding-status values. 1496 The IP address binding-status values defined for the failover proto- 1497 col are listed below. Unless otherwise noted below, there MAY be 1498 client information associated with each of these binding-status 1499 values. 1501 o ACTIVE -- Lease is assigned to a client. Client identification 1502 MUST appear. 1504 o EXPIRED -- indicates that a client's binding on an IP address 1505 has expired. When the partner server ACK's the BNDUPD of an 1506 EXPIRED IP address, the server sets its internal state to FREE. 1507 It is then available for allocation to any client of the primary 1508 server. It may be allocated to the same client on the server 1509 where the lease expired if a BNDUPD containing the EXPIRED state 1510 has not yet been sent to the partner (e.g., in the event that 1511 the servers are not in communication). Client identification 1512 SHOULD appear. 1514 o RELEASED -- indicates that a DHCP client sent in a DHCPRELEASE 1515 message. When the partner server ACK's the BNDUPD of a 1516 RELEASED IP address, the server sets its internal state to FREE, 1517 and it is available for allocation by the primary server to any 1518 DHCP client. It may be allocated to the same client if a BNDUPD 1519 has not yet been sent to the partner.
Client identification 1520 SHOULD appear. 1522 o FREE -- is used when a DHCP server needs to communicate that an 1523 IP address is unused by any DHCP client, but it was not just 1524 released, expired, or reset by a network administrator. When 1525 the partner server ACK's the BNDUPD of a FREE IP address, the 1526 server sets its internal state such that it is available for 1527 allocation by the primary DHCP server to any DHCP client. (Note 1528 that in PARTNER-DOWN state, after waiting the MCLT, the IP 1529 address MAY be allocated to a DHCP client by the secondary 1530 server.) 1532 Note that when an IP address that was allocated by the secondary 1533 reverts to the FREE state, it must (like any other IP address) 1534 be assigned to the secondary through the POOLREQ/BNDUPD process 1535 before the secondary can reallocate it. 1537 Client identification MAY appear. 1539 o ABANDONED -- indicates that an IP address is considered unusable 1540 by the DHCP subsystem. An IP address for which a valid PING 1541 response was received SHOULD be set to ABANDONED. An IP address 1542 for which a DHCPDECLINE was received should be set to ABANDONED. 1543 Client identification MUST NOT appear. 1545 o RESET -- indicates that this IP address was made available by 1546 operator command. This is a distinct state so that the reason 1547 that the IP address became FREE can be determined. Client iden- 1548 tification MAY appear. 1550 o BACKUP -- indicates that this IP address can be allocated by the 1551 secondary server to a DHCP client at any time. When the MCLT has 1552 passed after its time of entry into PARTNER-DOWN state, the IP 1553 address may be allocated by the primary to any DHCP client. 1554 Client identification MAY appear. 1556 These binding-status values are communicated from one failover 1557 partner to another using the binding-status option, see section 12.3 1558 for details of this option. 
Unless otherwise noted above there MAY 1559 be client information associated with each of these binding-status 1560 values. 1562 An IP address will move between these binding-status values using the 1563 following state transitions:

   o  any state -> ABANDONED: DHCP client DECLINE or server detected
      problem.

   o  ABANDONED -> RESET: see note (3).

   o  External command -> RESET.

   o  RESET -> FREE: Comm w/Partner (1).

   o  EXPIRED -> FREE: Comm w/Partner (1).

   o  RELEASED -> FREE: Comm w/Partner (1).

   o  FREE -> ACTIVE: IP address leased by primary.

   o  FREE -> BACKUP: IP addr alloc. to sec. (2).

   o  FREE -> BACKUP: IP addr reserved.

   o  BACKUP -> ACTIVE: IP addr leased by secondary.

   o  ACTIVE -> RELEASED: DHCPRELEASE.

   o  ACTIVE -> EXPIRED: Lease on IP addr expires. If an expired-lease
      grace period exists, the server waits for the grace period and
      the address becomes EXPIRED when the grace period ends.

1599 Figure 5.11-1: Transitions between binding-status values. 1601 (1) This transition MAY also occur if the server is in 1602 PARTNER-DOWN state and the MCLT has passed since the entry 1603 in the RELEASED, EXPIRED, or RESET states. 1605 (2) This transition MAY occur if the server is the secondary 1606 and the MCLT has passed since its entry into PARTNER-DOWN state. 1608 (3) This transition MAY occur due to an implementation specific 1609 handling of ABANDONED IP addresses. 1611 Again, note that a DHCP server implementing the failover protocol 1612 does not have to implement either this state machine or use these 1613 particular binding-status values in its normal operation of allocat- 1614 ing IP addresses to DHCP clients.
It only needs to map its internal 1615 binding-status values onto these "standard" binding-status values, 1616 and map these "standard" binding-status values back into its internal 1617 binding-status values. For example, a server which implements a 1618 grace period for an IP address binding SHOULD simply wait to update 1619 its partner server until the grace period on that binding has run 1620 out. 1622 The process of setting an IP address to FREE deserves some detailed 1623 discussion. When an IP address is moved to the EXPIRED, RELEASED, or 1624 RESET binding-status on a server, that server will send a BNDUPD with the 1625 binding-status of EXPIRED, RELEASED, or RESET to its partner. If its 1626 partner agrees that this is acceptable (see sections 7.1.2 and 7.1.3 con- 1627 cerning why a server might not accept a BNDUPD) it will return a 1628 BNDACK with no reject-reason, signifying that it accepted the update. 1629 As part of the BNDUPD processing, the server returning the BNDACK 1630 will set the binding-status of the IP address to FREE, and upon 1631 receipt of the BNDACK the server which sent the BNDUPD will set the 1632 binding-status of the IP address to FREE. Thus, the EXPIRED, 1633 RELEASED, or RESET binding-status is something of a transitory state. 1634 This process is encoded in Figure 5.11-1 above by "Comm 1635 w/Partner". 1637 5.12. DNS dynamic update considerations 1639 DHCP servers (and clients) can use DNS Dynamic Updates as described 1640 in [RFC 2136] to maintain DNS name-mappings as they maintain DHCP 1641 leases. Many different administrative models for DHCP-DNS integra- 1642 tion are possible. Descriptions of several of these models, and 1643 guidelines that DHCP servers and clients should follow in carrying 1644 them out, are laid out in [FQDN]. The nature of the DHCP failover 1645 protocol introduces some issues concerning dynamic DNS updates that 1646 are not part of non-failover DHCP environments.
This section 1647 describes these issues, and defines the information which failover 1648 partners should exchange and the protocol which they should follow in 1649 order to ensure consistent behavior. The presence of this section 1650 should not be interpreted as requiring that implementations of the 1651 DHCP failover protocol must also support DDNS updates. The purpose 1652 of this discussion is to clarify the areas where the DHCP failover 1653 and DHCP-DDNS protocols intersect for the benefit of implementations 1654 which support both protocols, not to introduce a new requirement into 1655 the DHCP failover protocol. Thus, a DHCP server which implements the 1656 failover protocol MAY also support dynamic DNS updates, but if it 1657 does support dynamic DNS updates it SHOULD utilize the techniques 1658 described here in order to correctly distribute them between the 1659 failover partners. See [FQDN], [DNSRES], and [DHCID] for details of 1660 how DHCP servers update DNS. 1662 From the standpoint of the failover protocol, there is no reason why 1663 a server which is utilizing the DDNS protocol to update a DNS server 1664 should not be a partner with a server which is not utilizing the DDNS 1665 protocol to update a DNS server. However, a server which is not able 1666 to support DDNS or is not configured to support DDNS SHOULD output a 1667 warning message when it receives BNDUPD messages which indicate that 1668 its failover partner is configured to support the DDNS protocol to 1669 update a DNS server. An implementation MAY consider this an error 1670 and refuse to operate, or it MAY choose to operate anyway, having 1671 warned the user of the problem in some way. 1673 5.12.1. Relationship between failover and dynamic DNS update 1675 The failover protocol describes the conditions under which each fail- 1676 over server may renew a lease to its current DHCP client, and 1677 describes the conditions under which it may grant a lease to a new 1678 DHCP client. 
An analogous set of conditions determines when a fail- 1679 over server should initiate a DDNS update, and when it should attempt 1680 to remove records from the DNS. The failover protocol's conditions 1681 are based on the desired external behavior: avoiding duplicate 1682 address assignments; allowing clients to continue using leases which 1683 they obtained from one failover partner even if they can only commun- 1684 icate with the other partner; allowing the backup DHCP server to 1685 grant new leases even if it is unable to communicate with the primary 1686 server. The desired external DDNS behavior for DHCP failover servers 1687 is: 1689 1. Allow timely DDNS updates from the server which grants a 1690 client a lease. Recognize that there is often a DDNS update 1691 lifecycle which parallels the DHCP lease lifecycle. This is 1692 likely to include the addition of records when the lease is 1693 granted, and the removal of DNS records when the lease is sub- 1694 sequently made available for allocation to a different client. 1696 2. Communicate enough information between the two failover 1697 servers to allow one to complete the DDNS update 'lifecycle' 1698 even if the other server originally granted the lease. 1700 3. Avoid redundant or overlapping DDNS updates, where both fail- 1701 over servers are attempting to perform DDNS updates for the 1702 same lease-client binding. Avoid situations where one partner 1703 is attempting to add RRs related to a lease binding while the 1704 other partner is attempting to remove RRs related to the same 1705 lease binding. 1707 5.12.2. Use of the DDNS option 1709 In order for either server to be able to complete a DDNS update, or 1710 to remove DNS records which were added by its partner, both servers 1711 need to know the FQDN associated with the lease-client binding. The 1712 FQDN associated with the client's A RR and PTR RR SHOULD be communi- 1713 cated from the server which adds records into the DNS to its partner. 
1714 The initiating server SHOULD use the DDNS option in the BNDUPD mes- 1715 sages to inform the partner server of the status of any DDNS updates 1716 associated with a lease binding. Failover servers MAY choose not to 1717 include the DDNS option in BNDUPD messages if there has been no 1718 change in the status of any DDNS update related to the lease binding. 1719 The partner server receiving BNDUPD messages containing the DDNS 1720 option SHOULD compare the status flags and the FQDN contained in the 1721 option data with the current DDNS information it has associated with 1722 the lease binding, and update its notion of the DDNS status accord- 1723 ingly. 1725 The initiating server MAY send a BNDUPD to its partner before the 1726 DDNS update has been successfully completed. If it does so, it SHOULD 1727 leave the 'C' bit in the Flags field clear, to indicate to the 1728 partner that the DDNS update may not be complete. When the DDNS 1729 update has been successfully acknowledged by the DNS server, the ini- 1730 tiating DHCP server SHOULD include the DDNS option in its next BNDUPD 1731 message about the binding, so that the partner server will be able to 1732 record the final status of the DDNS update. The initiating server 1733 SHOULD set the 'C' bit in the DDNS option if the DDNS update was suc- 1734 cessfully accepted by the DNS server. 1736 Some implementations will choose to send a BNDUPD without waiting for 1737 the DDNS update to complete, and then will send a second BNDUPD once 1738 the DDNS update is complete. Other implementations will delay sending 1739 the partner a BNDUPD until the DDNS update has been acknowledged by 1740 the DNS server, or until some time-limit has elapsed, in order to 1741 avoid sending a second BNDUPD. 1743 The Domain Name field in the DDNS option contains the FQDN that will 1744 be associated with the A RR (if the server is performing an A RR 1745 update for the client) and the PTR RR. 
This FQDN may be composed in 1746 any of several ways, depending on server configuration and the infor- 1747 mation provided by the client in its DHCP messages. The client may 1748 supply a hostname which it would like the server to use in forming 1749 the FQDN, or it may supply the entire FQDN. The server may be config- 1750 ured to attempt to use the information the client supplies, it may be 1751 configured with an FQDN to use for the client, or it may be 1752 configured to synthesize an FQDN. The responsive server SHOULD 1753 include the FQDN that it will be using in DDNS updates it initiates 1754 when it sends the DDNS option. 1756 Since the responsive server may not have completed the DDNS update at 1757 the time it sends the first BNDUPD about the lease binding, there may 1758 be cases where the FQDN in later BNDUPD messages does not match the 1759 FQDN included in earlier messages. For example, the responsive 1760 server may be configured to handle situations where two or more DHCP 1761 client FQDNs are identical by modifying the most-specific label in 1762 the FQDNs of some of the clients in an attempt to generate unique 1763 FQDNs for them (a process sometimes called "disambiguation"). Alter- 1764 natively, at sites which use some or all of the information which 1765 clients supply to form the FQDN, it's possible that a client's confi- 1766 guration may be changed so that it begins to supply new data. The 1767 responsive server may react by removing the DNS records which it ori- 1768 ginally added for the client, and replacing them with records that 1769 refer to the client's new FQDN. In such cases, the responsive server 1770 SHOULD include the actual FQDN that was used in subsequent DDNS 1771 options. The responsive server SHOULD include relevant client-option 1772 data in the client-request-options option in its BNDUPD messages. 
1773 This information may be necessary in order to allow the non- 1774 responsive partner to detect client configuration changes that change 1775 the hostname or FQDN data which the client includes in its DHCP 1776 requests. 1778 5.12.3. Adding RRs to the DNS 1780 A failover server which is going to perform DDNS updates SHOULD ini- 1781 tiate the DDNS update when it grants a new lease to a client. The 1782 non-responsive partner SHOULD NOT initiate a DDNS update when it 1783 receives the BNDUPD after the lease has been granted. The failover 1784 protocol ensures that only one of the partners will grant a lease to 1785 any individual client, so it follows that this requirement will 1786 prevent both partners from initiating updates simultaneously. The 1787 server initiating the update SHOULD follow the protocol in [FQDN]. 1788 The server may be configured to perform an A RR update on behalf of 1789 its clients, or not. Ordinarily, a failover server will not initiate 1790 DDNS updates when it renews leases. In two cases, however, a failover 1791 server MAY initiate a DDNS update when it renews a lease to its 1792 existing client: 1794 1. When the lease was granted before the server was configured to 1795 perform DDNS updates, the server MAY be configured to perform 1796 updates when it next renews existing leases. Since both 1797 servers are responsive to renewals in NORMAL state, it is not 1798 enough to simply require the non-responsive server to avoid a 1799 DNS update in this case. The server which would be responsive 1800 to a DHCPDISCOVER from this client (even though the current 1801 request is a DHCPREQUEST/RENEW) is the server which should 1802 initiate the DDNS update. 1804 2. If a server is in PARTNER-DOWN state, it can conclude that its 1805 partner is no longer attempting to perform an update for the 1806 existing client. 
If the remaining server has not recorded that 1807 an update for the binding has been successfully completed, the 1808 server MAY initiate a DDNS update. It MAY initiate this 1809 update immediately upon entry to PARTNER-DOWN state, it MAY 1810 perform this in the background, or it MAY initiate this update 1811 upon next hearing from the DHCP client. 1813 5.12.4. Deleting RRs from the DNS 1815 The failover server which makes an IP address FREE SHOULD initiate 1816 any DDNS deletes, if it has recorded that DNS records were added on 1817 behalf of the client. 1819 A server not in PARTNER-DOWN state "makes an IP address FREE" when it 1820 initiates a BNDUPD with a binding-status of FREE, EXPIRED, or 1821 RELEASED. Its partner confirms this status by acking that BNDUPD, 1822 and upon receipt of the ACK the server has "made the IP address 1823 FREE". Conversely, a server in PARTNER-DOWN state "makes an IP 1824 address FREE" when it sets the binding-status to FREE, since in 1825 PARTNER-DOWN state no communication is required with the partner. 1827 It is at this point that the server should initiate the DDNS operations to 1828 delete RRs from the DNS. Its partner SHOULD NOT initiate DDNS 1829 deletes for DNS records related to the lease binding as part of send- 1830 ing the BNDACK message. The partner MAY have issued BNDUPD messages 1831 with a binding-status of FREE, EXPIRED, or RELEASED previously, but 1832 the other server will have NAKed these BNDUPD messages. 1834 The failover protocol ensures that only one of the two partner 1835 servers will be able to make a lease FREE. The server making the 1836 lease FREE may be doing so while it is in NORMAL communication with 1837 its partner, or it may be in PARTNER-DOWN state. If a server is in 1838 PARTNER-DOWN state, it may be performing DDNS deletes for RRs which 1839 its partner added originally.
This allows a single remaining partner 1840 server to assume responsibility for all of the DDNS activity which 1841 the two servers were undertaking. 1843 Another implication of this approach is that no DDNS RR deletes will 1844 be performed while either server is in COMMUNICATIONS-INTERRUPTED 1845 state, since no IP addresses are moved into the FREE state during 1846 that period. 1848 5.13. Reservations and failover 1850 Some DHCP servers support a capability to offer specific pre- 1851 configured IP addresses to DHCP clients. These are real DHCP 1852 clients that run the entire DHCP protocol, but these servers always 1853 offer the client a specific pre-configured IP address -- and they 1854 offer that IP address to no other clients. Such a capability has 1855 several names, but it is sometimes called a "reservation", in that 1856 the IP address is reserved for a particular DHCP client. 1858 In a situation where there are two DHCP servers serving the same sub- 1859 net without using failover, the two DHCP servers need to have dis- 1860 joint IP address pools, but identical reservations for the DHCP 1861 clients. 1863 In a failover context, both servers need to be configured with the 1864 proper reservations in an identical manner, but if we stop there 1865 problems can occur around the edge conditions where reservations are 1866 made for an IP address that has already been leased to a different 1867 client. Different servers handle this conflict in different ways, 1868 but the goal of the failover protocol is to allow correct operation 1869 with any server's approach to the normal processing of the DHCP pro- 1870 tocol. 1872 The general solution with regard to reservations is as follows.
1873 Whenever a reserved IP address becomes FREE (i.e., when first config- 1874 ured or whenever a client frees it or it expires or is reset), the 1875 primary server MUST show that IP address as FREE (and thus available 1876 for its own allocation) and it MUST send it to the secondary server 1877 with the R bit set in the IP-flags option and the binding-status 1878 BACKUP. 1880 Note that this implies that a reserved IP address goes through the 1881 normal state changes from FREE to ACTIVE (and possibly back to FREE). 1882 The failover protocol supports this approach to reservations, i.e., 1883 where the IP address undergoes the normal state changes of any IP 1884 address, but it can only be offered to the client for which it is 1885 reserved. Other approaches to the support of reservations exist in 1886 some DHCP server implementations (e.g., where the IP address is 1887 apparently leased to a particular client forever, without any expira- 1888 tion). The goal is for the failover protocol to support any of the 1889 usual approaches to reservations, both those that allow an IP address 1890 to go through different states when reserved, and those that don't. 1892 From the above, it follows that a reservation solely on the secondary 1893 will not necessarily allow the secondary to offer that address to 1894 the client for which it is reserved. The reservation must also appear on 1895 the primary for the secondary to be able to offer the IP 1896 address to the client for which it is reserved. 1898 When the reservation on an IP address is cancelled, if the IP address 1899 is currently FREE and the server is the primary, or BACKUP and the 1900 server is the secondary, the server MUST send a BNDUPD to the other 1901 server with the binding-status FREE and the R bit clear. 1903 5.14.
Dynamic BOOTP and failover 1905 Some DHCP servers support a capability to offer IP addresses to BOOTP 1906 clients without having a particular address previously allocated for 1907 those clients. This capability is often called something like 1908 "dynamic BOOTP". It is discussed briefly in [RFC 1534]. 1910 This capability has a negative interaction with the fundamental ele- 1911 ments of the failover protocol, in that an address handed out to a 1912 BOOTP device has no term (or effectively no term, in that usually 1913 they are considered leases for "forever"). There is no opportunity 1914 to hand out a lease which is only the MCLT long when first hearing 1915 from a BOOTP device, because such a device may interact only once with the 1916 DHCP server and has no notion of a lease expiration time. Thus 1917 the entire concept of the MCLT and waiting the MCLT after entering 1918 PARTNER-DOWN state is defeated when dealing with BOOTP devices. 1920 With some restrictions, however, dynamic BOOTP devices can be sup- 1921 ported in a server on a subnet where failover is supported. The only 1922 restriction (and it is not small) is that on any portion of the sub- 1923 net (in any address pool) where dynamic BOOTP devices can be allo- 1924 cated IP addresses, a DHCP server MUST NOT ever use any of the IP 1925 addresses which were previously available for allocation by its fail- 1926 over partner. Thus, the addresses allocated by the primary to the 1927 secondary for allocation that might have been allocated to BOOTP dev- 1928 ices MUST NOT ever be used by the primary server even if it is in 1929 PARTNER-DOWN state and has waited the MCLT after entering that state. 1930 Conversely, addresses available for allocation by the primary MUST 1931 NOT be used by the secondary even if it is in PARTNER-DOWN state.
The 1932 reason for this is that one of those IP addresses could have been 1933 allocated by the secondary server to a BOOTP device, and the primary 1934 server would have no way of ever knowing that happened. 1936 Whenever a server sends a BNDUPD message to its partner, if the client 1937 associated with the IP address is a BOOTP client, then the server 1938 MUST set the B bit in the IP-flags option. 1940 There is a very slight possibility that a BOOTP client could get an 1941 IP address on each server of a failover pair. When these two servers 1942 eventually attempt to resolve this conflict, they SHOULD agree to 1943 disagree, since it is not possible to know which IP address the BOOTP 1944 client will actually use -- indeed, it could use both. Operator 1945 intervention will, in general, be required to rectify this situation. 1946 Fortunately, it is extremely unlikely to ever actually occur. 1948 5.15. Guidelines for selecting MCLT 1950 There is no one correct value for the MCLT. There is an explicit 1951 tradeoff between various factors in selecting an MCLT value. 1953 5.15.1. Short MCLT 1955 A short MCLT value will mean that after entering PARTNER-DOWN state, 1956 a server will only have to wait a short time before it can start 1957 allocating its partner's IP addresses to DHCP clients. Furthermore, 1958 it will only have to wait a short time after the expiration of a 1959 lease on an IP address before it can reallocate that IP address to 1960 another DHCP client. 1962 However, the downside of a short MCLT value is that the initial lease 1963 interval that will be offered to every new DHCP client will be short, 1964 which will cause increased traffic as those clients will need to send 1965 their first renewal after half of a short MCLT.
In addition, 1966 the lease extensions that a server in COMMUNICATIONS-INTERRUPTED 1967 state can give will be only the MCLT after the server has been in 1968 COMMUNICATIONS-INTERRUPTED for around the desired client lease 1969 period. If a server stays in COMMUNICATIONS-INTERRUPTED for that 1970 long, then the leases it hands out will be short and that will 1971 increase the load on that server, possibly causing difficulty. 1973 5.15.2. Long MCLT 1975 A long MCLT value will mean that the initial lease period will be 1976 longer and the time that a server in COMMUNICATIONS-INTERRUPTED state 1977 will be able to extend leases (after it has been in COMMUNICATIONS- 1978 INTERRUPTED state for around the desired client lease period) will be 1979 longer. 1981 However, a server entering PARTNER-DOWN state will have to wait the 1982 longer MCLT before being able to allocate its partner's IP addresses 1983 to new DHCP clients. This may mean that additional IP addresses are 1984 required in order to cover this time period. Further, the server in 1985 PARTNER-DOWN will have to wait the longer MCLT from every lease 1986 expiration before it can reallocate an IP address to a different DHCP 1987 client. 1989 5.16. What is sent in response to an UPDREQ or UPDREQALL message? 1991 In section 7.3, the UPDREQ message is defined, and it says that the 1992 receiving server sends to the requesting server "all of the binding 1993 database information that it has not already seen". In section 1994 7.4.2, the UPDREQALL message is defined, and it says that the receiv- 1995 ing server sends to the requesting server "all binding database 1996 information". 1998 Both of these statements need further elaboration. 2000 First, for the UPDREQ message, the information to be sent in BNDUPD 2001 messages concerns "all of the binding database information it has not 2002 already seen". 
Since every BNDUPD is acked by the receiving server, 2003 the sending server need only keep track of which IP addresses have 2004 binding database changes not yet seen by the partner, and when they 2005 are finally acked by the partner it can record that. Thus, at any 2006 time, it knows which IP addresses have unacked binding database 2007 information. This is less simple when, across reconfigurations of 2008 the servers, an IP address can change the failover partner with which 2009 it is associated. In that case, it is important to reset the indica- 2010 tion that the partner has seen this binding information. See section 2011 5.17, below, for a more complete discussion of this issue. 2013 Second, in the event that a failover server's binding database infor- 2014 mation is restored from a backup, it will be partially out of date. 2015 In this case, its partner's indication of which binding database 2016 information the restored server has seen will also be out of date. 2018 The solution to this problem is for a server which is connecting with 2019 its partner to check the partner's last communicated time, and if it 2020 is very much ahead of its own last communicated time, go into 2021 RECOVER state and transmit an UPDREQALL to allow it to refresh its 2022 state. See section 9.3.2, step 5. If the partner's last communi- 2023 cated time is very much behind its own record of when it last commun- 2024 icated with the partner, then it SHOULD invalidate its information on 2025 which binding database information the partner server knows, so that 2026 it will send all of its relevant binding database information to the 2027 partner. 2029 Third, in the event that a server receives an UPDREQALL message, what 2030 constitutes "all binding database information"? At first glance this 2031 would seem to be information on every configured IP address in the 2032 server.
While this would be technically correct, it may impose a 2033 serious and unacceptable performance penalty on servers which have 2034 millions of configured IP addresses. What can be done to lessen the 2035 data that must be sent for an UPDREQALL? 2037 When sending "all binding database information", if the sending 2038 server sends only information concerning IP addresses which have been 2039 at some time associated with clients, it will send enough information 2040 to satisfy the needs of the failover protocol. It need not send 2041 information on any IP addresses that have never been used, since 2042 presumably they will be initialized as available to the primary 2043 server (i.e., FREE) on any server employing failover. 2045 5.17. How do you determine that your partner is "up to date" for 2046 a specific binding? 2048 Throughout this document, one server is assumed to know for each IP 2049 address binding whether or not its partner is "up to date" for that 2050 binding. There are some subtle issues involved in recording this "up 2051 to date" information about a specific binding. 2053 In a steady state world, it would suffice to have a single bit in the 2054 binding database to represent the information about whether the 2055 partner was or was not up to date. 2057 In a more complex environment a configuration change affecting a par- 2058 ticular IP address may change the failover endpoint with which it is 2059 associated, and if this should happen, any "up to date" bit which is 2060 written into the bindings database will be accurate for only the pre- 2061 vious failover endpoint, but not the current failover endpoint. If 2062 failover is disabled and then re-enabled (and the "up to date" bits, 2063 if used, are not cleared) problems can also occur. 2065 A server MUST be able to relate the "up to date" condition to a 2066 particular failover endpoint and even a particular instantiation of 2067 that failover endpoint.
The techniques to do this are implementation 2068 dependent. 2070 In addition, section 7.4 requires that a server be able to remember 2071 that an UPDREQALL message has been received and to treat every UPDREQ 2072 message as an UPDREQALL message until the first UPDDONE message is 2073 sent. One way to do this is to clear all of the "up to date" indica- 2074 tions for an entire failover endpoint upon receipt of an UPDREQALL 2075 message, thereby ensuring that every active binding will be sent to 2076 the partner whether through the completion of this UPDREQALL or 2077 through processing of a subsequent UPDREQ message. This is actually 2078 better than remembering that an UPDREQALL was received and turning 2079 every UPDREQ into an UPDREQALL, since any information sent in an 2080 incomplete UPDREQALL (or subsequent UPDREQ messages turned into "all" 2081 messages) will be remembered and not re-sent. 2083 6. Common Message Format 2085 This section discusses the message format that all failover 2086 messages have in common, including the message header format as well 2087 as the common option format. See section 12 for the definitions 2088 of the specific options used in the failover protocol. 2090 6.1. Message header format 2092 The options contained in the payload data section of the failover 2093 message all use a two byte option number and two byte length format. 2095 All failover protocol messages are sent over the TCP connection 2096 between failover endpoints and encoded using a message format 2097 specific to the failover protocol. 2099 There exists a common message format for all failover messages, which 2100 utilizes the options in a way similar to the DHCP protocol. For each 2101 message type, some options are required and some are optional. In 2102 addition, when a message is received any options that are not under- 2103 stood by the receiving server MUST be ignored.
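The two byte option number, two byte length layout and the rule that options which are not understood are ignored can be illustrated with a minimal sketch. It is not taken from this document: the name parse_options and the known_codes parameter are hypothetical, the numeric codes in the usage note are placeholders rather than section 12 values, and the length field is assumed to count only the value bytes, as it does for DHCP options.

```python
import struct

def parse_options(payload: bytes, known_codes) -> list:
    """Walk a payload of (2-byte code, 2-byte length, value) options.

    Assumption: the length counts only the value bytes, as in DHCP options.
    Returns (code, value) pairs for the options the caller understands.
    """
    options = []
    pos = 0
    while pos < len(payload):
        if len(payload) - pos < 4:
            raise ValueError("truncated option header")
        # Both fields are unsigned 16-bit integers in network byte order.
        code, length = struct.unpack_from("!HH", payload, pos)
        pos += 4
        if pos + length > len(payload):
            raise ValueError("option value runs past end of payload")
        value = payload[pos:pos + length]
        pos += length
        if code in known_codes:
            options.append((code, value))
        # Options that are not understood are skipped, not treated as errors.
    return options
```

Called with a payload holding placeholder options 1, 999 and 2 and known_codes of {1, 2}, the sketch returns the options 1 and 2 in their original order and silently drops option 999.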
2105 All of the fields in the fixed portion of the message MUST be filled 2106 with correct data in every message sent. 2108 0 1 2 3 2109 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2110 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2111 | message length (2) | msg type (1) |payload off (1)| 2112 +---------------+---------------+---------------+---------------+ 2113 | time (4) | 2114 +---------------------------------------------------------------+ 2115 | xid (4) | 2116 +---------------------------------------------------------------+ 2117 | 0 or more additional header bytes (variable) | 2118 +---------------------------------------------------------------+ 2119 | payload data (variable) | 2120 | | 2121 | formatted as DHCP-style options | 2122 | using a two byte option code and two byte length | 2123 | See section 6.2 for details. | 2124 +---------------------------------------------------------------+ 2126 message length - 2 bytes, network byte order 2128 This is the length of the message in bytes. It includes the two byte 2129 message length itself. The maximum length is 2048 bytes. The 2130 minimum length is 12. 2132 msg type - 1 byte 2134 The message type field is used to distinguish between messages. 
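As an illustration of the fixed header in the diagram above, the following sketch unpacks the five fixed fields and applies the stated length limits. It is a reading of the diagram, not a reference implementation: the names FailoverHeader, parse_header and payload_of are hypothetical, and the payload offset is taken from the message itself rather than assumed to be a particular constant.

```python
import struct
from typing import NamedTuple

FIXED_HEADER_LEN = 12    # 2 + 1 + 1 + 4 + 4 bytes, as drawn in the diagram
MAX_MESSAGE_LEN = 2048   # maximum message length stated in the text

class FailoverHeader(NamedTuple):
    length: int          # whole message, including the length field itself
    msg_type: int
    payload_offset: int
    time: int            # unsigned seconds since Jan 1, 1970
    xid: int

def parse_header(data: bytes) -> FailoverHeader:
    """Unpack the fixed header fields, all in network byte order."""
    if len(data) < FIXED_HEADER_LEN:
        raise ValueError("fewer bytes than the fixed header")
    hdr = FailoverHeader(*struct.unpack_from("!HBBII", data, 0))
    if not FIXED_HEADER_LEN <= hdr.length <= MAX_MESSAGE_LEN:
        raise ValueError("message length outside 12..2048")
    if hdr.length > len(data):
        raise ValueError("message not yet fully received")
    if hdr.payload_offset > hdr.length:
        raise ValueError("payload offset beyond end of message")
    return hdr

def payload_of(data: bytes, hdr: FailoverHeader) -> bytes:
    """Return the option bytes that follow the header."""
    return data[hdr.payload_offset:hdr.length]
```

Because the messages travel over TCP, a receiver would typically read the two byte length first and then wait for the remaining bytes before calling a function like parse_header.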
2136 The following message types are defined: 2138 Value Message Type 2139 ----- ------------ 2140 0 reserved not used 2141 1 POOLREQ request allocation of addresses 2142 2 POOLRESP respond with allocation count 2143 3 BNDUPD update partner with binding info 2144 4 BNDACK acknowledge receipt of binding update 2145 5 CONNECT establish connection with the secondary 2146 6 CONNECTACK respond to attempt to establish connection with partner 2147 7 UPDREQALL request full transfer of binding info 2148 8 UPDDONE ack send and ack of req'd binding info 2149 9 UPDREQ request transfer of un-acked binding info 2150 10 STATE inform partner of current state or state change 2151 11 CONTACT probe communications integrity with partner 2152 12 DISCONNECT close a connection 2154 New message types should be defined in one of two ranges, 0-127 or 2155 128-255. The range of 0-127 is used for messages that MUST be sup- 2156 ported by every server, and if a server receives a message in the 2157 range of 0-127 that it doesn't understand, it MUST close the TCP con- 2158 nection. The range of 128-255 is used for messages which MAY be sup- 2159 ported but are not required, and if a server receives a message in 2160 this range that it does not understand it SHOULD ignore the message. 2162 payload offset - 1 byte 2164 The byte offset of the Payload Data, from the beginning of the 2165 failover message header. The value for the current protocol version 2166 (version 1) is 8. 2168 time - 4 bytes, network byte order 2170 The absolute time in GMT when the message was transmitted, 2171 represented as seconds elapsed since Jan 1, 1970 (i.e., similar to 2172 the ANSI C time_t time value representation). While the ANSI C 2173 time_t value is signed, the value used in this specification is 2174 unsigned. 2176 A server SHOULD set this time as close to the actual transmission of 2177 the message as possible. 2179 xid - 4 bytes, network byte order 2181 This is the transaction id of the failover message.
The sender of a 2182 failover protocol message is responsible for setting this number, and 2183 the receiver of the message copies the number over into any response 2184 message, treating it as opaque data. The sender MUST ensure that 2185 every message sent from a particular failover endpoint over the 2186 associated TCP connection has a unique transaction id. 2188 For failover messages that have no corresponding response message, 2189 the XID value is meaningless, but MUST be supplied. The XID value is 2190 used solely by the receiver of a response message to determine the 2191 corresponding request message. 2193 Request messages where the XID is used in the corresponding response 2194 messages are: POOLREQ, BNDUPD, CONNECT, UPDREQALL, and UPDREQ. The 2195 corresponding response messages are POOLRESP, BNDACK, CONNECTACK, 2196 UPDDONE, and UPDDONE, respectively. 2198 As requests/responses don't survive connection reestablishment, XIDs 2199 only need to be unique during a specific connection. 2201 payload data - variable length 2203 The options are placed after the header, after skipping payload 2204 offset bytes from beginning of the message. The payload data options 2205 are not preceded by a "cookie" value. 2207 The payload data is formatted as DHCP style options using two byte 2208 option codes and two byte option lengths. The option codes are in a 2209 namespace which is unique to the failover protocol. 2211 The maximum length of the payload data in octets is 2048 less the 2212 size of the header, i.e., the maximum message length is 2048 octets. 2214 6.2. Common option format 2216 The options contained in the payload data section of the failover 2217 message all use a two byte option number and two byte length format. 2219 The option numbers are drawn from an option number space unique to 2220 the failover protocol. 
All of the message types share a common 2221 option number space and common option definitions, though not all 2222 options are required or meaningful for every message. 2224 In contrast to the options which appear in DHCP client and server 2225 messages, the options in failover messages are ordered. That is, for 2226 some messages the order in which the options appear in the payload 2227 data area is significant. The messages for which option ordering is 2228 significant explicitly describe the ordering requirements. If no 2229 ordering requirements are mentioned, then the order is not signifi- 2230 cant for that message. 2232 All options which refer to time use an absolute time in 2233 GMT. Time synchronization has already been achieved between the 2234 source and the target server using the CONNECT message and is updated 2235 and refined using the time in every packet. 2237 The time value is an unsigned 32 bit integer in network byte order 2238 giving the number of seconds since 00:00 UTC, 1st January 1970. This 2239 can be converted to an NTP timestamp by adding decimal 2208988800. 2240 This time format will not wrap until the year 2106. Until sometime 2241 in 2038, it is equal to the ANSI C time_t value (which is a signed 32 2242 bit value and will overflow into a negative number in 2038). 2244 Options should appear once only in each message (except for BNDUPD 2245 and BNDACK messages where batching is used, see section 6.3 for 2246 details). An option that appears twice is not concatenated, but 2247 treated as an error. 2249 Specific option values are described in section 12. 2251 See section 13 for how to define additional options. 2253 6.3.
Batching multiple binding update transactions in one BNDUPD mes- 2254 sage 2256 Implementations of this protocol MAY send multiple binding update 2257 transactions in one BNDUPD message, where a binding update transac- 2258 tion is defined as the set of options which are associated with the 2259 update of a single IP address. All implementations of this protocol 2260 MUST be prepared to receive BNDUPD messages which contain multiple 2261 binding update transactions and respond correctly to them, including 2262 replying with a BNDACK message which contains status for the multiple 2263 binding update transactions contained in the BNDUPD message. 2265 In the discussion of sending and receiving BNDUPD messages in section 2266 7.1 and BNDACK messages in section 7.2, each BNDUPD message and 2267 BNDACK message is assumed to contain a single binding update transac- 2268 tion in order to reduce the complexity of the discussions in section 2269 7. 2271 Multiple binding update transactions MAY be batched together in one 2272 BNDUPD protocol message with the data sets for the individual tran- 2273 sactions delimited by the assigned-IP-address option, which MUST 2274 appear first in the option set for each transaction. Ordering of 2275 options between the assigned-IP-address options is not significant. 2276 This is illustrated in the following schematic representation: 2278 Non-IP Address/Non-client specific options first 2279 assigned-IP-address option for the first IP address 2280 Options pertaining to first address, including at least the 2281 binding-status option and others as required. 2282 assigned-IP-address option for the second IP address 2283 Options pertaining to second address, including at least the 2284 binding-status option and others as required. 2285 ... 2286 Trailing options (message digest). 
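The schematic above can be read as a simple grouping rule over an already decoded option list. The sketch below is one possible reading under stated assumptions: the name split_transactions is hypothetical, options are assumed to arrive as (code, value) pairs in wire order, and the codes for the assigned-IP-address option and for the trailing options are passed in by the caller, because their real values are defined in section 12 and are not assumed here.

```python
def split_transactions(options, assigned_ip_code, trailing_codes=frozenset()):
    """Group (code, value) pairs into leading options, per-address
    binding update transactions, and trailing options.

    Each assigned-IP-address option opens a new transaction; the options
    that follow it belong to that transaction until the next one starts.
    """
    leading, transactions, trailing = [], [], []
    for code, value in options:
        if code in trailing_codes:
            # Trailing options (e.g. a message digest) belong to the message.
            trailing.append((code, value))
        elif code == assigned_ip_code:
            transactions.append([(code, value)])
        elif transactions:
            transactions[-1].append((code, value))
        else:
            # Seen before any assigned-IP-address option: not address-specific.
            leading.append((code, value))
    return leading, transactions, trailing
```

A receiver could then process each transaction independently and build its reply with one assigned-IP-address option per transaction. With placeholder code 2 for assigned-IP-address and 99 for a trailing option, a list holding two addresses yields two transactions, each starting with its assigned-IP-address option.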
2288 There MUST be a one-to-one correspondence between BNDUPD and BNDACK 2289 messages, and every BNDACK message MUST contain status for all of the 2290 binding update transactions in the corresponding BNDUPD message. 2292 The BNDACK message corresponding to a BNDUPD message MUST contain 2293 assigned-IP-address options for all of the binding update transac- 2294 tions in the BNDUPD message. Thus, every BNDACK message contains 2295 exactly the same assigned-IP-address options as does its correspond- 2296 ing BNDUPD message. The order of the assigned-IP-address options 2297 MAY, however, be different. Here is a schematic representation of a 2298 BNDACK: 2300 Non-IP Address/Non-client specific options first 2301 assigned-IP-address option for the first IP address 2302 If rejected, reject-reason option and message option. 2303 assigned-IP-address option for the second IP address 2304 If rejected, reject-reason option and message option. 2305 ... 2306 Trailing options (message digest). 2308 In case the server chooses to reject some or all of the IP address 2309 binding information in a BNDUPD message in a BNDACK reply, the BNDACK 2310 message MUST contain a reject-reason option following every failed 2311 assigned-IP-address option in order to indicate that the binding 2312 update transaction for that IP address was not accepted and why. As 2313 with a BNDACK message containing a single binding update transaction, 2314 an assigned-IP-address option without any associated reject-reason 2315 option indicates a successful binding update transaction. 2317 7. Protocol Messages 2319 This section contains the detailed definition of the protocol mes- 2320 sages, including the information to include when sending the message, 2321 as well as the actions to take upon receiving the message. The mes- 2322 sage type for each message appears as [n] in the heading for the mes- 2323 sage (see section 6.1). 2325 7.1. 
BNDUPD message [3] 2327 The binding update (BNDUPD) message is used to send the binding data- 2328 base changes (known as binding update transactions) to the partner 2329 server, and the partner server responds with a binding acknowledge- 2330 ment (BNDACK) message when it has successfully committed those 2331 changes to its own stable storage. 2333 The rest of the failover protocol exists to determine whether the 2334 partner server is able to communicate or not, and to enable the 2335 partners to exchange BNDUPD/BNDACK messages in order to keep their 2336 binding databases in stable storage synchronized. 2338 The rest of this section is written as though every BNDUPD message 2339 contains only a single binding update transaction in order to reduce 2340 the complexity of the discussion. See section 6.3 for information on 2341 how to create and process BNDUPD and BNDACK messages which contain 2342 multiple binding update transactions. Note that while a server MAY 2343 generate BNDUPD messages with multiple binding update transactions, 2344 every server MUST be able to process a BNDUPD message which contains 2345 multiple binding update transactions and generate the corresponding 2346 BNDACK messages with status for multiple binding update transactions. 2348 The following table summarizes the various options for the BNDUPD 2349 message. 
2351 binding-status BACKUP 2352 RESET 2353 ABANDONED 2354 Option ACTIVE EXPIRED RELEASED FREE 2355 ------ ------ ------- -------- ---- 2356 assigned-IP-address (3) MUST MUST MUST MUST 2357 IP-flags MUST(4) MUST(4) MUST(4) MUST(4) 2358 binding-status MUST MUST MUST MUST 2359 client-identifier MAY MAY MAY MAY(2) 2360 client-hardware-address MUST MUST MUST MAY(2) 2361 lease-expiration-time MUST MUST NOT MUST NOT MUST NOT 2362 potential-expiration-time MUST MUST NOT MUST NOT MUST NOT 2363 start-time-of-state SHOULD SHOULD SHOULD SHOULD 2364 client-last-trans.-time MUST SHOULD MUST MAY 2365 DDNS(1) SHOULD SHOULD SHOULD SHOULD 2366 client-request-options SHOULD SHOULD NOT SHOULD SHOULD NOT 2367 client-reply-options SHOULD SHOULD NOT SHOULD NOT SHOULD NOT 2369 (1) MUST if server is performing dynamic DNS for this IP address, else 2370 MUST NOT. 2371 (2) MUST NOT if binding-status is ABANDONED. 2372 (3) assigned-IP-address MUST be the first option for an IP address 2373 (4) IP-flags option MUST appear if any flags are non-zero, else it 2374 MAY appear. 2376 Table 7.1-1: Options used in a BNDUPD message 2378 7.1.1. Sending the BNDUPD message 2380 A BNDUPD message SHOULD be generated whenever any binding changes. A 2381 change might be in the binding-status, the lease-expiration-time, or 2382 even just the last-transaction-time. In general, any time a DHCP 2383 server writes its stable storage, a BNDUPD message SHOULD be gen- 2384 erated. This will often be the result of the processing of a DHCP 2385 client request, but it might also be the result of a successful 2386 dynamic DNS update operation. Stable storage updates due to BNDUPD 2387 or BNDACK messages SHOULD NOT result in additional BNDUPD messages. 2389 BNDUPD (and BNDACK) messages refer to the binding-status of the IP 2390 address, and this protocol defines a series of binding-statuses, dis- 2391 cussed in more detail below. 
Some servers may not support all of 2392 these binding-statuses, and so in those cases they will not be sent. 2393 Upon receipt of a BNDUPD message which contains an unsupported 2394 binding-status, a reasonable interpretation should be made (see sec- 2395 tion 5.10). 2397 All BNDUPD messages MUST contain the IP address of the binding update 2398 transaction in the assigned-IP-address option. 2400 All binding update transactions MUST contain an IP-flags option if 2401 the value of any of the flags would be non-zero. The IP-flags option 2402 MAY be omitted if all of the flags that it contains are zero. The 2403 IP-flags option contains a flag which indicates if the IP address is 2404 currently reserved on the server sending the BNDUPD message. It also 2405 contains a flag which indicates that the lease is associated with a 2406 client that used the BOOTP protocol (as opposed to the DHCP protocol) 2407 to interact with the DHCP server. 2409 All binding update transactions contain a binding-status option, and 2410 it will have one of the values found in section 5.11. Client infor- 2411 mation consists of client-hardware-address and possibly a client- 2412 identifier, and is explained in more detail later in this section. 2413 The following table indicates whether client information should or 2414 should not appear with each binding-status in a binding update tran- 2415 saction: 2417 binding-status includes client information 2418 ------------------------------------------------ 2419 ACTIVE MUST 2420 EXPIRED SHOULD 2421 RELEASED SHOULD 2422 FREE MAY 2423 ABANDONED MUST NOT 2424 RESET MAY 2425 BACKUP MAY 2427 Table 7.1.1-1: Client information required by various 2428 binding-status values. 2430 The ACTIVE binding-status requires some options to indicate the 2431 length of the binding: 2433 o lease-expiration-time 2435 The lease-expiration-time option MUST appear, and be set to the 2436 expiration time most recently ACKed to the DHCP client. 
Note 2437 that the time ACKed to a DHCP client is a lease duration in 2438 seconds, while the lease-expiration-time option in a BNDUPD mes- 2439 sage is an absolute time value. 2441 o potential-expiration-time 2442 The potential-expiration-time option MUST appear, and be set to 2443 a value beyond that of the lease-expiration-time. This is the 2444 value that is ACKed by the BNDACK message. A server sending a 2445 BNDUPD message MUST be able to recover the potential- 2446 expiration-time sent in every BNDUPD, not just those that 2447 receive a corresponding BNDACK, in order to be able to protect 2448 against possible duplicate allocation of IP addresses after 2449 transitioning to PARTNER-DOWN state. See section 5.2.1 for 2450 details as to why the potential-expiration-time exists and 2451 guidelines for how to decide on the value. 2453 The following option information applies to all BNDUPD messages, 2454 regardless of the value of the binding-status, unless otherwise 2455 noted. 2457 o Identifying the client 2459 For many of the binding-status values a client MUST appear while 2460 for others a client MAY appear, and for some a client MUST NOT 2461 appear. 2463 A client is identified in a BNDUPD message by at least one and pos- 2464 sibly two options. The client-hardware-address option MUST appear 2465 any time that a client appears in a BNDUPD message, and contains 2466 the hardware type and chaddr information from the DHCP request 2467 packet. A failover client-identifier option MUST appear any time 2468 that a client appears in a BNDUPD message if and only if that 2469 client used a DHCP client-identifier option when communicating with 2470 the DHCP server. See sections 12.5 and 12.4 for details of how to 2471 construct these two options from a DHCP request packet. 2473 o start-time-of-state 2475 The start-time-of-state SHOULD appear.
It is set to the time at 2476 which this IP address first took on the state that corresponds to 2477 the current value of binding-status. 2479 o last-transaction-time 2481 The last-transaction-time value SHOULD appear. This is the time at 2482 which this DHCP server last received a packet from the DHCP client 2483 referenced by the client-identifier or client-hardware-address that 2484 was associated with the IP address referenced by the assigned-IP- 2485 address. 2487 o DDNS 2489 If the DHCP server is performing dynamic DNS operations on behalf 2490 of the DHCP client represented by the client-identifier or client- 2491 hardware-address, then it should include a DDNS option containing 2492 the domain name and status of any dynamic DNS operations enabled. 2494 o client-request-options 2496 If the BNDUPD was triggered by a request from a DHCP client (typi- 2497 cally those with binding-status of ACTIVE and RELEASED), then the 2498 server SHOULD include options of interest to a failover partner 2499 from the client's request packet in the client-request-options for 2500 transmission to its partner (see section 12.8). 2502 A server sending a BNDUPD SHOULD remember the "interesting" options 2503 or the information that would appear in an "interesting" option for 2504 transmission at a time when the BNDUPD is not closely associated 2505 with a DHCP client request. 2507 A server SHOULD send the following "interesting" options. It MAY 2508 send any DHCP client options. As new options are defined, the RFC 2509 defining these options SHOULD include information that they are 2510 "interesting to failover servers" if they should be sent as part of 2511 a BNDUPD. 
2513 option option 2514 number name 2515 ----------------------------------------- 2517 12 host-name 2518 81 client-FQDN [FQDN] 2519 82 relay-agent-information [RFC 3046] 2520 77 user-class [RFC 3004] 2521 60 vendor-class-identifier 2522 118 subnet-selection [RFC 3011] 2524 Table 7.1.1-2: Options which SHOULD be sent in 2525 the client-request-options option in a BNDUPD message. 2527 o client-reply-options 2529 If the BNDUPD was triggered by a request from a DHCP client (typi- 2530 cally those with binding-status of ACTIVE and RELEASED), then the 2531 server SHOULD include options of interest to a failover partner 2532 from the server's DHCP reply packet in the client-reply-options for 2533 transmission to its partner (see section 12.7). 2535 A server sending a BNDUPD SHOULD remember the "interesting" options 2536 or the information that would appear in an "interesting" option for 2537 transmission at a time when the BNDUPD is not closely associated 2538 with a DHCP client request. 2540 A server SHOULD send the following "interesting" options. It MAY 2541 send any DHCP client options. As new options are defined, the RFC 2542 defining these options SHOULD include information that they are 2543 "interesting to failover servers" if they should be sent as part of 2544 a BNDUPD. 2546 option option 2547 number name 2548 ----------------------------------------- 2550 58 renewal-time 2551 59 rebinding-time 2553 Table 7.1.1-3: Options which SHOULD be sent in 2554 the client-reply-options option in a BNDUPD message. 2556 The BNDUPD message SHOULD be sent as soon as possible from the time 2557 that the DHCP client received a response and the lease bindings data- 2558 base is written on stable storage. 2560 7.1.2. Receiving the BNDUPD message 2562 When a server receives a BNDUPD message, it needs to decide how to 2563 process the binding update transaction it contains and whether that 2564 transaction represents a conflict of any sort. 
The conflict resolution process MUST be used on the receipt of every
BNDUPD message, not just those that are received while in
POTENTIAL-CONFLICT state, in order to increase the robustness of the
protocol.

There are three sorts of conflicts:

o Two clients, one IP address conflict

  This is the duplicate IP address allocation conflict. There are
  two different clients each allocated the same address. See
  section 7.1.3 for how to resolve this conflict.

o Two IP addresses, one client conflict

  This conflict exists when a client on one server is associated
  with one IP address, and on the other server with a different
  IP address in the same or a related subnet. This does not refer
  to the case where a single client has addresses in multiple
  different subnets or administrative domains, but rather the case
  where on the same subnet the client has a lease on one IP
  address in one server and on a different IP address on the other
  server.

  This conflict may or may not be a problem for a given DHCP
  server implementation. In the event that a DHCP server requires
  that a DHCP client have only one outstanding lease for an IP
  address on one subnet, this conflict should be resolved by
  accepting the lease information which has the latest
  client-last-transaction-time.

o binding-status conflict

  This is the normal conflict, where one server is updating the
  other with newer information. See section 7.1.3 for details of
  how to resolve these conflicts.

7.1.3. Deciding whether to accept the binding update transaction in a
BNDUPD message

When analyzing a BNDUPD message from a partner server, if there is
insufficient information in the BNDUPD to process it, then reject the
BNDUPD with reject-reason 3: "Missing binding information".
2608 If the IP address in the BNDUPD is not an IP address associated with 2609 the failover endpoint which received the BNDUPD message, then reject 2610 it with reject-reason 1: "Illegal IP address (not part of any address 2611 pool)". 2613 IP addresses undergo binding status changes for several reasons, 2614 including receipt and processing of DHCP client requests, administra- 2615 tive inputs and receipt of BNDUPD messages. Every DHCP server needs 2616 to respond to DHCP client requests and administrative inputs with 2617 changes to its internal record of the binding-status of an IP 2618 address, and this response is not in the scope of the failover proto- 2619 col. However, the receipt of BNDUPD messages implies at least a pos- 2620 sible change of the binding-status for an IP address, and must be 2621 discussed here. See section 7.1.2 for general actions to take upon 2622 receipt of a BNDUPD message. 2624 When receiving a BNDUPD message, it is important to note that it may 2625 not be current, in that the server receiving the BNDUPD message may 2626 have had a more recent interaction with the DHCP client than its 2627 partner who sent the BNDUPD message. In this case, the receiving 2628 server MUST reject the BNDUPD message. The reject reason SHOULD be 2629 15: "Outdated binding information". In addition, it is worth noting 2630 that two (and possibly three) binding-status values are the direct 2631 result of interaction with a DHCP client, ACTIVE and RELEASED (and 2632 possibly ABANDONED). All other binding-status values are either the 2633 result of the expiration of a time period or interaction with an 2634 external agency (e.g., a network administrator). 2636 Every BNDUPD message SHOULD contain a client-last-transaction-time 2637 option, which MUST, if it appears, be the time that the server last 2638 interacted with the DHCP client. It MUST NOT be, for instance, the 2639 time that the lease on an IP address expired. 
If there has been no interaction with the DHCP client in question (or
there is no DHCP client presently associated with this IP address),
then there will be no client-last-transaction-time option in the
BNDUPD message.

The list in Figure 7.1.3-1 is indexed by the binding-status that a
server receives in a BNDUPD message. In many cases, the
binding-status of an IP address within the receiving server's data
storage will have an effect upon the checks performed prior to
accepting the new binding-status in a BNDUPD message.

In Figure 7.1.3-1, to "accept" a BNDUPD means to update the server's
bindings database with the information contained in the BNDUPD and,
once that update is complete, send a BNDACK message corresponding to
the BNDUPD message. To "reject" a BNDUPD means to respond to the
BNDUPD with a BNDACK with a reject-reason option included.

When interpreting the information in the following table (Figure
7.1.3-1), the rules that are listed with "time" are applied as
follows: if a BNDUPD doesn't have a client-last-transaction-time
value, then it MUST NOT be considered later than the
client-last-transaction-time in the receiving server's binding. If
the BNDUPD contains a client-last-transaction-time value and the
receiving server's binding does not, then the
client-last-transaction-time value in the BNDUPD MUST be considered
later than the server's.
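The missing-value rule just stated can be sketched as a small
comparison helper. This is a minimal illustration, not part of the
protocol: the function name and the use of None to represent an
absent client-last-transaction-time are assumptions of this sketch.

```python
from typing import Optional


def update_is_later(update_time: Optional[int],
                    local_time: Optional[int]) -> bool:
    """Decide whether the BNDUPD's client-last-transaction-time counts
    as later than the one in the receiving server's binding.

    None stands for "option/value not present" (an assumption of this
    sketch); times are absolute seconds.
    """
    if update_time is None:
        # A BNDUPD without the value MUST NOT be considered later.
        return False
    if local_time is None:
        # The BNDUPD has a value and the local binding does not:
        # the BNDUPD's value MUST be considered later.
        return True
    return update_time > local_time
```

A "time(1)" rule in the figure would then accept the update when this
helper returns True and reject it otherwise.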
                    binding-status in received BNDUPD
 binding-status
 in receiving                                    FREE       RESET
 server         ACTIVE     EXPIRED    RELEASED   BACKUP     ABANDONED

 ACTIVE         accept(5)  time(2)    time(1)    time(2)    accept
 EXPIRED        time(1)    accept     accept     accept     accept
 RELEASED       time(1)    time(1)    accept     accept     accept
 FREE/BACKUP    accept     accept     accept     accept     accept
 RESET          time(3)    accept     accept     accept     accept
 ABANDONED      reject(4)  reject(4)  reject(4)  reject(4)  accept

 time(1): If the client-last-transaction-time in the BNDUPD
 is later than the client-last-transaction-time in the
 receiving server's binding, accept it, else reject it.

 time(2): If the current time is later than the receiving
 server's lease-expiration-time, accept it, else reject it.

 time(3): If the client-last-transaction-time in the BNDUPD
 is later than the start-time-of-state in the receiving server's
 binding, accept it, else reject it.

 (1,2,3): If rejecting, use reject reason 15: "Outdated binding
 information".

 (4): Use reject reason 16: "Less critical binding information".

 (5): If the clients in a BNDUPD message and in a receiving
 server's binding differ, then if the receiving server is a
 secondary accept it, else reject it with a reject reason of 2:
 "Fatal conflict exists: address in use by other client".

 Figure 7.1.3-1: Accepting BNDUPD messages

If the IP address in the BNDUPD message has the R flag set in the
IP-flags option, indicating it is a reserved IP address, and if the
binding-status in the BNDUPD is BACKUP, then if the receiving server
does not show the IP address as reserved, the receiving server SHOULD
reject the BNDUPD using reject reason 19: "IP not reserved on this
server".

7.1.4.
Accepting the BNDUPD message 2709 When accepting a BNDUPD message, the information contained in the 2710 client-request-options and client-reply-options SHOULD be examined 2711 for any information of interest to this server. For instance, a 2712 server which wished to detect changes in client specified host names 2713 might want to examine and save information from the host-name or 2714 client-FQDN options. Servers which expect to utilize information 2715 from the relay-agent-information option would want to store this 2716 information. 2718 7.1.5. Time values related to the BNDUPD message 2720 There are four time values that MAY be sent in a BNDUPD message. 2722 o lease-expiration-time 2724 The time that the server gave to the client, i.e., the time that 2725 the server believes that the client's lease will expire. 2727 o potential-expiration-time 2729 The time that the server wants to be sure its partner waits 2730 (added to the MCLT) before assuming that this lease has expired. 2731 Typically some time beyond the desired client lease time. 2733 o client-last-transaction-time 2735 The time that the client last interacted with this server. 2737 o start-time-of-state 2739 The time at which the binding first went into the current state. 2741 As discussed in section 5.2, each server knows what its partner has 2742 ACKed with regard to potential-expiration time. In addition, each 2743 server needs to remember what it has told its partner as the 2744 potential-expiration-time. Moreover, each server must remember what 2745 it has acked to the *other* server as the most recent potential- 2746 expiration-time from that server. 2748 Remember that each server sends a potential-expiration-time and 2749 receives an ACK for that as well as receiving a potential- 2750 expiration-time and needing to remember what it has acked for that. 
2752 While they don't have to be named in any particular way, the times 2753 that a server needs to remember for every IP address in order to 2754 implement the failover protocol are: 2756 o lease-expiration-time 2757 The time that a server gave to the DHCP client. A DHCP server 2758 needs to remember this time already, just to be a DHCP server. 2759 A server SHOULD update this time with the lease-expiration time 2760 received from a partner in a BNDUPD if the received lease- 2761 expiration time is later than the lease-expiration time recorded 2762 for this binding. 2764 o sent-potential-expiration-time 2766 The latest time sent to the partner for a potential-expiration- 2767 time. 2769 o acked-potential-expiration-time 2771 The latest time that the partner has acked for a potential 2772 expiration time. Typically the same as sent-potential- 2773 expiration-time if there is not a BNDUPD outstanding. 2775 o received-potential-expiration-time 2777 The latest time that this server has ever received as a 2778 potential-expiration-time from its partner in a BNDUPD that this 2779 server ACKed. 2781 So, a server has to remember two additional times concerning BNDUPD 2782 messages that it has initiated, and one additional time concerning 2783 BNDUPD message that it has received. How are these times used? 2785 First, let's look at the time that a DHCP server can offer to a DHCP 2786 client. A server can offer to a DHCP client a time that is no longer 2787 than the MCLT beyond the max( received-potential-expiration-time, 2788 acked-potential-expiration-time). One might think that the server 2789 should be able to offer only the MCLT beyond the acked-potential- 2790 expiration-time, and while that is certainly simple and easy to 2791 understand, it has negative consequences in actual operation. 
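The offer limit described above can be sketched as follows. This is
an illustrative calculation only. The function names are
hypothetical, all times are absolute seconds, and clamping to the
current time is this sketch's reading of the case where neither
remembered time has been set (both zero), in which a server can still
lease for the MCLT.

```python
def latest_offer_expiration(now: int,
                            received_pet: int,
                            acked_pet: int,
                            mclt: int) -> int:
    """Latest absolute expiration time a server may offer a client.

    received_pet: received-potential-expiration-time (0 if never set)
    acked_pet:    acked-potential-expiration-time (0 if never set)
    The max() with `now` is an assumption of this sketch.
    """
    return max(now, received_pet, acked_pet) + mclt


def lease_seconds(now: int, received_pet: int, acked_pet: int,
                  mclt: int, desired: int) -> int:
    """Lease duration to ACK: the desired duration, capped by the limit."""
    limit = latest_offer_expiration(now, received_pet, acked_pet, mclt) - now
    return min(desired, limit)
```

With no remembered times the result is capped at the MCLT, while a
received-potential-expiration-time well in the future lets the server
grant the full desired lease.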
2793 To illustrate this, in the simple case where the primary updates the 2794 secondary for a while and then fails, if the secondary can then renew 2795 the client for only the MCLT beyond the acked-potential-expiration- 2796 time, then the secondary will only be able to renew the client for 2797 the MCLT, because the secondary has never sent a BNDUPD packet to the 2798 primary concerning this IP address and client, and so its acked- 2799 potential-expiration-time is zero. 2801 However, since the secondary is allowed to renew the client with the 2802 MCLT beyond the max( received-potential-expiration-time, acked- 2803 potential-expiration-time), then the secondary can usually renew the 2804 client for the full lease period, at least for the first renew it 2805 sees from the client, since the received-potential-expiration-time is 2806 generally longer than the client's desired lease interval. The 2807 difference in renew times could make a big difference in server load 2808 on the secondary in this case. 2810 What are the consequences of allowing a server to offer a DHCP client 2811 a lease term of the MCLT beyond the max( received-potential- 2812 expiration-time, acked-potential-expiration-time)? The consequences 2813 appear whenever a server enters PARTNER-DOWN state, and affect how 2814 long that server has to wait before reallocating expired leases. 2815 With this approach, when a server goes into PARTNER-DOWN state, it 2816 must wait the MCLT beyond the max( lease-expiration-time, sent- 2817 potential-expiration-time, acked-potential-expiration-time, 2818 received-potential-expiration-time ) for each IP address before it 2819 can reallocate that IP address to another DHCP client. One might 2820 normally think that it needed to wait only the MCLT beyond the max( 2821 lease-expiration-time, received-potential-expiration-time ), i.e., 2822 beyond what it has told the client and what it has explicitly acked 2823 to the other server. 
But with the optimization discussed above, where either server can
offer the DHCP client a lease term of the MCLT beyond the max(
received-potential-expiration-time, acked-potential-expiration-time),
the additional times sent-potential-expiration-time and
acked-potential-expiration-time must be added into the expression,
since the partner could have used those times as part of its own
lease time calculation.

Thus this optimization may require a longer waiting time when
entering PARTNER-DOWN state, but will generally allow servers to
operate considerably more effectively when running in
COMMUNICATIONS-INTERRUPTED state.

7.2. BNDACK message [4]

A server sends a binding acknowledgement (BNDACK) message when it has
processed a BNDUPD message and after it has successfully committed to
stable storage any binding database changes made as a result of
processing the BNDUPD message. A BNDACK message is used to either
accept or reject a BNDUPD message. A BNDACK message which contains a
reject-reason option is a rejection of the corresponding BNDUPD
message.

In order to reduce the complexity of the discussion, the rest of this
section is written as though every BNDUPD message contains only a
single binding update transaction and thus every corresponding BNDACK
message would also contain reply information about only a single
binding update transaction. See section 6.3 for information on how
to create and process BNDUPD and BNDACK messages which contain
multiple binding update transactions.

Note that while a server MAY generate BNDUPD messages with multiple
binding update transactions, every server MUST be able to process a
BNDUPD message which contains multiple binding update transactions
and generate the corresponding BNDACK messages with status for
multiple binding update transactions.
If a server does not ever create 2859 BNDUPD messages which contain multiple binding update transactions, 2860 then it does not need to be able to process a received BNDACK message 2861 with multiple binding update transactions. However, all servers MUST 2862 be able to create BNDACK messages which deal with multiple binding 2863 update transactions received in a BNDUPD message. 2865 Every BNDUPD message that is received by a server MUST be responded 2866 to with a corresponding BNDACK message. The receiving server SHOULD 2867 respond quickly to every BNDUPD message but it MAY choose to respond 2868 preferentially to DHCP client requests instead of BNDUPD messages, 2869 since there is no absolute time period within which a BNDACK must be 2870 sent in response to a BNDUPD message, while DHCP clients frequently 2871 have strict time constraints. 2873 A BNDACK message can only be sent in response to a BNDUPD message 2874 using the same TCP connection from which the BNDUPD message was 2875 received, since the XID's in BNDUPD messages are guaranteed unique 2876 only during the life of a single TCP connection. When a connection 2877 to a partner server goes down, a server with unprocessed BNDUPD mes- 2878 sages MAY simply drop all of those messages, since it can be sure 2879 that the partner will resend them when they are next in communica- 2880 tions (albeit with a different XID), or it MAY instead choose to pro- 2881 cess those BNDUPD messages, but it MUST NOT send any BNDACK messages 2882 in response. 2884 The following table summarizes the options for the BNDACK message. 
 Option                       accept        reject
 ------                       ------        ------
 assigned-IP-address (1)      MUST          MUST
 IP-flags                     SHOULD NOT    SHOULD NOT
 binding-status               SHOULD NOT    SHOULD NOT
 client-identifier            SHOULD NOT    SHOULD NOT
 client-hardware-address      SHOULD NOT    SHOULD NOT
 reject-reason                SHOULD NOT    MUST
 message                      SHOULD NOT    SHOULD
 lease-expiration-time        SHOULD NOT    SHOULD NOT
 potential-expiration-time    SHOULD NOT    SHOULD NOT
 start-time-of-state          SHOULD NOT    SHOULD NOT
 client-last-trans.-time      SHOULD NOT    SHOULD NOT
 DDNS(1)                      SHOULD NOT    SHOULD NOT

 (1) assigned-IP-address MUST be the first option for an IP address

 Table 7.2-1: Options used in a BNDACK message

7.2.1. Sending the BNDACK message

The BNDACK message MUST contain the same xid as the corresponding
BNDUPD message.

The assigned-IP-address option from the BNDUPD message MUST be
included in the BNDACK message. Any additional options from the
BNDUPD message SHOULD NOT appear in the BNDACK message. Note that
any information sent in options (e.g., a later lease-expiration time)
in the BNDACK message MUST NOT be assumed to necessarily be recorded
in the stable storage of the server that receives the BNDACK message,
because there is no corresponding ACK of the BNDACK message. Any
information that SHOULD be recorded in the partner server's stable
storage MUST be transmitted in a subsequent BNDUPD.

If the server is accepting the BNDUPD, the BNDACK message includes
only the assigned-IP-address option. If the server is rejecting the
BNDUPD, the additional option reject-reason MUST appear in the BNDACK
message, and the message option SHOULD appear in this case containing
a human-readable error message describing in some detail the reason
for the rejection of the BNDUPD message.
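The accept and reject cases above can be sketched as a small builder.
This is only an illustration: the dictionary layout and the function
name are hypothetical and say nothing about the wire encoding, which
is defined elsewhere in this document.

```python
from typing import Optional


def build_bndack(xid: int,
                 assigned_ip: str,
                 reject_reason: Optional[int] = None,
                 message: Optional[str] = None) -> dict:
    """Assemble the options of a BNDACK for one binding update.

    xid must be the xid of the BNDUPD being answered. The dict/list
    representation is an assumption of this sketch.
    """
    # assigned-IP-address is always present and always first.
    options = [("assigned-IP-address", assigned_ip)]
    if reject_reason is not None:
        # Rejection: reject-reason MUST appear, message SHOULD appear.
        options.append(("reject-reason", reject_reason))
        if message:
            options.append(("message", message))
    # On acceptance nothing else is added; any message text is ignored.
    return {"type": "BNDACK", "xid": xid, "options": options}
```

For example, `build_bndack(7, "192.0.2.10", 15, "Outdated binding
information")` yields a rejection carrying three options, while
`build_bndack(7, "192.0.2.10")` yields an acceptance carrying only
the assigned-IP-address option.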
2927 If the server rejects the BNDUPD message with a BNDACK and a reject- 2928 reason option, it may be because the server believes that it has 2929 binding information that the other server should know. A server 2930 which is rejecting a BNDUPD may initiate a BNDUPD of its own in order 2931 to update its partner with what it believes is better binding infor- 2932 mation, but it MUST ensure through some means that it will not end up 2933 in a situation where each server is sending BNDUPD messages as fast 2934 as possible because they can't agree on which server has better bind- 2935 ing data. Placing a considerable delay on the initiation of a BNDUPD 2936 message after sending a BNDACK with a reject-reason would be one way 2937 to ensure this situation doesn't occur. 2939 7.2.2. Receiving the BNDACK message 2941 When a server receives a BNDACK message, if it doesn't contain a 2942 reject-reason option that means that the BNDUPD message was accepted, 2943 and the server which sent the BNDUPD SHOULD update its stable storage 2944 with the potential-expiration-time value sent in the BNDUPD message. 2946 If the BNDACK message contains a reject-reason option, that means 2947 that the BNDUPD was rejected. There SHOULD be a message option in 2948 the BNDACK giving a text reason for the rejection, and the server 2949 SHOULD log the message in some way. The server MUST NOT immediately 2950 try to resend the BNDUPD message as there is no reason to believe the 2951 partner won't reject it a second time. However a server MAY choose 2952 to send another BNDUPD at some future time, for instance when the 2953 server next processes an update request from its partner. 2955 7.3. UPDREQ message [9] 2957 The update request (UPDREQ) message is used by one server to request 2958 that its partner send it all of the binding database information that 2959 it has not already seen. 
Since each server is required to keep 2960 track at all times of the binding information the other server has 2961 ACKed, one server can request transmission of all un-ACKed binding 2962 database information held by the other server by using the UPDREQ 2963 message. 2965 The UPDREQ message is used whenever the sending server cannot proceed 2966 before it has processed all previously un-ACKed binding update infor- 2967 mation, since the UPDREQ message should yield a corresponding UPDDONE 2968 message. The UPDDONE message is not sent until the server that sent 2969 the UPDREQ message has responded to all of the BNDUPD messages gen- 2970 erated by the UPDREQ message with BNDACK messages (they may either be 2971 accepted or rejected by the BNDACK messages, but they MUST have been 2972 responded to). Thus, the sender of the UPDREQ message can be sure 2973 upon receipt of an UPDDONE message that it has received and committed 2974 to stable storage all outstanding binding database updates. 2976 See section 9, Failover Endpoint States, for the details of when the 2977 UPDREQ message is sent. 2979 7.3.1. Sending the UPDREQ message 2981 The UPDREQ message has no message specific options. 2983 7.3.2. Receiving the UPDREQ message 2985 A server receiving an UPDREQ message MUST send all binding database 2986 changes that have not yet been ACKed by the sending server. These 2987 changes are sent as undistinguished BNDUPD messages. 2989 However, the server which received and is processing the UPDREQ mes- 2990 sage MUST track the BNDACK messages that correspond to the BNDUPD 2991 messages triggered by the UPDREQ message and, when they are all 2992 received, the server MUST send an UPDDONE message. 2994 The server processing the UPDREQ message and sending BNDUPD messages 2995 to its partner SHOULD only track the BNDUPD and BNDACK message pairs 2996 for unACKed binding database changes that were present upon the 2997 receipt of the UPDREQ message. 
A server which has received an UPDREQ 2998 message SHOULD send BNDUPD messages for binding database changes that 2999 occur after receipt of the UPDREQ message, but it SHOULD NOT include 3000 those additional BNDUPD messages and their corresponding BNDACK mes- 3001 sages in the accounting necessary to consider the UPDREQ complete and 3002 subsequently send the UPDDONE message. If some additional binding 3003 database changes end up becoming part of the set of BNDUPD messages 3004 considered as part of the UPDREQ (due to whatever algorithm the 3005 server uses to scan its bindings database for unacked changes) it 3006 will probably not cause any difficulty, but a server MUST NOT attempt 3007 to include all such later BNDUPD messages in the accounting for the 3008 UPDREQ in order to be able to transmit an UPDDONE message. 3010 When queuing up the BNDUPD messages for transmission to the sender of 3011 the UPDREQ message, the server processing the UPDREQ message MUST 3012 honor the value returned in the max-unacked-bndupd option in the CON- 3013 NECT or CONNECTACK message that set up the connection with the send- 3014 ing server. It MUST NOT send more BNDUPD messages without receiving 3015 corresponding BNDACKs than the value returned in max-unacked-bndupd. 3016 (See section 8 for more details.) 3018 7.4. UPDREQALL message [7] 3020 The update request all (UPDREQALL) message is used by one server to 3021 request that its partner send it all of the binding database informa- 3022 tion. This message is used to allow one server to recover from a 3023 failure of stable storage and to restore its binding database in its 3024 entirety from the other server. 3026 A server which sends an UPDREQALL message cannot proceed until all of 3027 its binding update information is restored, and it knows that all of 3028 that information is restored when an UPDDONE message is received. 
See section 9, Protocol state transitions, for the details of when
the UPDREQALL message is sent.

The UPDREQALL message has no message specific options.

7.4.1. Sending the UPDREQALL message

The UPDREQALL is sent.

7.4.2. Receiving the UPDREQALL message

A server receiving an UPDREQALL message MUST send all binding
database information to the sending server. See section 5.16 for
details of what might actually comprise "all binding database
information".

A server receiving an UPDREQALL message MUST remember that such a
message has been received, and ensure that all binding information
extant at that point is sent to the partner prior to any UPDDONE
message being sent to that partner. One way to do this is to
remember the receipt of an UPDREQALL message and to treat every
subsequent UPDREQ message as an UPDREQALL message until it sends the
first UPDDONE message after receipt of the UPDREQALL message. This
requirement exists because communications may fail and become
re-established between the two servers, and the specific conditions
which provoked the UPDREQALL message may no longer exist even though
the UPDREQALL message may not yet have completed. See section 5.17
for information on a more efficient way to meet the above
requirement.

These changes are sent as undistinguished BNDUPD messages. Otherwise
the processing is the same as for the UPDREQ message. See section
7.3.2 for details.

7.5. UPDDONE message [8]

The update done (UPDDONE) message is used by a server receiving an
UPDREQ or UPDREQALL message to signify that it has sent all of the
BNDUPD messages requested by the UPDREQ or UPDREQALL request and that
it has received a BNDACK for each of those messages.
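The bookkeeping that leads up to an UPDDONE (section 7.3.2) can be
sketched as a small tracker. It snapshots the un-ACKed bindings
present when the UPDREQ or UPDREQALL arrives, never has more than
max-unacked-bndupd BNDUPDs outstanding, and reports the xid for the
UPDDONE once every tracked BNDUPD has been answered, whether accepted
or rejected. The class, its method names, and the local xid counter
are hypothetical. A real server would also count BNDUPDs unrelated to
the request against the same max-unacked-bndupd limit.

```python
from collections import deque


class UpdateRequestTracker:
    """Track the BNDUPD/BNDACK pairs triggered by one UPDREQ(ALL)."""

    def __init__(self, request_xid, unacked_bindings, max_unacked):
        self.request_xid = request_xid          # xid to echo in UPDDONE
        self.pending = deque(unacked_bindings)  # snapshot at receipt
        self.in_flight = {}                     # BNDUPD xid -> binding
        self.max_unacked = max_unacked          # partner's advertised limit
        self._next_xid = 1                      # illustrative xid source

    def next_to_send(self):
        """Return (xid, binding) pairs that may be sent right now."""
        batch = []
        while self.pending and len(self.in_flight) < self.max_unacked:
            binding = self.pending.popleft()
            xid = self._next_xid
            self._next_xid += 1
            self.in_flight[xid] = binding
            batch.append((xid, binding))
        return batch

    def is_complete(self):
        """True when nothing is left to send or to be answered."""
        return not self.pending and not self.in_flight

    def on_bndack(self, xid):
        """Record a BNDACK (accept or reject).

        Returns the xid for the UPDDONE when this BNDACK completes the
        request, otherwise None.
        """
        if xid not in self.in_flight:
            return None  # not one of the BNDUPDs tracked for this request
        del self.in_flight[xid]
        return self.request_xid if self.is_complete() else None
```

A caller would check `is_complete()` immediately after construction,
since an empty snapshot means the UPDDONE can be sent at once.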
While a BNDACK message MUST have been received for each BNDUPD
message prior to the transmission of the UPDDONE message, this
doesn't necessarily mean that all of the BNDUPD messages were
accepted, only that all of them were responded to with a BNDACK
message. Thus, a NAK (a BNDACK message containing a reject-reason
option) could be used to reject a BNDUPD, but for the purposes of the
UPDDONE message, such a NAK would count as a response to the
associated BNDUPD message, and would not block the eventual
transmission of the UPDDONE message.

The xid in an UPDDONE message MUST be identical to the xid in the
UPDREQ or UPDREQALL message that initiated the update process.

The UPDDONE message has no message specific options.

7.5.1. Sending the UPDDONE message

The UPDDONE message SHOULD be sent as soon as the last BNDACK message
corresponding to a BNDUPD message requested by the UPDREQ or
UPDREQALL is received from the server which sent the UPDREQ or
UPDREQALL. The XID of the UPDDONE message MUST be the same as the
XID of the corresponding UPDREQ or UPDREQALL message.

7.5.2. Receiving the UPDDONE message

A server receiving the UPDDONE message knows that all of the
information that it requested by sending an UPDREQ or UPDREQALL
message has now been sent and that it has recorded this information
in its stable storage. It typically uses the receipt of an UPDDONE
message to move to a different failover state. See sections 9.5.2
and 9.8.3 for details.

7.6. POOLREQ message [1]

The pool request (POOLREQ) message is used by the secondary server to
request an allocation of IP addresses from the primary server. It
MUST be sent by a secondary server to a primary server to request IP
address allocation by the primary.
The IP addresses allocated are 3108 transmitted using normal BNDUPD messages from the primary to the 3109 secondary. 3111 The POOLREQ message SHOULD be sent from the secondary to the primary 3112 whenever the secondary makes a transition into NORMAL state. It 3113 SHOULD periodically be resent in order that any change in the number 3114 of available IP addresses on the primary be reflected in the pool on 3115 the secondary. The period may be influenced by the secondary 3116 server's leasing activity. 3118 The POOLREQ message has no message specific options. 3120 7.6.1. Sending the POOLREQ message 3122 The POOLREQ message is sent. 3124 7.6.2. Receiving the POOLREQ message 3126 When a primary server receives a POOLREQ message it SHOULD examine 3127 the binding database and determine how many IP addresses the secon- 3128 dary server should have, and set these IP addresses to BACKUP state. 3129 It SHOULD then send BNDUPD messages concerning all of these IP 3130 addresses to the secondary server. 3132 Servers frequently have several kinds of IP addresses available on a 3133 particular network segment. The failover protocol assumes that both 3134 primary and secondary servers are configured in such a way that each 3135 knows the type and number of IP addresses on every network segment 3136 participating in the failover protocol. The primary server is 3137 responsible for allocating the secondary server the correct propor- 3138 tion of available IP addresses of each kind, and the secondary server 3139 is responsible for being configured in such a way that it can tell 3140 the kind of every IP address based solely on the IP address itself. 3142 A primary server MUST keep track of how many IP addresses were allo- 3143 cated as a result of processing the POOLREQ message, and send that 3144 number in the POOLRESP message. 
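The primary's handling of a POOLREQ can be sketched as follows. This
is a simplified illustration under stated assumptions: a single kind
of IP address, an in-memory mapping from address to binding-status,
and a hypothetical target share of one half for the secondary. The
proportion actually used is governed by other sections of this
document and by configuration.

```python
def handle_poolreq(bindings: dict, secondary_share: float = 0.5) -> int:
    """Move FREE addresses to BACKUP until the secondary holds its share.

    bindings maps an IP address string to its binding-status string and
    is updated in place. Returns the number of addresses moved, which is
    the value to place in the addresses-transferred option of the
    POOLRESP. The default share of 0.5 is an assumption of this sketch.
    """
    free = sorted(ip for ip, status in bindings.items() if status == "FREE")
    backup_count = sum(1 for status in bindings.values() if status == "BACKUP")

    # Addresses available for allocation by either server.
    available = len(free) + backup_count
    target = int(available * secondary_share)
    to_move = max(0, target - backup_count)

    for ip in free[:to_move]:
        bindings[ip] = "BACKUP"   # each change is later sent in a BNDUPD
    return to_move
```

Calling the function a second time without any lease activity in
between returns zero, because the secondary already holds its share.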
3146 A primary server MAY choose to defer processing a POOLREQ message 3147 until a more convenient time to process it, but it should not depend 3148 on the secondary server to resend the POOLREQ message in that case. 3150 If a secondary server receives a POOLREQ message it SHOULD report an 3151 error. 3153 7.7. POOLRESP message [2] 3155 A primary server sends a POOLRESP message to a secondary server after 3156 the allocation process for available addresses to the secondary 3157 server is complete. Typically this message will precede some of the 3158 BNDUPD messages that the primary uses to send the actual allocated IP 3159 addresses to the secondary. 3161 The xid in the POOLRESP message MUST be identical to the xid in the 3162 POOLREQ message for which this POOLRESP is a response. 3164 7.7.1. Sending the POOLRESP message 3166 The POOLRESP message MUST contain the same xid as the corresponding 3167 POOLREQ message. 3169 Only one option MUST appear in a POOLRESP message: 3171 o addresses-transferred 3173 The number of addresses allocated to the secondary server by the 3174 primary server as a result of a POOLREQ is contained in the 3175 addresses-transferred option in a POOLRESP message. Note this 3176 is the number of addresses that are transferred to the secondary 3177 in the primary's binding database as a result of the correspond- 3178 ing POOLREQ message, and that it may be some time before they 3179 can all be transmitted to the secondary server through the use 3180 of BNDUPD messages. 3182 7.7.2. Receiving the POOLRESP message 3184 When a secondary server receives a POOLRESP message, it SHOULD send 3185 another POOLREQ message if the value of the addresses-transferred 3186 option is non-zero. 3188 Typically, no other action is taken on the reception of a POOLRESP 3189 message. 3191 7.8. CONNECT message [5] 3193 The connect message is used to establish an application-level con- 3194 nection over a newly created TCP connection.
It gives the source 3195 information for the connection and critical configuration informa- 3196 tion. It MUST be sent only by the primary server. Either server can 3197 initiate a TCP connection, but the CONNECT message is only sent by 3198 the primary server. 3200 The CONNECT message MUST be the first message sent down a newly esta- 3201 blished connection, and it MUST be sent only by the primary server. 3203 The following table summarizes the options that are associated with 3204 the CONNECT message: 3206 Option 3207 ------ 3208 relationship-name MUST 3209 max-unacked-bndupd MUST 3210 receive-timer MUST 3211 vendor-class-identifier MUST 3212 protocol-version MUST 3213 TLS-request MUST (1) 3214 MCLT MUST 3215 hash-bucket-assignment MUST 3217 (1) MUST NOT if CONNECT is being sent over a TLS connection 3219 Table 7.8-1: Options used in a CONNECT message 3221 7.8.1. Sending the CONNECT message 3223 The CONNECT message MUST be the first message sent by the primary 3224 server after the establishment of a new TCP connection with a secon- 3225 dary server participating in the failover protocol. 3227 The xid of the CONNECT message is not related to any previous xid 3228 sequence, but initiates the sequence for this connection. 3230 The name of the failover relationship MUST be placed in the 3231 relationship-name option. This information is placed in an option 3232 inside of the message in order to allow the identity of the sender to 3233 be covered by a shared secret. 3235 The number of BNDUPD messages the primary server can accept without 3236 blocking the TCP connection MUST be placed in the max-unacked-bndupd 3237 option. This MUST be a number equal to or greater than 1, SHOULD be 3238 a number greater than 10, and SHOULD be a number less than 100. 3240 The length of the receive timer (tReceive, see section 8.3) MUST be 3241 placed in the receive-timer option. 3243 The MCLT MUST be placed in the MCLT option. 
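The bounds on the max-unacked-bndupd value stated above can be checked at configuration time. The following is a minimal sketch; the function name and the choice to return advisory strings for the SHOULD-level bounds are assumptions made for illustration only.

```python
def check_max_unacked_bndupd(value):
    """Validate a max-unacked-bndupd value against the stated bounds.

    Raises ValueError when the MUST-level bound (at least 1) is violated.
    Returns a list of advisory strings for the SHOULD-level bounds
    (greater than 10, less than 100); an empty list means no advisories.
    """
    if value < 1:
        raise ValueError("max-unacked-bndupd MUST be equal to or greater than 1")
    advisories = []
    if value <= 10:
        advisories.append("max-unacked-bndupd SHOULD be greater than 10")
    if value >= 100:
        advisories.append("max-unacked-bndupd SHOULD be less than 100")
    return advisories
```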
3245 The hash-bucket-assignment option MUST be included in the CONNECT 3246 message. In the event that load balancing is not configured for this 3247 server, the hash-bucket-assignment option will indicate that. The 3248 value of the hash-bucket-assignment option is determined from the 3249 specific buckets that the primary server has determined that the 3250 secondary server MUST service as part of the load-balancing 3251 algorithm. The way in which the primary server determines this 3252 information is outside the scope of this protocol definition. The 3253 primary server SHOULD be configured with a percentage of clients that 3254 the secondary server will be instructed to service, and the primary 3255 server SHOULD use the algorithm in [RFC 3074] to generate a Hash 3256 Bucket Assignment which it sends to the secondary server. 3258 The vendor class identifier MUST be placed in the vendor-class- 3259 identifier option. 3261 The protocol-version option MUST be included in every CONNECT mes- 3262 sage. The current value of the protocol version is 1. 3264 The TLS-request option MUST be sent and contains the desired TLS con- 3265 nection request as well as information concerning whether TLS is sup- 3266 ported. If this CONNECT message is being sent over an already 3267 created TLS connection, the TLS-request MUST NOT appear. 3269 7.8.2. Receiving the CONNECT message 3271 When a server establishes a TCP connection on a failover port, if it 3272 is a primary server it should send a CONNECT message, and if it is a 3273 secondary server it should wait for a CONNECT message before sending 3274 any messages. To avoid denial of service attacks, a secondary should 3275 only wait for a CONNECT message on a new connection for a limited 3276 amount of time and close the connection if none is received during 3277 that time. 3279 When a secondary server receives a CONNECT message it should: 3281 1. Record the time at which the message was received. 3283 2.
Examine the protocol-version option, and decide if this server 3284 is capable of interoperating with another server running that 3285 protocol version. If not, send the CONNECTACK message with 3286 the reject reason 14: "Protocol version mismatch". The server 3287 MUST include its protocol-version in the CONNECTACK message. 3289 3. Examine the TLS-request option. Figure out the TLS-reply 3290 value based on the capabilities and configuration of this 3291 server. If the result for the TLS-reply value is a 1 and the 3292 connection is accepted, indicating use of TLS, then immedi- 3293 ately send the CONNECTACK message and go into TLS negotiation. 3294 If the TLS-reply value implies rejection of the connection, 3295 then immediately send the CONNECTACK message with the TLS- 3296 reply value and the appropriate reject-reason option value. 3297 In all other cases, save the TLS-reply option information for 3298 the eventual CONNECTACK message. 3300 The possibilities for TLS-request and TLS-reply are:
3302 CONNECT   CONNECTACK
3303 TLS       TLS
3304 request   reply
3305                       Reject
3306 t1        t1          Reason   Comments
3307 --        --          ------   --------
3308 0         0                    no TLS used
3309 0         1           11       primary won't use TLS, secondary requires TLS
3310 1         0                    primary desires TLS, secondary doesn't
3311 1         1                    primary desires TLS, secondary will use TLS
3312 2         0           9, 10    primary requires TLS and secondary won't
3313 2         1                    primary requires TLS and secondary will use TLS
3315 4. Check to see if there is a message-digest option in the CON- 3316 NECT message. If there was, and the server does not support 3317 message-digests, then reject the connection with reject reason 3318 12: "Message digest not supported" in the CONNECTACK. If the 3319 server does support message-digests, then check this message 3320 for validity based on the message-digest, and reject it if the 3321 digest indicates the message was altered with reject reason 3322 20: "Message digest failed to compare". 3324 5.
Determine if the sender (from the relationship-name option) 3325 and the implicit role of the sender (i.e., primary) represent 3326 a server with which the receiver was configured to engage in 3327 failover activity. This is performed after any TLS or message 3328 digest processing so that it occurs after a secure connection 3329 is created, to ensure that there is no tampering with the 3330 relationship name of the partner. In the absence of any other 3331 security capability (i.e., when TLS or a message digest is not 3332 used), the server MAY wish to be configured with the IP 3333 address of the partner and check the source-ip of the CONNECT 3334 message against that IP address as a weak form of security. 3336 If not, then the receiving server should reject the CONNECT 3337 request by sending a CONNECTACK message with a reject-reason 3338 value of: 8, invalid failover partner. 3340 If it is, then the receiving failover endpoint should be 3341 determined. 3343 6. Decide if the time delta between the sending of the message, 3344 in the time field, and the receipt of the message, recorded in 3345 step 1 above, is acceptable. A server MAY require an 3346 arbitrarily small delta in time values in order to set up a 3347 failover connection with another server. See section 5.10 for 3348 information on time synchronization. 3350 If the delta between the time values is too great, the server 3351 should reject the CONNECT request by sending a CONNECTACK mes- 3352 sage with a reject-reason of 4, time mismatch too great. 3354 If the time mismatch is not considered too great then the 3355 receiving server MUST record the delta between the servers. 3356 The receiving server MUST use this delta to correct all of the 3357 absolute times received from the other server in all time- 3358 valued options. Note that servers can participate in failover 3359 with arbitrarily great time mismatches, as long as the mismatch is more 3360 or less constant. 3362 7.
Examine the MCLT option in the CONNECT request and use the 3363 value of the MCLT as the MCLT for this failover endpoint. 3365 The secondary server SHOULD be able to operate with any MCLT 3366 sent by the primary, but if it cannot, then it should send a 3367 CONNECTACK with a reject-reason of 5, MCLT mismatch. In the 3368 event that the MCLT from the primary does not match that con- 3369 figured on the secondary, and the secondary will run with the 3370 primary's value, then the secondary MUST save the MCLT in 3371 secondary storage since it will need it even if it cannot con- 3372 tact the primary. The secondary MUST NOT use a different MCLT 3373 value than it received from the primary even if it cannot con- 3374 tact the primary. 3376 8. The server MUST store the hash-bucket-assignment option for use 3377 in processing during NORMAL state. If this hash bucket 3378 assignment conflicts with the secondary server's configured 3379 hash bucket assignment for use in other than NORMAL state, the 3380 secondary server should send a CONNECTACK with a reject reason 3381 of 19, Hash bucket assignment conflict. 3383 9. The receiving server MAY use the vendor-class-identifier to do 3384 vendor specific processing. 3386 7.9. CONNECTACK message [6] 3388 The CONNECTACK message is sent to accept or reject a CONNECT message. 3389 It is sent by the secondary server which received a CONNECT message. 3391 Attempting immediately to reconnect after either receiving a CONNEC- 3392 TACK with a reject-reason or after sending a CONNECTACK with a 3393 reject-reason could yield unwanted looping behavior, since the reason 3394 that the connection was rejected may well not have changed since the 3395 last attempt. A simple suggested solution is to wait a minute or two 3396 after sending or receiving a CONNECTACK message with a reject-reason 3397 before attempting to reestablish communication.
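The suggested wait after a rejected CONNECT can be sketched as a simple hold-down gate. The class name ReconnectGate, the 90 second default (one value within the suggested "minute or two"), and the caller-supplied clock value are assumptions made for illustration; they are not part of the protocol.

```python
class ReconnectGate:
    """Suppress reconnect attempts for a hold-down period after a rejection."""

    def __init__(self, hold_down=90.0):
        self.hold_down = hold_down      # seconds to wait after a rejection
        self._blocked_until = 0.0       # time before which no attempt is made

    def note_rejection(self, now):
        """Call after sending or receiving a CONNECTACK with a reject-reason."""
        self._blocked_until = now + self.hold_down

    def may_connect(self, now):
        """Return True if a new connection attempt is permitted at time now."""
        return now >= self._blocked_until
```

Passing the current time in explicitly (for example from time.monotonic()) keeps the gate independent of wall-clock adjustments and easy to test.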
3399 The following table summarizes the options associated with the CON- 3400 NECTACK message:
3402 Option                    accept     reject
3403 ------
3404 relationship-name         MUST       MUST
3405 max-unacked-bndupd        MUST       MUST NOT
3406 receive-timer             MUST       MUST NOT
3407 vendor-class-identifier   MUST       MUST NOT
3408 protocol-version          MUST       MUST
3409 TLS-reply                 (1)        (2)
3410 reject-reason             MUST NOT   MUST
3411 message                   MUST NOT   SHOULD
3412 MCLT                      MUST NOT   MUST NOT
3413 hash-bucket-assignment    MUST NOT   MUST NOT
3415 (1) MUST NOT if sending CONNECTACK after TLS negotiation, MUST 3416 if TLS-request in CONNECT, else MUST NOT.
3417 (2) MUST if TLS-request in CONNECT message, else MUST NOT.
3419 Table 7.9-1: Options used in a CONNECTACK message 3421 7.9.1. Sending the CONNECTACK message 3423 The xid of the CONNECTACK message MUST be that of the corresponding 3424 CONNECT message. 3426 The name of the relationship MUST be placed in the relationship-name 3427 option. This information is placed in an option inside of the mes- 3428 sage in order to allow the identity of the sender to be covered by a 3429 shared secret. 3431 The protocol-version option MUST be included in every CONNECTACK mes- 3432 sage. The current value of the protocol version is 1. 3434 If the connection has been rejected, the reject-reason option MUST be 3435 placed in the CONNECTACK message with an appropriate reason, and a 3436 message option SHOULD be included with a human-readable error message 3437 describing the reason for the rejection in some detail. If the 3438 reject-reason option appears, then the remaining options listed below 3439 do not appear. The sending server should close the connection after 3440 sending the CONNECTACK if the connection was rejected. 3442 The results of the TLS negotiation MUST be placed in the TLS-reply 3443 option. If this CONNECTACK message is being sent over an already TLS 3444 secured connection, then there MUST NOT be a TLS-reply option.
3446 If there was a message-digest option in the CONNECT message, then 3447 there MUST be a message-digest in the CONNECTACK message and any sub- 3448 sequent messages if the CONNECTACK does not contain a reject-reason. 3450 The number of BNDUPD messages the server can accept without blocking 3451 the TCP connection MUST be placed in the max-unacked-bndupd option. 3452 This SHOULD be a number greater than 10, and SHOULD be a number less 3453 than 100. 3455 The length of the receive timer (tReceive, see section 8.3) MUST be 3456 placed in the receive-timer option. 3458 The vendor class identifier MUST be placed in the vendor-class- 3459 identifier option. 3461 After a connection is created (either by sending a CONNECTACK message 3462 to the first CONNECT message, or sending a CONNECTACK message to a 3463 CONNECT message received over a TLS connection), the server MUST send 3464 a STATE message. 3466 After a connection is created, the server MUST start two timers for 3467 the connection: tSend and tReceive. The tSend timer SHOULD be 3468 approximately 33 percent of the time in the receive-timer option in 3469 the corresponding CONNECT message. The tReceive timer SHOULD be the 3470 time sent in the receive-timer option in the CONNECTACK message. 3472 The tReceive timer is reset whenever a message is received from this 3473 TCP connection. If it ever expires, the TCP connection is dropped 3474 and communications with this partner is considered not ok. The 3475 reject reason 17: "No traffic within sufficient time" is placed in 3476 the DISCONNECT message sent prior to dropping the TCP connection. 3478 The tSend timer is reset whenever a message is sent over this connec- 3479 tion. When it expires, a CONTACT message MUST be sent. 3481 7.9.2. Receiving the CONNECTACK message 3483 If a CONNECTACK message is received with a different XID from the one 3484 in the CONNECT that was sent, it SHOULD be ignored.
To avoid denial 3485 of service attacks, a primary should only wait for a CONNECTACK mes- 3486 sage on a new connection for a limited amount of time and close the 3487 connection if none is received during that time. 3489 When a CONNECTACK message is received, the following actions should 3490 be taken: 3492 1. Record the time the message was received. 3494 2. Check to see if the xid on the CONNECTACK matches an outstand- 3495 ing CONNECT message on this TCP connection. 3497 3. Check to see if there is a reject-reason option in the CONNEC- 3498 TACK message. If not, continue with step 4. If there is a 3499 reject-reason option, the server SHOULD report the error code. 3500 If a message option appears a server SHOULD display the string 3501 from the message option in a user visible way. The server 3502 MUST close the connection if a reject-reason option appears. 3504 4. Check the value of the TLS-reply option (if any, which there 3505 won't be if this CONNECT is taking place utilizing TLS), and 3506 if it was 1, then skip processing of the rest of the CONNEC- 3507 TACK message, and immediately enter into TLS connection setup. 3509 This step occurs prior to steps 5 and 6 in order to allow 3510 creation of a secure connection (if required) prior to pro- 3511 cessing the protocol version and IP address information. 3513 5. Examine the value of the protocol-version option. If this 3514 server is able to establish connections with another server 3515 running this protocol version, then continue, else close the 3516 connection. 3518 6. Decide if the time delta between the sending of the message, 3519 in the time field, and the receipt of the message, recorded in 3520 step 1 above, is acceptable. A server MAY require an arbi- 3521 trarily small delta in time values in order to set up a fail- 3522 over connection with another server. 3524 If the delta between the time values is too great, the server 3525 should drop the TCP connection (see section 7.12).
3527 If the time mismatch is not considered too great then the 3528 receiving server MUST record the delta between the servers. 3529 The receiving server MUST use this delta to correct all of the 3530 absolute times received from the other server in all time- 3531 valued options. Note that the failover protocol is con- 3532 structed so that two servers can be failover partners with 3533 arbitrarily great time mismatches. 3535 7. The receiving server MAY use the vendor-class-identifier to do 3536 vendor specific processing. 3538 8. After accepting a CONNECTACK message, the server MUST send a 3539 STATE message. 3541 After receiving a CONNECTACK message, the server MUST start 3542 two timers for the connection: tSend and tReceive. The tSend 3543 timer SHOULD be approximately 20 percent of the time in the 3544 receive-timer option in the corresponding CONNECTACK message. 3545 The tReceive timer SHOULD be set to the time sent in the 3546 receive-timer option in the CONNECT message. 3548 The tReceive timer is reset whenever a message is received 3549 from this TCP connection. If it ever expires, the TCP connec- 3550 tion is dropped and communications with this partner is con- 3551 sidered not ok. The reject reason 17: "No traffic within suf- 3552 ficient time" is placed in the DISCONNECT message sent prior 3553 to dropping the TCP connection. 3555 The tSend timer is reset whenever a message is sent over this 3556 connection. When it expires, a CONTACT message MUST be sent. 3558 7.10. STATE message [10] 3560 The state (STATE) message is used to communicate the current failover 3561 state to the partner server. 3563 The STATE message MUST be sent after sending a CONNECTACK message 3564 that didn't contain a reject-reason option, and MUST be sent after 3565 receiving a CONNECTACK message without a reject-reason option. 3567 A STATE message MUST be sent whenever the failover endpoint changes 3568 its failover state and a connection exists to the partner.
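The tSend and tReceive handling described in section 7.9.2 can be sketched as follows. This is an illustration under stated assumptions: the class and method names are hypothetical, time is supplied by the caller, and the fraction used to derive tSend from the partner's receive timer is left as a parameter rather than fixed.

```python
class ConnectionTimers:
    """Track tSend and tReceive for one failover TCP connection."""

    def __init__(self, own_receive_timer, partner_receive_timer,
                 send_fraction, now):
        # tReceive is the value this server advertised to its partner.
        self.t_receive = own_receive_timer
        # tSend is a fraction of the partner's advertised receive timer.
        self.t_send = partner_receive_timer * send_fraction
        self.last_received = now
        self.last_sent = now

    def on_message_received(self, now):
        self.last_received = now   # any received message resets tReceive

    def on_message_sent(self, now):
        self.last_sent = now       # any sent message resets tSend

    def poll(self, now):
        """Return the action due at time now, or None if nothing is due."""
        if now - self.last_received >= self.t_receive:
            # Send DISCONNECT with reject reason 17, then drop the connection.
            return "DISCONNECT"
        if now - self.last_sent >= self.t_send:
            return "CONTACT"
        return None
```

Checking tReceive first means an expired connection is torn down rather than kept alive by a late CONTACT.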
3570 The STATE message requires no response from the failover partner. 3572 The following table shows the options that MUST appear in a STATE 3573 message: 3575 Option 3576 ------ 3577 sending-state MUST 3578 server-flags MUST 3579 start-time-of-state MUST 3581 Table 7.10-1: Options used in a STATE message 3583 7.10.1. Sending the STATE message 3585 The current failover state is placed in the server-state option and 3586 the current state of the STARTUP flag is placed in the server-flags 3587 option. 3589 The message is sent with a unique xid. 3591 A server SHOULD only send the STATE message either when the connec- 3592 tion is created (i.e., after sending or receiving a CONNECTACK message 3593 with no reject-reason option), or when there is a change from the 3594 values sent in a previous STATE message. 3596 7.10.2. Receiving the STATE message 3598 Every STATE message SHOULD indicate a change in state or a change in 3599 the flags. 3601 When a STATE message is received, any state transitions specified in 3602 section 9 are taken. 3604 No response to a STATE message is required. 3606 7.11. CONTACT message [11] 3608 The contact (CONTACT) message is sent to verify communications 3609 integrity with a failover partner. The CONTACT message is sent when 3610 no messages have been sent to the failover partner for a specified 3611 period of time. This is determined by the tSend timer expiring (see 3612 section 8.3). 3614 The CONTACT message has no message specific options. 3616 7.11.1. Sending the CONTACT message 3618 The CONTACT message is sent. 3620 7.11.2. Receiving the CONTACT message 3622 When a CONTACT message is received, the tReceive timer is reset (as 3623 it is with any message that is received). 3625 A server SHOULD use the time in the time field and the time the mes- 3626 sage was received to refine the delta time calculations between the 3627 servers. 3629 7.12.
DISCONNECT message [12] 3631 The DISCONNECT is the last message sent over a connection before 3632 dropping an established connection (note that an established connec- 3633 tion is one where a CONNECTACK has been sent without a reject rea- 3634 son). 3636 After sending or receiving a DISCONNECT message, a server needs to 3637 have some mechanism to prevent an error loop. Simply reconnecting to 3638 the partner immediately is not the best option, especially after 3639 several consecutive attempts. 3641 A simple suggested solution is to wait a minute or two after sending 3642 or receiving a DISCONNECT before attempting to reestablish communica- 3643 tion. 3645 The DISCONNECT message MUST be the last message sent down a connec- 3646 tion before it is closed. 3648 The following table summarizes the options that are associated with 3649 the DISCONNECT message: 3651 Option 3652 ------ 3653 reject-reason MUST 3654 message SHOULD 3656 Table 7.12-1: Options used in a DISCONNECT message 3658 7.12.1. Sending the DISCONNECT message 3660 The DISCONNECT message MUST be the last message sent by a server 3661 which is dropping a TCP connection. 3663 The xid of the DISCONNECT message must be unique. 3665 The reject-reason option MUST appear giving a reason why the connec- 3666 tion was dropped. A message option SHOULD appear giving a human 3667 readable error message with possibly more details. 3669 7.12.2. Receiving the DISCONNECT message 3671 When a server receives a DISCONNECT message it should log the message 3672 if there was one and possibly raise an alarm of some sort if the 3673 reject reason was one that was sufficiently serious. 3675 8. Connection Management 3677 Servers participating in the failover protocol communicate over TCP 3678 connections. These TCP connections are used both to transmit bind- 3679 ing information from one server to another as well as to allow each 3680 server to determine whether communications is possible with the other 3681 server.
3683 Central to the operation of the failover protocol is a notion of 3684 "communications okay" or "communications failed". Failover state 3685 transitions are taken in many cases when the status of communications 3686 with the partner changes, and the existence or non-existence of a TCP 3687 connection between failover endpoints is used to determine if com- 3688 munications is "okay" or "failed". 3690 A single TCP connection exists which connects two failover endpoints. 3692 8.1. Connection granularity 3694 There exists one TCP connection between each set of failover end- 3695 points. See section 5.1.1 for an explanation of failover endpoints. 3697 There is a maximum of two TCP connections between any two servers 3698 implementing the failover protocol, one for each of the possible 3699 failover endpoints between these two servers. There is a minimum of 3700 one TCP connection between one server and every other failover server 3701 with which it implements the failover protocol. 3703 8.2. Creating the TCP connection 3705 There are two ports used for initiating TCP connections, correspond- 3706 ing to the two roles that a server can fill with respect to another 3707 server. Every server implementing the failover protocol MUST listen 3708 on at least one of these ports. Port 647 is the port to which pri- 3709 mary servers will attempt a connection, and port 847 is the port to 3710 which secondary servers will attempt a connection. When a connection 3711 attempt is received on port 647, it is therefore from a primary 3712 server, and the primary server is attempting to connect to this 3713 secondary server. Likewise, when a connection attempt is received on 3714 port 847, it is therefore from a secondary server, and the secondary 3715 server is attempting to connect to this primary server.
See the 3716 schematic representation below: 3718 Primary Server 3719 -------------- 3720 Listens on port 847 for secondary server to connect to it 3721 Periodically connects on port 647 to contact secondary 3723 Secondary Server 3724 -------------- 3725 Listens on port 647 for primary server to connect to it 3726 Periodically connects on port 847 to contact primary 3728 Every server implementing the failover protocol SHOULD attempt to 3729 connect to all of its partners periodically, where the period is 3730 implementation dependent and SHOULD be configurable. In the event 3731 that a connection has been rejected by a CONNECTACK message with a 3732 reject-reason option contained in it or a DISCONNECT message, a 3733 server SHOULD reduce the frequency with which it attempts to connect 3734 to that server but it SHOULD continue to attempt to connect periodi- 3735 cally. 3737 If a connection attempt has been received from another server in a 3738 particular role (i.e., from a specific failover endpoint) then the 3739 receiving server MUST NOT initiate a connection attempt to the 3740 partner server in that same role. 3742 If both servers happen to attempt to connect simultaneously, the 3743 secondary server MUST drop its attempt in favor of the primary's 3744 attempt. Thus, in the event that a secondary server receives a con- 3745 nection attempt to port 647 from a primary server when it has already 3746 initiated a connection attempt to port 847 on the same primary 3747 server, it MUST accept the connection to port 647 and it MUST drop 3748 the connection attempt to port 847. In the event that a primary 3749 server receives a connection attempt to port 847 from a secondary 3750 server when it has already initiated a connection attempt to port 647 3751 on that same server, it MUST reject the connection attempt to port 3752 847 and continue to pursue the connection attempt on port 647.
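The rule for simultaneous connection attempts can be sketched as a small decision function. The function name, the string role values, and the tuple return convention are assumptions made for illustration; only the port numbers and the accept/drop outcomes come from the text above.

```python
PRIMARY_CONNECTS_TO = 647    # port on which a secondary listens
SECONDARY_CONNECTS_TO = 847  # port on which a primary listens


def resolve_inbound(role, inbound_port, outbound_pending):
    """Decide what to do with an inbound connection attempt.

    role is "primary" or "secondary" (the local server's role).
    outbound_pending is True if this server already has its own
    connection attempt to the same partner in progress.
    Returns (accept_inbound, keep_outbound).
    """
    if role == "secondary" and inbound_port == PRIMARY_CONNECTS_TO:
        # The primary's attempt always wins; any own attempt is dropped.
        return True, False
    if role == "primary" and inbound_port == SECONDARY_CONNECTS_TO:
        if outbound_pending:
            # Reject the secondary's attempt and pursue our own.
            return False, True
        return True, False
    raise ValueError("inbound port does not match the local role")
```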
3754 Once a connection is established, the primary server MUST send a CON- 3755 NECT message across the connection. A secondary server MUST wait for 3756 the CONNECT message from a primary server. 3758 Every CONNECT message includes a TLS-request option, and if the CON- 3759 NECTACK message does not reject the CONNECT message and the TLS-reply 3760 option says TLS MUST be used, then the servers will immediately enter 3761 into TLS negotiation. 3763 Once TLS negotiation is complete, the primary server MUST resend the 3764 CONNECT message on the newly secured TLS connection and then wait for 3765 the CONNECTACK message in response. The TLS-request and TLS-reply 3766 options MUST NOT appear in either this second CONNECT or its associ- 3767 ated CONNECTACK message as they had in the first messages. 3769 The second message sent over a new connection (either a bare TCP con- 3770 nection or a connection utilizing TLS) is a STATE message. Upon the 3771 receipt of this message, the receiver can consider communications up. 3773 It is entirely possible that two servers will attempt to make connec- 3774 tions to each other essentially simultaneously, and in this case the 3775 secondary server will be waiting for a CONNECT message on each con- 3776 nection. The primary server MUST send a CONNECT message over one 3777 connection and it MUST close the other connection. 3779 A secondary server MUST NOT respond to the closing of a TCP connec- 3780 tion with a blind attempt to reconnect -- there may be another TCP 3781 connection to the same failover partner already in use. 3783 8.3. Using the TCP connection for determining communications status 3785 The TCP connection is used to determine the communications status of 3786 the other server, i.e., communications-ok, or communications- 3787 interrupted. 3789 Three things must happen for a server to consider that communications 3790 are ok with respect to another server: 3792 1. A TCP connection must be established to the other server. 
3794 2. A CONNECT message must be received and a CONNECTACK message 3795 sent in response. The CONNECT message is used to determine 3796 the identity of the failover endpoint of the other end of the 3797 TCP connection -- without it, the failover endpoint cannot be 3798 uniquely determined. Without knowledge of the failover end- 3799 point, the entity with which communications is ok is 3800 undetermined. 3802 3. A STATE message must be received from the other server over 3803 the connection. This STATE message initializes important 3804 information necessary to the operation of the state machine 3805 that governs the behavior of this failover endpoint. 3807 There are two ways that a server can determine that communications 3808 has failed: 3810 1. The TCP connection can go down, yielding an error when 3811 attempting to send or receive a message. This will happen at 3812 least as often as the period of the tSend timer. 3814 2. The tReceive timer can expire. 3816 In either of these cases, communications is considered interrupted. 3818 If the tReceive timer expires, the connection MUST be dropped. The 3819 reject reason 17: "No traffic within sufficient time" is placed in 3820 the DISCONNECT message sent prior to dropping the TCP connection. 3822 Several difficulties arise when trying to use one TCP connection for 3823 both bulk data transfer as well as to sense the communications status 3824 of the other server. One aspect of the problem stems from the dif- 3825 ferent requirements of both uses. The bulk data transfer is of 3826 course critically important to the protocol, but the speed with which 3827 it is processed is not terribly significant. It might well be 3828 minutes before a BNDUPD message is processed, and while not optimal, 3829 such an occasional delay doesn't compromise the correctness of the 3830 protocol. However, the speed with which one server detects the other 3831 server is up (or, more importantly, down) is more highly constrained.
3832 Generally one server should be able to detect that the other server 3833 is not communicating within a minute or less. 3835 These differing time constraints make it difficult to use the same 3836 TCP connection for data transfer as well as to sense communications 3837 integrity. See section 3.5 for additional details on TCP. 3839 The solution to this problem is to require that some message be 3840 received by each end of the connection within a limited time or else 3841 the connection will be considered down. If no messages have been 3842 sent recently, then a CONTACT message is sent. 3844 In the case where there is no data queued to be sent, this is not a 3845 problem, but in the case where there is data queued to be sent to the 3846 partner, then the CONTACT message will not actually be transmitted 3847 until the queued data is sent. Section 3.5 explains why waiting for 3848 TCP to determine that the connection is down is not acceptable, and 3849 leads to a requirement that the receiving server never block the 3850 sending server from sending CONTACT messages. 3852 In order to meet this requirement, each server tells the other server 3853 the number of outstanding BNDUPD messages that it will accept. The 3854 receiving server is required to always be able to accept that many 3855 BNDUPD messages off of the connection's input queue even if it cannot 3856 process them immediately, and to accept all other messages immedi- 3857 ately. 3859 Thus, the sending server's TCP is never blocked from sending a mes- 3860 sage except for very short periods, less than a few seconds unless 3861 the network connection itself has problems. In this case, if the 3862 CONTACT messages don't make it to the partner then the partner will 3863 close the connection. 3865 DISCUSSION: 3867 When implementing this capability, one needs to be careful when 3868 sending any message on the TCP connection as TCP can easily block 3869 the server if the local TCP send buffers are full.
This can't be 3870 prevented because if the receiver is not reachable (via the net- 3871 work), the sending TCP can't send and thus it will be unable to 3872 empty the local TCP send buffers. So, either all send operations 3873 need to assume they may block for some time, or non-blocking sends 3874 must be used carefully. 3876 8.4. Using the TCP connection for binding data 3878 Binding data, in the form of BNDUPD messages and BNDACK messages to 3879 respond to them, are sent across the TCP connection. 3881 In order to support timely detection of any failure in the partner 3882 server, the TCP connection MUST NOT block for more than a very short 3883 time, on the order of a few seconds. Therefore, a server that is 3884 sending BNDUPD messages MUST send only a restricted number before 3885 receiving BNDACK messages about previous messages sent. 3887 The number of outstanding BNDUPD messages that each server will 3888 accept without causing TCP to block transmission of additional data 3889 (i.e., CONTACT messages) is sent by each server in the CONNECT and 3890 CONNECTACK messages in the max-unacked-bndupd option. 3892 8.5. Using the TCP connection for control messages 3894 The TCP connection is used for control messages: POOLREQ, UPDREQ, 3895 STATE, CONTACT, UPDREQALL and the corresponding reply messages: POOL- 3896 RESP, UPDDONE. A server MUST immediately accept all of these mes- 3897 sages from the TCP connection. A server MUST immediately accept any 3898 BNDACK which is received as well. 3900 8.6. Losing the TCP connection 3902 When the TCP connection is lost, communications with 3903 the other server is not ok. A server which has lost communications SHOULD 3904 immediately attempt to reconnect to the other server, and should 3905 retry these connection attempts periodically.
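The flow-control rule of section 8.4 (send only a restricted number of BNDUPD messages before BNDACKs arrive) might be sketched as below. This is a minimal illustration and not part of the protocol definition: the class name BndupdWindow, its methods, and the use of a simple counter for XIDs are all assumptions made for the example. Only the limit itself, taken from the partner's max-unacked-bndupd option, comes from the text.

```python
from collections import deque


class BndupdWindow:
    """Hypothetical sender-side window over unacknowledged BNDUPD messages."""

    def __init__(self, max_unacked_bndupd):
        # Limit advertised by the partner in its max-unacked-bndupd option.
        self.max_unacked = max_unacked_bndupd
        self.unacked = set()     # XIDs sent but not yet acknowledged
        self.pending = deque()   # updates waiting for a free slot
        self.next_xid = 1        # assumption: XIDs are a per-connection counter

    def submit(self, update):
        """Queue an update; return the (xid, update) pairs that may be sent now."""
        self.pending.append(update)
        return self._drain()

    def on_bndack(self, xid):
        """Record a BNDACK; return queued updates that may now be sent."""
        self.unacked.discard(xid)
        return self._drain()

    def _drain(self):
        # Release queued updates only while the window has room.
        ready = []
        while self.pending and len(self.unacked) < self.max_unacked:
            xid = self.next_xid
            self.next_xid += 1
            self.unacked.add(xid)
            ready.append((xid, self.pending.popleft()))
        return ready
```

With a window of two, a third update is held back until one of the first two is acknowledged, so the partner's input queue never holds more BNDUPD messages than it promised to accept, and CONTACT messages are not stuck behind them.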
3907 An acknowledgement message (BNDACK, POOLRESP, UPDDONE) can 3908 only be sent in response to a request message (BNDUPD, POOLREQ, 3909 UPDREQ, UPDREQALL) on the same TCP connection from which the request 3910 was received, in part because the XIDs in the request messages are 3911 guaranteed unique only during the life of a single TCP connection. 3913 When a connection to a partner server goes down, a server with unpro- 3914 cessed request messages MAY simply drop all of those messages, since 3915 it can be sure that the partner will resend them when they are next 3916 in communications. A server with unprocessed BNDUPD messages when a 3917 TCP connection goes down MAY instead choose to process those BNDUPD 3918 messages, but it MUST NOT send any BNDACK messages in response (again 3919 because of the issues surrounding XID uniqueness). 3921 When the TCP connection is closed explicitly, the DISCONNECT message 3922 with a reject-reason option (and, ideally, a message option) MUST be 3923 sent over the TCP connection. 3925 9. Failover Endpoint States 3927 This section discusses the various states that a failover endpoint 3928 may take, and the server actions required when entering the state, 3929 operating in the state, and leaving the state, as well as the events 3930 that cause transitions out of the state into another state. 3932 The state transition diagram in Figure 9.2-1 is relevant for this 3933 section. This is the common state transition diagram for both servers 3934 in a failover pair. In the event that the textual description of a 3935 state differs from the state transition diagram, the textual descrip- 3936 tion is to be considered authoritative. 3938 9.1. Server Initialization 3940 When a server starts, it begins in STARTUP state. See section 9.3 3941 below for details. 3943 9.2.
Server State Transitions 3945 Whenever a server makes a transition into a new state, it MUST record 3946 the state and the time at which it entered that state in stable 3947 storage. If communications is "ok", it MUST also send a STATE mes- 3948 sage to its failover partner. 3950 Figure 9.2-1 is the diagram of the server state transitions. The 3951 remainder of this section contains information important to the 3952 understanding of that diagram. 3954 The server stays in the current state until all of the actions 3955 specified on the state transition are complete. If communications 3956 fails during one of the actions, the server simply stays in the 3957 current state and attempts a transition whenever the conditions for a 3958 transition are later fulfilled. 3960 In the state transition diagram below, the "+" or "-" in the upper 3961 right corner of each state is a notation about whether communication 3962 is ongoing with the other server. 3964 The legend "responsive", "balanced", or "unresponsive" in each state 3965 indicates whether the server is responsive to all DHCP client 3966 requests, running in load balanced mode, or totally unresponsive in 3967 the respective state. The terms "responsive" and "unresponsive" have 3968 the obvious meanings, while "balanced" means that a DHCP server may 3969 respond to all DHCPREQUEST messages that are RENEWAL or REBINDING, 3970 and to all other messages from clients for which the load balancing 3971 algorithm indicates that it MUST respond to. See sections 5.3 and 3972 9.8.2 for details on load balancing. 3974 In the state transition diagram below, when communication is reesta- 3975 blished between the two servers, each must record the state of the 3976 partner when communication was restored. State transitions on one 3977 server in some cases imply state transitions on the partner server, 3978 so a record of the current state of the partner server must be kept 3979 by each server. 
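The recording requirement stated at the start of this section (state and time of entry written to stable storage, plus a STATE message when communications is "ok") might be sketched as follows. This is an illustration under stated assumptions: the JSON file layout, the function names, and the send_state callback are hypothetical, and performing the write before the send is a choice made for the example rather than an ordering the text prescribes.

```python
import json
import os
import tempfile
import time


def record_state(path, state, now=None):
    """Atomically persist the failover state and the time it was entered."""
    entry = {
        "state": state,
        "start_time_of_state": int(time.time() if now is None else now),
    }
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(entry, f)
        f.flush()
        os.fsync(f.fileno())  # force the record to disk before exposing it
    os.replace(tmp, path)     # atomic rename over the previous record
    return entry


def enter_state(path, state, comm_ok, send_state, now=None):
    """Record the new state, then notify the partner if communications is ok."""
    entry = record_state(path, state, now)
    if comm_ok:
        send_state(entry)     # hypothetical callback that emits a STATE message
    return entry


def load_previous_state(path):
    """Return the last recorded entry, or None if nothing was ever recorded."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

Writing to a temporary file and renaming it means a crash during the update leaves either the old record or the new one, never a partial one, which is what a restart needs in order to recover its previous state.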
3981 If the state of the partner changes while communicating, a server 3982 moves through the communications-failed transition and into whatever 3983 state results. It then immediately moves through whatever state 3984 transition is appropriate given the current state of the partner 3985 server. A server performing this operation SHOULD NOT close the TCP 3986 connection to its partner. 3988 DISCUSSION: 3990 The point of this technique is simplicity, both in explanation of 3991 the protocol and in its implementation. The alternative to this 3992 technique of memory of partner state and automatic state transi- 3993 tion on change of partner state is to have every state in the fol- 3994 lowing diagram have a state transition for every possible state of 3995 the partner. With the approach adopted, only the states in which 3996 communications are reestablished require a state transition for 3997 each possible partner state. 3999 The current state of a server MUST be recorded in stable storage and 4000 thus be available to the server after a server restart. 4002 A transition into SHUTDOWN or PAUSED state is not represented in the 4003 following figure, since other than sending that state to its partner, 4004 the remaining actions involved look just like the server halting in 4005 its otherwise current state, which then becomes the previous state 4006 upon server restart.
4008 [State transition diagram: the ASCII layout is not reproduced here. States and legends appearing in the figure: RECOVER (unresponsive), STARTUP (unresponsive), PARTNER DOWN (responsive), RESOLUTION-INTERRUPTED (responsive), RECOVER-WAIT (unresponsive), RECOVER-DONE (unresponsive), POTENTIAL-CONFLICT (unresponsive), CONFLICT-DONE (responsive), NORMAL (balanced), COMMUNICATIONS-INTERRUPTED (responsive).]
4053 Figure 9.2-1: Server state diagram. 4055 9.3. STARTUP state 4057 The STARTUP state affords an opportunity for a server to probe its 4058 partner server, before starting to service DHCP clients.
4060 DISCUSSION: 4062 Without the STARTUP state, a server would likely start in a state 4063 derived from its previously stored state (held in stable storage), 4064 if any. However, this may be inconsistent with the current state 4065 of the partner. The STARTUP state affords the opportunity for a 4066 server to potentially learn the partner's state and determine if 4067 that state is consistent with its derived starting state or 4068 whether some significant state change has occurred at the partner 4069 that forces the server to start in another state. This is 4070 especially critical if significant time has elapsed while the 4071 server was down. 4073 9.3.1. Operation while in STARTUP state 4075 Whenever a server is in STARTUP state, it MUST be unresponsive to 4076 DHCP client requests, and so the time spent in the STARTUP state is 4077 necessarily short, typically on the order of a few seconds to a few 4078 tens of seconds. The exact time spent in the STARTUP state is imple- 4079 mentation dependent, and the primary and secondary server are not 4080 required to spend the same amount of time in the STARTUP state. See 4081 section 5.9 for some guidelines on the time to spend in STARTUP 4082 state. 4084 Whenever a STATE message is sent to the partner while in STARTUP 4085 state the STARTUP bit MUST be set in the server-flags option and the 4086 previously recorded failover state MUST be placed in the server-state 4087 option. 4089 9.3.2. Transition out of STARTUP state 4091 Each server starts out in startup state every time it initializes 4092 itself, and performs the following algorithm as part of its initiali- 4093 zation: 4095 1. Is there any record in stable storage of a previous failover 4096 state? If yes, set previous-state to the last recorded state 4097 in stable storage, and continue with step 2. 4099 Is there any configuration information that indicates that 4100 this server was previously running but lost its stable 4101 storage? 
Such information must typically come from some 4102 administrative intervention, since it is difficult for a 4103 server to distinguish first startup from a startup after it 4104 has lost its stable storage. If yes, then set the previous- 4105 state to RECOVER, and set the time-of-failure to whatever time 4106 was configured, and go on to step 2. This time-of-failure 4107 will be used in the transition out of the RECOVER-WAIT state 4108 into the RECOVER-DONE state, below. 4110 If there is no record of any previous failover state in stable 4111 storage for this server, then set the previous-state to 4112 RECOVER and set the time-of-failure to a time before the 4113 maximum-client-lead-time before now. If using standard Posix 4114 times, 0 would typically do quite well. This will allow two 4115 servers which already have lease information to synchronize 4116 themselves prior to operating. 4118 Note that neither server is responsive to DHCP client requests 4119 while in the RECOVER state. If both servers can communicate, 4120 however, they will come out of the RECOVER state and progress 4121 through RECOVER-WAIT to RECOVER-DONE and thence to NORMAL or 4122 COMMUNICATIONS-INTERRUPTED state quickly. If both have state, 4123 then they will exchange information. If only one has state, 4124 then the one that does not will complete its update of its 4125 partner quickly (since it has nothing to send). 4127 In some cases, an existing server will be commissioned as a 4128 failover server and brought back into operation where its 4129 partner is not yet available. In this case, the newly commis- 4130 sioned failover server will not operate until its partner 4131 comes online -- but it has operational responsibilities as a 4132 DHCP server nonetheless. 
To properly handle this situation, a 4133 server SHOULD be configurable in such a way as to move 4134 directly into PARTNER-DOWN state after the startup period 4135 expires if it has been unable to contact its partner during 4136 the startup period. 4138 2. If the previous state is one where communications was "OK", 4139 then set the previous state to the state that is the result of 4140 the communications failed state transition in Figure 9.2-1 (if 4141 such transition is shown -- some states don't have a communi- 4142 cations failed state transition, since they allow both commun- 4143 ications OK and failed). 4145 3. Start the STARTUP state timer. The time that a server remains 4146 in the STARTUP state (absent any communications with its 4147 partner) is implementation dependent and SHOULD be 4148 configurable. It SHOULD be long enough for a TCP connection 4149 to be created to a heavily loaded partner across a slow net- 4150 work. 4152 4. Attempt to create a TCP connection to the failover partner. 4153 See section 8.2. 4155 5. Wait for "communications okay", i.e., the process discussed in 4156 section 8.2 "Creating the TCP Connection", to complete, 4157 including the receipt of a STATE message from the partner. 4159 When and if communications become "okay", clear the STARTUP 4160 flag, and set the current state to the previous-state. 4162 If the partner is in PARTNER-DOWN state, and if the time at 4163 which it entered PARTNER-DOWN state (as received in the 4164 start-time-of-state option in the STATE message) is later than 4165 the last recorded time of operation of this server, then set 4166 the current state to RECOVER. If the time at which it entered 4167 PARTNER-DOWN state is earlier than the last recorded time of 4168 operation of this server, then set the current state to 4169 POTENTIAL-CONFLICT. 
4171 Then, transition to the current state and take the "communica- 4172 tions okay" state transition based on the current state of 4173 this server and the partner. 4175 6. If the startup time expires, take an implementation dependent 4176 action: The server MAY go to the previous-state, or the 4177 server MAY wait. 4179 Reasons to go to previous-state and begin processing: 4181 If the current server is the only operational server, then if 4182 it waits, there will be no operational DHCP servers. This 4183 situation could occur very easily where one server fails and 4184 then the other crashes and reboots. If the rebooting server 4185 doesn't start processing DHCP client requests without first 4186 being in communication with the other server, then the level 4187 of DHCP redundancy is not particularly high. This is an 4188 appropriate approach if the possibility of partition is low, 4189 or if the safe period expiration time is well beyond the time 4190 at which an operator would notice and react to a partition 4191 situation. It is also quite appropriate if the safe period 4192 will never expire. 4194 Reasons to wait: 4196 If the current server has been down for longer than the 4197 maximum-client-lead-time, and it is partitioned from the other 4198 server, then when it returns it will attempt to use its own 4199 available addresses to allocate to new DHCP clients, and the 4200 other server may well be in PARTNER-DOWN state and may have 4201 already allocated some of those available addresses to DHCP 4202 clients. In cases where the possibility of partition is high, 4203 and the safe period expiration time is less than the likely 4204 operator reaction time, this is a good approach to use. 4206 9.4. PARTNER-DOWN state 4208 PARTNER-DOWN state is a state either server can enter. 
When in this 4209 state, the server does not assume that the other server could still 4210 be operating and servicing a different set of clients, but instead 4211 assumes that it is the only server operating. If one server is in 4212 PARTNER-DOWN state, the other server MUST NOT be operating. 4214 9.4.1. Upon entry to PARTNER-DOWN state 4216 No special actions are required when entering PARTNER-DOWN state. 4218 The server should continue to attempt to connect to the partner 4219 periodically. 4221 9.4.2. Operation while in PARTNER-DOWN state 4223 A server in PARTNER-DOWN state MUST respond to DHCP client requests. 4224 It will allow renewal of all outstanding leases on IP addresses, and 4225 will allocate IP addresses from its own pool, and after a fixed 4226 period of time (the MCLT interval) has elapsed from entry into 4227 PARTNER-DOWN state, it will allocate IP addresses from the set of all 4228 available IP addresses. 4230 Once a server has entered NORMAL state, the PARTNER-DOWN state is 4231 entered only on command of an external agency (typically an adminis- 4232 trator of some sort) or after the expiration of an externally config- 4233 ured minimum safe-time after the beginning of COMMUNICATIONS- 4234 INTERRUPTED state. 4236 Any IP address tagged as available for allocation by the other server 4237 (at entry to PARTNER-DOWN state) MUST NOT be allocated to a new 4238 client until the maximum-client-lead-time beyond the entry into 4239 PARTNER-DOWN state has elapsed. 4241 A server in PARTNER-DOWN state MUST NOT allocate an IP address to a 4242 DHCP client different from that to which it was allocated at the 4243 entrance to PARTNER-DOWN state until the maximum-client-lead-time 4244 beyond the maximum of the following times: client expiration time, 4245 most recently transmitted potential-expiration-time, most recently 4246 received ack of potential-expiration-time from the partner, and most 4247 recently acked potential-expiration-time to the partner. 
See section 4248 7.1.5 for details. If this time would be earlier than the current 4249 time plus the maximum-client-lead-time, then the time the server 4250 entered PARTNER-DOWN state plus the maximum-client-lead-time is used. 4252 Two options exist for lease times given out while in PARTNER-DOWN 4253 state, with different ramifications flowing from each. 4255 If the server wishes the Failover protocol to protect it from loss of 4256 stable storage in PARTNER-DOWN state, then it should ensure that the 4257 MCLT based lease time restrictions in section 5.1 are maintained, 4258 even in PARTNER-DOWN state. 4260 If the server wishes to forego the protection of the Failover proto- 4261 col in the event of loss of stable storage, then it need recognize no 4262 restrictions on actual client lease times while in PARTNER-DOWN 4263 state. 4265 A server in PARTNER-DOWN state MUST continue to attempt to establish 4266 communications and synchronization with its partner. 4268 9.4.3. Transitions out of PARTNER-DOWN state 4270 When a server in PARTNER-DOWN state succeeds in establishing a con- 4271 nection to its partner, its actions are conditional on the state and 4272 flags received in the STATE message from the other server as part of 4273 the process of establishing the connection. 4275 If the STARTUP bit is set in the server-flags option of a received 4276 STATE message, a server in PARTNER-DOWN state MUST NOT take any state 4277 transitions based on reestablishing communications. Essentially, if a 4278 server is in PARTNER-DOWN state, it ignores all STATE messages from 4279 its partner that have the STARTUP bit set in the server-flags option 4280 of the STATE message. 
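The STARTUP-bit rule above, together with the per-state rules that the remainder of this section lists, might be sketched as a single lookup. This is an illustrative sketch only: the function name and the representation of states as strings are assumptions, while the state groupings are taken from this section.

```python
# Partner states that send a PARTNER-DOWN server into POTENTIAL-CONFLICT.
CONFLICT_STATES = {
    "NORMAL", "COMMUNICATIONS-INTERRUPTED", "PARTNER-DOWN",
    "POTENTIAL-CONFLICT", "RESOLUTION-INTERRUPTED", "CONFLICT-DONE",
}
# Partner states for which the server simply stays in PARTNER-DOWN.
STAY_STATES = {"RECOVER", "RECOVER-WAIT", "SHUTDOWN", "PAUSED"}


def partner_down_next_state(partner_state, startup_bit):
    """Next state for a server in PARTNER-DOWN after a STATE message arrives."""
    if startup_bit:
        # STATE messages with the STARTUP bit set are ignored in this state.
        return "PARTNER-DOWN"
    if partner_state in CONFLICT_STATES:
        return "POTENTIAL-CONFLICT"
    if partner_state in STAY_STATES:
        return "PARTNER-DOWN"
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    raise ValueError("unexpected partner state: %r" % (partner_state,))
```

Checking the STARTUP bit first mirrors the text: a partner that is still probing reports only its previously recorded state, so that report is not acted upon.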
4282 If the STARTUP bit is not set in the server-flags option of a STATE 4283 message received from its partner, then a server in PARTNER-DOWN 4284 state takes the following actions based on the value of the server- 4285 state option in the received STATE message (either immediately after 4286 establishing communications or at any time later when a new state is 4287 received): 4289 o partner in NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN, 4290 POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or CONFLICT-DONE 4291 state 4293 transition to POTENTIAL-CONFLICT state 4295 o partner in RECOVER, RECOVER-WAIT, SHUTDOWN, PAUSED state 4297 stay in PARTNER-DOWN state 4299 o partner in RECOVER-DONE state 4301 transition into NORMAL state 4303 9.5. RECOVER state 4305 This state indicates that the server has no information in its stable 4306 storage or that it is re-integrating with a server in PARTNER-DOWN 4307 state after it has been down. A server in this state MUST attempt to 4308 refresh its stable storage from the other server. 4310 9.5.1. Operation in RECOVER state 4312 A server in RECOVER MUST NOT respond to DHCP client requests. 4314 A server in RECOVER state will attempt to reestablish communications 4315 with the other server. 4317 9.5.2. Transitions out of RECOVER state 4319 If the other server is in POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, 4320 or CONFLICT-DONE state when communications are reestablished, then 4321 the server in RECOVER state will move to POTENTIAL-CONFLICT state 4322 itself. 4324 If the other server is in any other state, then the server in RECOVER 4325 state will request an update of missing binding information by send- 4326 ing an UPDREQ message. 
If the server has been instructed (through 4327 configuration or other external agency) that it has lost its stable 4328 storage, or if it has deduced that from the fact that it has no 4329 record of ever having talked to its partner, while its partner does 4330 have a record of communicating with it, it MUST send an UPDREQALL 4331 message; otherwise it MUST send an UPDREQ message. See Figure 4332 9.5.2-1. 4334 It will wait for an UPDDONE message, and upon receipt of that message 4335 it will transition to RECOVER-WAIT state. 4337 If communications fails during the reception of the results of the 4338 UPDREQ or UPDREQALL message, the server will remain in RECOVER state, 4339 and will re-issue the UPDREQ or UPDREQALL when communications are 4340 re-established. (See section 5.17.) 4342 If an UPDDONE message isn't received within an implementation depen- 4343 dent amount of time, and no BNDUPD messages are being received, the 4344 connection SHOULD be dropped.
4346       A                                B
4347     Server                           Server
4349       |                                |
4350    RECOVER                       PARTNER-DOWN
4351       |                                |
4352       | >--UPDREQ--------------------> |
4353       |                                |
4354       | <--------------------BNDUPD--< |
4355       | >--BNDACK--------------------> |
4356      ...                              ...
4357       |                                |
4358       | <--------------------BNDUPD--< |
4359       | >--BNDACK--------------------> |
4360       |                                |
4361       | <-------------------UPDDONE--< |
4362       |                                |
4363 RECOVER-WAIT                           |
4364       |                                |
4365       | >--STATE-(RECOVER-WAIT)------> |
4366       |                                |
4367       |                                |
4368 Wait MCLT from last known              |
4369 time of failover operation             |
4370       |                                |
4371 RECOVER-DONE                           |
4372       |                                |
4373       | >--STATE-(RECOVER-DONE)------> |
4374       |                             NORMAL
4375       | <------------(NORMAL)-STATE--< |
4376    NORMAL                              |
4377       | >--STATE-(NORMAL)------------> |
4378       |                                |
4379       |                                |
4381 Figure 9.5.2-1: Transition out of RECOVER state
4383 If, at any time while a server is in RECOVER state, communications fails, 4384 the server will stay in RECOVER state. When communications are 4385 restored, it will restart the process of transitioning out of RECOVER 4386 state. 4388 9.6.
RECOVER-WAIT state 4390 This state indicates that the server has done an UPDREQ or UPDREQALL 4391 and has received the UPDDONE message indicating that it has received 4392 all outstanding binding update information. In the RECOVER-WAIT 4393 state the server will wait for the MCLT in order to ensure that any 4394 processing that this server might have done prior to losing its 4395 stable storage will not cause future difficulties. 4397 9.6.1. Operation in RECOVER-WAIT state 4399 A server in RECOVER-WAIT MUST NOT respond to DHCP client requests. 4401 9.6.2. Transitions out of RECOVER-WAIT state 4403 Upon entry to RECOVER-WAIT state the server MUST start a timer whose 4404 expiration is set to a time equal to the time the server went down 4405 (if known) or the time the server started (if the down-time is 4406 unknown) plus the maximum-client-lead-time. When this timer goes 4407 off, the server will transition into RECOVER-DONE state. 4409 This is to allow any IP addresses that were allocated by this server 4410 prior to loss of its client binding information in stable storage to 4411 contact the other server or to time out. 4413 If this is the first time this server has run failover -- as 4414 determined by the information received from the partner, not 4415 necessarily only as determined by this server's stable storage (as 4416 that may have been lost), then the waiting time discussed above may 4417 be skipped, and the server may transition immediately to RECOVER-DONE 4418 state. 4420 See Figure 9.5.2-1. 4422 DISCUSSION: 4424 The actual requirement on this wait period in RECOVER is that it 4425 start not before the recovering server went down, not necessarily 4426 when it came back up. 
If the time when the recovering server 4427 failed is known, it could be communicated to the recovering server 4428 (perhaps through actions of the network administrator), and the 4429 wait period could be reduced to the maximum-client-lead-time less 4430 the difference between the current time and the time the server 4431 failed. In this way, the waiting period could be minimized. 4432 Various heuristics could be used to estimate this time; for 4433 example, if the recovering server periodically updates stable 4434 storage with a time stamp, the wait period could be calculated to 4435 start at the time of the last update of stable storage plus the 4436 time required for the next update (which never occurred). This 4437 estimate is later than the server went down, but probably not too 4438 much later. 4440 If the server has never before run failover, then there is no need 4441 to wait in this state -- but, again, to determine if this server 4442 has run failover it is vital that the information provided by the 4443 partner be utilized, since the stable storage of this server may 4444 have been lost. 4446 If communications fails while a server is in RECOVER-WAIT state, it 4447 has no effect on the operation of this state. The server SHOULD 4448 continue to operate its timer, and if the timer goes off during the 4449 period where communications with the other server have failed, then 4450 the server SHOULD transition to RECOVER-DONE state. This is rare -- 4451 failover state transitions are not usually made while communications 4452 are interrupted, but in this case there is no reason to inhibit the 4453 timer. A server MAY stay in RECOVER-WAIT state even after expiry of 4454 the timer and transition to RECOVER-DONE state upon re-establishing 4455 communications with the partner if desired. The key point here is to 4456 allow the timer to continue to operate, not whether the state 4457 transition is made before or after communications are re-established.
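The RECOVER-WAIT timer described in section 9.6.2 might be computed as in the sketch below. The function name and parameter names are hypothetical and all times are assumed to be seconds on a common clock. The rule itself (down time if known, otherwise start time, plus the MCLT, with no wait on a first failover run) is taken from the text.

```python
def recover_wait_remaining(now, mclt, start_time, down_time=None,
                           first_failover_run=False):
    """Seconds still to wait in RECOVER-WAIT before moving to RECOVER-DONE."""
    if first_failover_run:
        # As determined from the partner's information: no wait is needed.
        return 0
    # Start the wait at the time the server went down if that is known,
    # otherwise at the time the server started.
    base = down_time if down_time is not None else start_time
    return max(0, base + mclt - now)
```

For example, with an MCLT of 3600 seconds, a server that started 100 seconds ago and does not know when it failed still waits 3500 seconds. If it is known to have failed more than an hour ago, the result is 0 and it may proceed at once.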
4459 9.7. RECOVER-DONE state 4461 This state exists to allow an interlocked transition for one server 4462 from RECOVER state and another server from PARTNER-DOWN or 4463 COMMUNICATIONS-INTERRUPTED state into NORMAL state. 4465 9.7.1. Operation in RECOVER-DONE state 4467 A server in RECOVER-DONE state MUST respond only to 4468 DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING DHCP messages. 4470 9.7.2. Transitions out of RECOVER-DONE state 4472 When a server in RECOVER-DONE state determines that its partner 4473 server has entered NORMAL or RECOVER-DONE state, then it will transi- 4474 tion into NORMAL state. 4476 If communications fails while in RECOVER-DONE state, a server will 4477 stay in RECOVER-DONE state. 4479 9.8. NORMAL state 4481 NORMAL state is the state used by a server when it is communicating 4482 with the other server, and any required resynchronization has been 4483 performed. While some bindings database synchronization is performed 4484 in NORMAL state, potential conflicts are resolved prior to entry into 4485 NORMAL state as is binding database data loss. 4487 9.8.1. Upon entry to NORMAL state 4489 When entering NORMAL state, a server will send to the other server 4490 all currently unacknowledged binding updates as BNDUPD messages. 4492 When the above process is complete, if the server entering NORMAL 4493 state is a secondary server, then it will request IP addresses for 4494 allocation using the POOLREQ message. 4496 9.8.2. Processing DHCP client requests and load balancing 4498 In NORMAL state, a server MUST process every DHCPREQUEST/RENEWAL or 4499 DHCPREQUEST/REBINDING request it receives. And, it processes other 4500 requests only for those clients as dictated by the load balancing 4501 algorithm specified in [RFC 3074]. 
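The decision rule above, with the Request ID chosen as this section goes on to describe, might be sketched as follows. One assumption needs to be stated plainly: the hash used here is a stand-in (a CRC-32 reduced to 8 bits) chosen only to keep the example self-contained. It is NOT the hash defined in [RFC 3074], and a real implementation would substitute that algorithm. The function names and the representation of the bucket assignment as a set of integers are likewise hypothetical.

```python
import zlib

ALWAYS_PROCESSED = ("DHCPREQUEST/RENEWAL", "DHCPREQUEST/REBINDING")


def placeholder_hash(request_id):
    """Stand-in hash to a bucket 0..255; NOT the [RFC 3074] algorithm."""
    return zlib.crc32(request_id) & 0xFF


def select_request_id(client_identifier, chaddr):
    """Use the client-identifier if present, otherwise the chaddr."""
    return client_identifier if client_identifier else chaddr


def should_process(message_kind, client_identifier, chaddr, my_buckets,
                   hash_fn=placeholder_hash):
    """Decide whether a server in NORMAL state processes a client request."""
    if message_kind in ALWAYS_PROCESSED:
        return True  # renewals and rebinds are always processed
    bucket = hash_fn(select_request_id(client_identifier, chaddr))
    return bucket in my_buckets  # other requests follow the bucket assignment
```

If the two servers hold complementary bucket sets, exactly one of them answers any given non-renewal request, which is the property the load balancing algorithm is meant to provide.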
4503 As discussed in section 5.3, each server will take the client- 4504 identifier from each DHCP client request (or the client-hardware- 4505 address, i.e., the chaddr, if no client-identifier is present in the 4506 request) and use it as the 'Request ID' specified in [RFC 3074]. 4507 After applying the algorithm specified in [RFC 3074] and comparing 4508 the result with the hash bucket assignment (performed during connect 4509 processing between failover servers), each failover server will be 4510 able to unambiguously determine if it should process the DHCP client 4511 request. 4513 9.8.3. Operation in NORMAL state 4515 When in NORMAL state, for every DHCP client request that it 4516 processes, as determined by the algorithm described in section 9.8.2, 4517 above, a server will operate in the following manner: 4519 o Lease time calculations 4521 As discussed in section 5.2.1, "Control of lease time", the 4522 lease interval given to a DHCP client can never be more than the 4523 MCLT greater than the most recently received potential- 4524 expiration-time from the failover partner or the current time, 4525 whichever is later. 4527 As long as a server adheres to this constraint, the specifics of 4528 the lease interval that it gives to a DHCP client or the value 4529 of the potential-expiration-time sent to its failover partner 4530 are implementation dependent. One possible approach is dis- 4531 cussed in section 5.2.1, but that particular approach is in no 4532 way required by this protocol. 4534 See section 7.1.5 for details concerning the storage of time 4535 associated with IP addresses and how to use these times when 4536 calculating lease times for DHCP clients. 4538 o Lazy update of partner server 4540 After a DHCPACK of an IP address binding, the server servicing a 4541 DHCP client request attempts to update its partner with the new 4542 binding information.
The lease time used in the update of the 4543 secondary MUST be at least that given to the DHCP client in the 4544 DHCPACK, and the potential-expiration-time MUST be at least the 4545 lease time, and SHOULD be considerably longer. 4547 o Reallocation of IP addresses between clients 4549 Whenever a client binding is released or expires, a BNDUPD mes- 4550 sage must be sent to the partner, setting the binding state to 4551 RELEASED or EXPIRED. However, until a BNDACK is received for 4552 this message, the IP address cannot be allocated to another 4553 client. It cannot be allocated to the same client again if a 4554 BNDUPD was sent; otherwise it can. See section 5.2.2. 4556 In NORMAL state, each server receives binding updates from its 4557 partner server in BNDUPD messages. It records these in its client 4558 binding database in stable storage and then sends a corresponding 4559 BNDACK message to its partner server. It MUST ensure that the infor- 4560 mation is recorded in stable storage prior to sending the BNDACK mes- 4561 sage back to its partner. 4563 9.8.4. Transitions out of NORMAL state 4565 If an external command is received by a server in NORMAL state 4566 informing it that its partner is down, then the server transitions into PARTNER- 4567 DOWN state. Generally, this would be an unusual situation, where 4568 some external agency knew the partner server was down. Using the 4569 command in this case would be appropriate if the polling interval and 4570 timeout were long. 4572 If a server in NORMAL state fails to receive acks to messages sent to 4573 its partner for an implementation dependent period of time, it MAY 4574 move into COMMUNICATIONS-INTERRUPTED state. This situation might 4575 occur if the partner server was capable of maintaining the TCP con- 4576 nection between the servers and also capable of sending a CONTACT mes- 4577 sage every tSend seconds, but was (for some reason) incapable of pro- 4578 cessing BNDUPD messages.
   If communications is determined not to be "ok" (as defined in
   section 8), then transition into COMMUNICATIONS-INTERRUPTED state.

   If a server in NORMAL state receives any messages from its partner
   where the partner has changed state from that expected by the server
   in NORMAL state, then the server should transition into
   COMMUNICATIONS-INTERRUPTED state and take the appropriate state
   transition from there.  For example, it would be expected for the
   partner to transition from POTENTIAL-CONFLICT into NORMAL state, but
   not for the partner to transition from NORMAL into
   POTENTIAL-CONFLICT state.

   If a server in NORMAL state receives any messages from its partner
   where the partner has changed into PAUSED state, the server should
   transition into COMMUNICATIONS-INTERRUPTED state.  If a server in
   NORMAL state receives any messages from its partner where the partner
   has changed into SHUTDOWN state, the server should transition into
   PARTNER-DOWN state.

9.9.  COMMUNICATIONS-INTERRUPTED State

   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
   unable to communicate with the other server.  Primary and secondary
   servers cycle automatically (without administrative intervention)
   between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network
   connection between them fails and recovers, or as the partner server
   cycles between operational and non-operational.  No duplicate IP
   address allocation can occur while the servers cycle between these
   states.

9.9.1.  Upon entry to COMMUNICATIONS-INTERRUPTED state

   When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
   configured to support an automatic transition out of
   COMMUNICATIONS-INTERRUPTED state and into PARTNER-DOWN state (i.e.,
   a "safe period" has been configured, see section 10), then a timer
   MUST be started for the length of the configured safe period.
   A server transitioning into the COMMUNICATIONS-INTERRUPTED state from
   the NORMAL state SHOULD raise some alarm condition to alert
   administrative staff to a potential problem in the DHCP subsystem.

9.9.2.  Operation in COMMUNICATIONS-INTERRUPTED State

   In this state a server MUST respond to all DHCP client requests, and
   the algorithm for load balancing described in section 5.3 MUST NOT be
   used.  When allocating new IP addresses, each server allocates from
   its own IP address pool, where the primary MUST allocate only FREE IP
   addresses, and the secondary MUST allocate only BACKUP IP addresses.
   When responding to renewal requests, each server will allow continued
   renewal of a DHCP client's current lease on an IP address
   irrespective of whether that lease was given out by the receiving
   server or not, although the renewal period MUST NOT exceed the
   maximum client lead time (MCLT) beyond the latest of: 1) the
   potential-expiration-time already acknowledged by the other server,
   or 2) the lease-expiration-time, or 3) the potential-expiration-time
   received from the partner server.

   However, since the server cannot communicate with its partner in this
   state, the acknowledged-potential-expiration time will not be updated
   in any new bindings.  This is likely to eventually cause the
   actual-client-lease-times to be the current time plus the
   maximum-client-lead-time (unless this is greater than the
   desired-client-lease-time).

   The server should continue to try to establish a connection with its
   partner.

9.9.3.  Transition out of COMMUNICATIONS-INTERRUPTED State

   If the safe period timer expires while a server is in the
   COMMUNICATIONS-INTERRUPTED state, it will transition immediately into
   PARTNER-DOWN state.
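   The renewal limit of section 9.9.2 and the safe period timer of
   sections 9.9.1 and 9.9.3 can be sketched as follows.  This is an
   illustration under stated assumptions, not a required
   implementation: all names are hypothetical, times are absolute
   seconds, and the current time is included in the maximum on the
   assumption that the "or the current time, whichever is later" rule
   of section 9.8.3 also applies here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Binding:
    acked_potential_expiration: float    # 1) acknowledged by the other server
    lease_expiration: float              # 2) lease-expiration-time
    partner_potential_expiration: float  # 3) received from the partner


def latest_allowed_expiry(b: Binding, now: float, mclt: float) -> float:
    """Latest absolute expiry a renewal may be given in this state."""
    # Assumption: `now` takes part in the maximum, as in section 9.8.3.
    # Section 9.9.2 itself lists only the three binding times.
    base = max(b.acked_potential_expiration, b.lease_expiration,
               b.partner_potential_expiration, now)
    return base + mclt


def renewal_interval(b: Binding, now: float, desired: float,
                     mclt: float) -> float:
    """Lease interval to hand out: the desired one, capped by the limit."""
    return min(now + desired, latest_allowed_expiry(b, now, mclt)) - now


def safe_period_deadline(entered_at: float,
                         safe_period: Optional[float]) -> Optional[float]:
    """Timer started on entry; None if no safe period is configured."""
    return None if safe_period is None else entered_at + safe_period


def state_after_tick(deadline: Optional[float], now: float) -> str:
    """Expiry of the safe period timer forces PARTNER-DOWN."""
    if deadline is not None and now >= deadline:
        return "PARTNER-DOWN"
    return "COMMUNICATIONS-INTERRUPTED"
```

   With no safe period configured, the sketch never leaves
   COMMUNICATIONS-INTERRUPTED on its own; the other transitions out of
   this state are triggered by events rather than by the timer.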
   If an external command is received by a server in
   COMMUNICATIONS-INTERRUPTED state informing it that its partner is
   down, it will transition immediately into PARTNER-DOWN state.

   If communications is restored with the other server, then the server
   in COMMUNICATIONS-INTERRUPTED state will transition into another
   state based on the state of the partner:

   o partner in NORMAL or COMMUNICATIONS-INTERRUPTED

      The partner SHOULD NOT be in NORMAL state here, since upon
      restoration of communications it MUST have created a new TCP
      connection which would have forced it into
      COMMUNICATIONS-INTERRUPTED state.  Still, we should account for
      every state just in case.

      Transition into the NORMAL state.

   o partner in RECOVER

      Stay in COMMUNICATIONS-INTERRUPTED state.

   o partner in RECOVER-DONE

      Transition into NORMAL state.

   o partner in PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, or
     RESOLUTION-INTERRUPTED

      Transition into POTENTIAL-CONFLICT state.

   o partner in PAUSED

      Stay in COMMUNICATIONS-INTERRUPTED state.

   o partner in SHUTDOWN

      Transition into PARTNER-DOWN state.

   The following figure illustrates the transition from NORMAL to
   COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.
       Primary                         Secondary
       Server                           Server

       NORMAL                           NORMAL
          | >--CONTACT-------------------> |
          | <-------------------CONTACT--< |
          |     [TCP connection broken]     |
   COMMUNICATIONS          :        COMMUNICATIONS
     INTERRUPTED           :          INTERRUPTED
          |  [attempt new TCP connection]  |
          |     [connection succeeds]      |
          |                                |
          | >--CONNECT-------------------> |
          | <----------------CONNECTACK--< |
          |                             NORMAL
          | <------------------STATE-----< |
       NORMAL                              |
          | >--STATE---------------------> |
          |                                |
          | >--BNDUPD--------------------> |
          | <--------------------BNDACK--< |
          |                                |
          | <--------------------BNDUPD--< |
          | >------BNDACK----------------> |
         ...                              ...
          |                                |
          | <-------------------POOLREQ--< |
          | >--POOLRESP-(2)--------------> |
          |                                |
          | >--BNDUPD-(#1)---------------> |
          | <--------------------BNDACK--< |
          |                                |
          | <-------------------POOLREQ--< |
          | >--POOLRESP-(0)--------------> |
          |                                |
          | >--BNDUPD-(#2)---------------> |
          | <--------------------BNDACK--< |
          |                                |

   Figure 9.9.3-1: Transition from NORMAL to COMMUNICATIONS-INTERRUPTED
                   and back (example with 2 addresses allocated to
                   secondary)

9.10.  POTENTIAL-CONFLICT state

   This state indicates that the two servers are attempting to
   re-integrate with each other, but at least one of them was running
   in a state that did not guarantee automatic reintegration would be
   possible.  In POTENTIAL-CONFLICT state the servers may determine that
   the same IP address has been offered and accepted by two different
   DHCP clients.

   It is a goal of this protocol to minimize the possibility that
   POTENTIAL-CONFLICT state is ever entered.

9.10.1.  Upon entry to POTENTIAL-CONFLICT state

   When a primary server enters POTENTIAL-CONFLICT state it should
   request that the secondary send it all updates of which it is
   currently unaware by sending an UPDREQ message to the secondary
   server.

   A secondary server entering POTENTIAL-CONFLICT state will wait for
   the primary to send it an UPDREQ message.

9.10.2.  Operation in POTENTIAL-CONFLICT state

   Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming
   DHCP requests.

9.10.3.  Transitions out of POTENTIAL-CONFLICT state

   If communications fails with the partner while in POTENTIAL-CONFLICT
   state, then the server will transition to RESOLUTION-INTERRUPTED
   state.

   Whenever either server receives an UPDDONE message from its partner
   while in POTENTIAL-CONFLICT state, it MUST transition to a new state.
   The primary MUST transition to CONFLICT-DONE state, and the secondary
   MUST transition to NORMAL state.  This will cause the primary server
   to leave POTENTIAL-CONFLICT state prior to the secondary, since the
   primary sends an UPDREQ message and receives an UPDDONE before the
   secondary sends an UPDREQ message and receives its UPDDONE message.

   When a secondary server receives an indication that the primary
   server has made a transition from POTENTIAL-CONFLICT to CONFLICT-DONE
   state, it SHOULD send an UPDREQ message to the primary server.

       Primary                         Secondary
       Server                           Server

          |                                |
 POTENTIAL-CONFLICT               POTENTIAL-CONFLICT
          |                                |
          | >--UPDREQ--------------------> |
          |                                |
          | <--------------------BNDUPD--< |
          | >--BNDACK--------------------> |
         ...                              ...
          |                                |
          | <--------------------BNDUPD--< |
          | >--BNDACK--------------------> |
          |                                |
          | <-------------------UPDDONE--< |
       NORMAL                              |
          | >--STATE--(NORMAL)-----------> |
          | <--------------------UPDREQ--< |
          |                                |
          | >--BNDUPD--------------------> |
          | <--------------------BNDACK--< |
         ...                              ...
          | >--BNDUPD--------------------> |
          | <--------------------BNDACK--< |
          |                                |
          | >--UPDDONE-------------------> |
          |                             NORMAL
          | <-----------STATE--(NORMAL)--< |
          |                                |
          | <-------------------POOLREQ--< |
          | >------POOLRESP-(n)----------> |
          |        addresses               |

   Figure 9.10.3-1: Transition out of POTENTIAL-CONFLICT

9.11.  RESOLUTION-INTERRUPTED state

   This state indicates that the two servers were attempting to
   re-integrate with each other in POTENTIAL-CONFLICT state, but
   communications failed prior to completion of re-integration.

   If the servers remained in POTENTIAL-CONFLICT while communications
   was interrupted, neither server would be responsive to DHCP client
   requests, and if one server had crashed, then there might be no
   server able to process DHCP requests.

9.11.1.  Upon entry to RESOLUTION-INTERRUPTED state

   When a server enters RESOLUTION-INTERRUPTED state it SHOULD raise an
   alarm condition to alert administrative staff of a problem in the
   DHCP subsystem.

9.11.2.  Operation in RESOLUTION-INTERRUPTED state

   In this state a server MUST respond to all DHCP client requests, and
   any load balancing (described in section 5.3) MUST NOT be used.  When
   allocating new IP addresses, each server SHOULD allocate from its own
   IP address pool (if that can be determined), where the primary SHOULD
   allocate only FREE IP addresses, and the secondary SHOULD allocate
   only BACKUP IP addresses.
   When responding to renewal requests, each
   server will allow continued renewal of a DHCP client's current lease
   on an IP address irrespective of whether that lease was given out by
   the receiving server or not, although the renewal period MUST NOT
   exceed the maximum client lead time (MCLT) beyond the latest of: 1)
   the potential-expiration-time already acknowledged by the other
   server, or 2) the lease-expiration-time, or 3) the
   potential-expiration-time received from the partner server.

   However, since the server cannot communicate with its partner in this
   state, the acknowledged-potential-expiration time will not be updated
   in any new bindings.

9.11.3.  Transitions out of RESOLUTION-INTERRUPTED state

   If an external command is received by a server in
   RESOLUTION-INTERRUPTED state informing it that its partner is down,
   it will transition immediately into PARTNER-DOWN state.

   If communications is restored with the other server, then the server
   in RESOLUTION-INTERRUPTED state will transition into
   POTENTIAL-CONFLICT state.

9.12.  CONFLICT-DONE state

   This state indicates that during the process where the two servers
   are attempting to re-integrate with each other, the primary server
   has received all of the updates from the secondary server.  It makes
   a transition into CONFLICT-DONE state in order that it may be totally
   responsive to the client load, as opposed to NORMAL state where it
   would be in a "balanced" responsive state, running the load balancing
   algorithm.

9.12.1.  Upon entry to CONFLICT-DONE state

   A secondary server should never enter CONFLICT-DONE state.

9.12.2.  Operation in CONFLICT-DONE state

   A primary server in CONFLICT-DONE state is fully responsive to all
   DHCP clients (similar to the situation in COMMUNICATIONS-INTERRUPTED
   state).

   If communications fails, remain in CONFLICT-DONE state.  If
   communications becomes OK, remain in CONFLICT-DONE state until the
   conditions for transition out become satisfied.

9.12.3.  Transitions out of CONFLICT-DONE state

   If communications fails with the partner while in CONFLICT-DONE
   state, then the server will remain in CONFLICT-DONE state.

   When a primary server determines that the secondary server has made a
   transition into NORMAL state, the primary server will also transition
   into NORMAL state.

9.13.  PAUSED state

   This state exists to allow one server to inform another that it will
   be out of service for what is predicted to be a relatively short
   time, and to allow the other server to transition to
   COMMUNICATIONS-INTERRUPTED state immediately and to begin servicing
   all DHCP clients with no interruption in service to new DHCP clients.

   A server which is aware that it is shutting down temporarily SHOULD
   send a STATE message with the server-state option containing PAUSED
   state and close the TCP connection.

   While a server may or may not transition internally into PAUSED
   state, the 'previous' state determined when it is restarted MUST be
   the state the server was in prior to receiving the command to
   shut down and restart and which precedes its entry into the PAUSED
   state.  See section 9.3.2 concerning the use of the previous state
   upon server restart.

9.13.1.  Upon entry to PAUSED state

   When entering PAUSED state, the server MUST store the previous state
   in stable storage, and use that state as the previous state when it
   is restarted.

9.13.2.  Transitions out of PAUSED state

   A server makes a transition out of PAUSED state by being restarted.
   At that time, the previous state MUST be the state the server was in
   prior to entering the PAUSED state.

9.14.  SHUTDOWN state

   This state exists to allow one server to inform another that it will
   be out of service for what is predicted to be a relatively long time,
   and to allow the other server to transition immediately to
   PARTNER-DOWN state, and take over completely for the server going
   down.

9.14.1.  Upon entry to SHUTDOWN state

   When entering SHUTDOWN state, the server MUST record the previous
   state in stable storage for use when the server is restarted.  It
   also MUST record the current time as the last time operational.

   A server which is aware that it is shutting down SHOULD send a STATE
   message with the server-state field containing SHUTDOWN.

9.14.2.  Operation in SHUTDOWN state

   A server in SHUTDOWN state MUST NOT respond to any DHCP client input.

   If a server receives any message indicating that the partner has
   moved to PARTNER-DOWN state while it is in SHUTDOWN state then it
   MUST record RECOVER state as the previous state to be used when it is
   restarted.

   A server SHOULD wait for a few seconds after informing the partner of
   entry into SHUTDOWN state (if communications are okay) to determine
   if the partner entered PARTNER-DOWN state.

9.14.3.  Transitions out of SHUTDOWN state

   A server makes a transition out of SHUTDOWN state by being restarted.

10.  Safe Period

   Due to the restrictions imposed on each server while in
   COMMUNICATIONS-INTERRUPTED state, long-term operation in this state
   is not feasible for either server.  One reason that this state
   exists at all is to allow the servers to easily survive transient
   network communications failures of a few minutes to a few days
   (although the actual time periods will depend a great deal on the
   DHCP activity of the network in terms of arrival and departure of
   DHCP clients on the network).
   Eventually, when the servers are unable to communicate, they will
   have to move into a state where they no longer can re-integrate
   without some possibility of a duplicate IP address allocation.  There
   are two ways that they can move into this state (known as
   PARTNER-DOWN).

   First, they can be informed by external command that the partner
   server is, indeed, down.  In this case, there is no difficulty in
   moving into the PARTNER-DOWN state since it is an accurate reflection
   of reality and the protocol has been designed to operate correctly
   (even during reintegration) as long as, when in PARTNER-DOWN state,
   the partner is, indeed, down.

   The more difficult scenario is when the servers are running
   unattended for extended periods, and in this case an option is
   provided to configure something called a "safe-period" into each
   server.  This OPTIONAL safe-period is the period after which either
   the primary or secondary server will automatically transition to
   PARTNER-DOWN from COMMUNICATIONS-INTERRUPTED state.  If this
   transition is completed and the partner is not down, then the
   possibility of duplicate IP address allocations will exist.

   The goal of the "safe-period" is to allow network operations staff
   some time to react to a server moving into COMMUNICATIONS-INTERRUPTED
   state.  During the safe-period the only requirement is that the
   network operations staff determine if both servers are still running
   -- and if they are, to either fix the network communications failure
   between them, or to take one of the servers down before the
   expiration of the safe-period.

   The length of the safe-period is installation dependent, and depends
   in large part on the number of unallocated IP addresses within the
   subnet address pool and the expected frequency of arrival of
   previously unknown DHCP clients requiring IP addresses.
   Many
   environments should be able to support safe-periods of several days.

   During this safe period, either server will allow renewals from any
   existing client.  The only limitation concerns the need for IP
   addresses for the DHCP server to hand out to new DHCP clients and the
   need to re-allocate IP addresses to different DHCP clients.

   The number of "extra" IP addresses required is equal to the expected
   total number of new DHCP clients encountered during the safe period.
   This is dependent only on the arrival rate of new DHCP clients, not
   the total number of outstanding leases on IP addresses.

   In the unlikely event that a relatively short safe period of an hour
   is all that can be used (given a dearth of IP addresses or a very
   high arrival rate of new DHCP clients), even that can provide
   substantial benefits in allowing the DHCP subsystem to ride through
   minor problems that could occur and be fixed within that hour.  In
   these cases, no possibility of duplicate IP address allocation
   exists, and re-integration after the failure is solved will be
   automatic and require no operator intervention.

11.  Security

   The Failover protocol communicates DHCP lease activity and this data
   is generally easily discovered via other means, such as by pinging
   addresses and doing DNS lookups.  Therefore, the need to encrypt the
   data over the wire is likely not great (though some sites may feel
   differently).

   However, it is very desirable to assure the integrity of failover
   partners and to thus ensure proper operation of the servers.  For
   example, denial of service attacks are possible by the communication
   of invalid state information to one or both servers.

   Therefore, the Failover protocol MUST be capable of being secured by
   using a simple shared secret message digest which covers each
   message.  This provides authentication of the servers, but does not
   provide encryption of the data exchange.

   The Failover protocol MAY also be secured by using TLS [RFC 2246]
   (Transport Layer Security) if encryption of the data exchange is
   desired.  The use of the shared secret or TLS will not protect
   against TCP or IP layer attacks (such as someone sending fake TCP RST
   segments).  IPsec [RFC 2401] SHOULD be used to protect against most
   (if not all) of these kinds of attacks.

11.1.  Simple shared secret

   Messages between the failover partners can be authenticated through
   the use of a shared secret, which is never sent over the network and
   must be known by each server.  How each server is told about this
   shared secret and secures its storage of the shared secret is outside
   the scope of this document.  If a server is configured with a shared
   secret for a partner, it MUST send the message-digest option in ALL
   messages to that partner and it MUST treat any messages received from
   that partner without a message-digest option as failing
   authentication and reject them with reject reason 21: "Missing
   message digest".  Note that the message-digest option MUST be the
   first option in the message.

   If a server is not configured with a shared secret for a partner, it
   MUST NOT send the message-digest option in any message to that
   partner and it MUST treat any messages received from that partner
   with a message-digest option as failing authentication with reject
   reason 13: "Message digest not configured".

   The shared secret is used to calculate a 16 octet message-digest
   which is sent in every failover message in the message-digest option.
   See section 12.17.  The message-digest contains a one-way 16 octet
   HMAC-MD5 [RFC 2104] hash calculated over a stream of octets
   consisting of the entire message concatenated with the shared secret.
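   The digest handling in this section (compute over the message with
   the 16 digest octets zeroed, then substitute the result; verify by
   repeating the computation) can be sketched as below.  Two
   assumptions are made and should be checked against the
   specification before relying on the sketch: the hash is taken to be
   standard HMAC-MD5 per [RFC 2104] keyed with the shared secret, which
   is one reading of "concatenated with the shared secret", and
   digest_offset is a hypothetical parameter giving the position of the
   16 digest octets within the encoded message.

```python
import hashlib
import hmac

DIGEST_LEN = 16  # HMAC-MD5 output size in octets


def compute_digest(message: bytes, digest_offset: int, secret: bytes) -> bytes:
    """HMAC-MD5 over the message with the digest field zeroed."""
    zeroed = (message[:digest_offset]
              + bytes(DIGEST_LEN)
              + message[digest_offset + DIGEST_LEN:])
    return hmac.new(secret, zeroed, hashlib.md5).digest()


def seal(message: bytes, digest_offset: int, secret: bytes) -> bytes:
    """Return the message with the digest field filled in for sending."""
    digest = compute_digest(message, digest_offset, secret)
    return (message[:digest_offset]
            + digest
            + message[digest_offset + DIGEST_LEN:])


def verify(message: bytes, digest_offset: int, secret: bytes) -> bool:
    """Recompute the digest and compare it with the received one."""
    received = message[digest_offset:digest_offset + DIGEST_LEN]
    expected = compute_digest(message, digest_offset, secret)
    # Constant-time comparison avoids leaking how many octets matched.
    return hmac.compare_digest(received, expected)
```

   A receiver for which verify() returns False would reject the message
   and close the connection.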
   For calculation, the message includes the message-digest option with
   the message-digest data zeroed (16 octets of zero).  Once the
   calculation is complete, these 16 octets of zero are replaced by the
   16-octet HMAC-MD5 hash and the message is sent.

   For verification, the 16-octet message-digest is saved and replaced
   with 16 octets of zero and calculated per above.  The resulting
   HMAC-MD5 hash is compared to the received hash and if they match,
   the message is assumed authenticated.

   A failover partner that fails to authenticate a received message or
   receives a message without a message-digest option when configured
   with a shared secret MUST close the connection immediately and take
   steps to notify operators.

   Every time a CONNECT message is received, the time at which that
   message was sent by the partner (i.e., the time that actually appears
   in the message itself) MUST be saved.  If a CONNECT message is ever
   received containing that time or containing a time before that time,
   it MUST be rejected.

   The XID (see section 6.1) of every message received at a failover
   endpoint MUST be greater than that of the previous message received
   on that failover endpoint or the message just received MUST be
   rejected.

   A server MAY operate with arbitrary time skew between servers (see
   section 5.10), but when using a shared secret administrators MAY wish
   to configure a maximum allowable time skew between a failover server
   and its partner(s).  Servers SHOULD allow an administrator to
   configure a maximum allowable time skew between two failover
   partners.

11.2.  TLS

   TLS, Transport Layer Security, as specified in [RFC 2246] MAY be
   used.  The use of TLS would be similar to the way it is used with
   SMTP [RFC 2487] and IMAP/POP3/ACAP [RFC 2595].
   To request the use of TLS, the primary MUST send the TLS-request
   option as part of the CONNECT message.  The secondary receiving the
   TLS-request option MUST respond with a TLS-reply option indicating
   its acceptance or rejection of the TLS-request in the CONNECT
   message.

   If the CONNECTACK message contained a TLS-reply of 1, then both
   servers immediately begin TLS negotiation.

   Upon completion of this negotiation, the primary server sends another
   CONNECT message without any TLS-request option, and must wait for a
   corresponding CONNECTACK.

   Implementation of the TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA [RFC 2246]
   cipher suite is REQUIRED in Failover servers supporting TLS.  This is
   important as it assures that any two compliant implementations can be
   configured to interoperate.

12.  Failover Options

   This section lists all of the options that are currently defined to
   be used with the failover protocol.  See section 6.2 for details
   concerning time values.

12.1.  addresses-transferred

   A 32 bit unsigned long in network byte order.  Reports the number of
   addresses transferred by the primary to the secondary server
   (addresses to be used for the secondary server's private address
   pool).

      Code        Len      Number of Addresses
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  1  |  0  |  4  | n1 | n2  | n3  | n4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.2.  assigned-IP-address

   The DHCP managed IP address to which this message refers.

      Code        Len           Address
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  2  |  0  |  4  | a1 | a2  | a3  | a4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.3.  binding-status

   This option is used to convey the current state of a binding.
      Code        Len     Type
   +-----+-----+-----+-----+-----+
   |  0  |  3  |  0  |  1  | 1-7 |
   +-----+-----+-----+-----+-----+

   Legal values for this option are:

   Value   Binding Status
   -----   ------------------------------------------------
     1     FREE       Lease is currently available to the primary
     2     ACTIVE     Lease is assigned to a client
     3     EXPIRED    Lease has expired
     4     RELEASED   Lease has been released by client
     5     ABANDONED  A server or client flagged address as unusable
     6     RESET      Lease was freed by some external agent
     7     BACKUP     Lease belongs to secondary's private address pool

12.4.  client-identifier

   This is the client-identifier for the client associated with a
   binding.  The client-identifier data is subject to the same
   conventions as DHCP option 61 [RFC 2132].

      Code        Len      Client Identifier
   +-----+-----+-----+-----+----+-----+---
   |  0  |  4  |  0  |  n  | i1 | i2  | ...
   +-----+-----+-----+-----+----+-----+---

12.5.  client-hardware-address

   This is the hardware address for the client associated with a
   binding.  Byte t1 (type) MUST be set to the proper ARP hardware
   address code, as defined in the ARP section of RFC 1700 (it MUST NOT
   be zero!).

      Code        Len     htype  chaddr
   +-----+-----+-----+-----+----+-----+-----+---
   |  0  |  5  |  0  |  n  | t1 | c1  | c2  | ...
   +-----+-----+-----+-----+----+-----+-----+---

12.6.  client-last-transaction-time

   The time at which this server last received a DHCP request from a
   particular client expressed as an absolute time (see section 6.2).

      Code        Len     client last transaction time
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  6  |  0  |  4  | t1 | t2  | t3  | t4  |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.7.  client-reply-options

   This option contains options from a DHCP server's reply to a DHCP
   client request.  It is sent in a BNDUPD message.
   The first 4 bytes
   of the option contain the "magic number" of the option area from
   which the DHCP reply options were taken and serve to define the
   format of the rest of the sub-options contained in this option.
   After the magic number, the options included are in the normal
   options format appropriate for that magic number.

   A server SHOULD NOT include all of the options in a DHCP server's
   reply to a client's request in this option, but rather a server
   SHOULD include only those options which are of likely interest to its
   partner server.  See section 7.1 for details.

      Code        Len       Magic Number     Embedded options
   +-----+-----+-----+-----+----+----+----+----+----+----+--
   |  0  |  7  |  0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 | ...
   +-----+-----+-----+-----+----+----+----+----+----+----+--

12.8.  client-request-options

   This option contains options from a DHCP client's request.  It is
   sent in a BNDUPD message.  The first 4 bytes of the option contain
   the "magic number" of the option area from which the DHCP client's
   request options were taken and serve to define the format of the
   rest of the sub-options contained in this option.  After the magic
   number, the options included are in the normal options format
   appropriate for that magic number.

   A server SHOULD NOT include all of the options in a DHCP client
   request in this option, but rather a server SHOULD include only those
   options which are of likely interest to its partner server.  See
   section 7.1 for details.

      Code        Len       Magic Number     Embedded options
   +-----+-----+-----+-----+----+----+----+----+----+----+--
   |  0  |  8  |  0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 | ...
   +-----+-----+-----+-----+----+----+----+----+----+----+--

12.9.  DDNS

   If an implementation supports Dynamic DNS updates, this option is
   used to communicate the status of the DDNS update associated with a
   particular lease binding.
   The Flags field conveys the types of DNS
   RRs that are to be updated by the DHCP server, and the status of the
   DDNS update.  The Domain Name field conveys the DNS FQDN that the
   DHCP server is using to refer to the client, in DNS encoding as
   specified in [RFC 1035].

      Code        Len        Flags       Domain Name
   +-----+-----+-----+-----+-----+------+------+-----+------
   |  0  |  9  |  0  |  n  |   flags    |  d1  | d2  | ...
   +-----+-----+-----+-----+-----+------+------+-----+------

   The Flags field is a 16-bit field; several bit positions are
   specified here.

                        1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |C|A|D|P|          MBZ          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The bits (numbered from the least-significant bit in network
   byte-order) are used as follows:

   0 (C): name to address (such as A RR) update successfully completed
   1 (A): Server is controlling A RR on behalf of the client
   2 (D): address to name (such as PTR RR) update successfully
          completed (Done)
   3 (P): Server is controlling PTR RR on behalf of the client
   4-15 : Must be zero

   All of the unspecified bit positions SHOULD be set to 0 by servers
   sending the Failover-DDNS option, and they MUST be ignored by servers
   receiving the option.

12.10.  delayed-service-parameter

   The delayed-service-parameter is an optional load balancing tuning
   parameter, defined in [RFC 3074].  If it is used, it MUST be sent in
   the same message as the hash-bucket-assignment option (see section
   12.11).

   Format:

      Code        Len     Seconds
   +-----+-----+-----+-----+----+
   |  0  | 10  |  0  |  1  | S  |
   +-----+-----+-----+-----+----+

   S is a one byte value, 1..255.

12.11.  hash-bucket-assignment

   A set of load balancing hash values for the secondary server.  A one
   bit in the hash buckets indicates that the secondary is to service
   that set of clients.
   See section 5.3 for more information on how this option is used.
   This option is only sent from the primary to the secondary.

   The format and usage of the data in this option are defined in
   [RFC 3074].

      Code        Len              Hash Buckets
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  |  11 |  0  |  32 |  b1 |  b2 | ... | b32 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

12.12. IP-flags

   This option is used to convey the current flags of the
   assigned-IP-address option preceding it.

      Code        Len       IP Flags
   +-----+-----+-----+-----+-----+-----+
   |  0  |  12 |  0  |  1  |  f1 |  f2 |
   +-----+-----+-----+-----+-----+-----+

   The IP-flags field is a 16-bit field; two bit positions are
   specified here.

                        1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|B|            MBZ            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The bits (numbered from the least-significant bit in network
   byte-order) are used as follows:

   0 (R): RESERVED (this bit allocated and in use and named "RESERVED")
      Bit 0 MUST be set to 1 whenever the IP address in the preceding
      assigned-IP-address option is reserved on the server sending the
      packet.
   1 (B): BOOTP
      Bit 1 MUST be set to 1 whenever the IP address in the preceding
      assigned-IP-address option is an IP address which has been
      allocated due to an interaction with a BOOTP client (as opposed
      to a DHCP client).
   2-15 : Must be zero

12.13. lease-expiration-time

   The lease expiration time is the lease interval that a DHCP server
   has ACKed to a DHCP client added to the time at which that ACK was
   transmitted -- expressed as an absolute time (see section 6.2).

      Code        Len              Time
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  13 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.14. max-unacked-bndupd

   The maximum number of BNDUPD messages that this server is prepared
   to accept over the TCP connection without causing the TCP
   connection to block. A 32 bit unsigned integer value, in network
   byte order.

      Code        Len      Maximum Unacked BNDUPD
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  14 |  0  |  4  | n1 |  n2 |  n3 |  n4 |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.15. MCLT

   Maximum Client Lead Time, an interval, in seconds. A 32 bit
   unsigned integer value, in network byte order.

      Code        Len              Time
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  15 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.16. message

   This option is used to supply a human readable message text. It may
   be used in association with the Reject Reason Code to provide a
   human readable error message for the reject.

      Code        Len        Text
   +-----+-----+-----+-----+------+-----+--
   |  0  |  16 |  0  |  n  |  c1  |  c2 | ...
   +-----+-----+-----+-----+------+-----+--

12.17. message-digest

   The message digest for this message.

   This option consists of a variable number of bytes which contain
   the message digest of the message prior to the inclusion of this
   option.

   When this option appears in a message, it MUST appear as the first
   option in the message. It MUST appear in every message if message
   digests are required. The Type MUST be configurable (once
   additional types are defined). When additional types are defined,
   they MUST be specified as either optional (MAY be supported) or
   required (MUST be supported). See the section on IANA
   considerations for more details.

      Code        Len     Type  Message Digest
   +-----+-----+-----+-----+-----+-----+-----+--
   |  0  |  17 |  0  |  n  |  t  |  d1 |  d2 | ...
   +-----+-----+-----+-----+-----+-----+-----+--

   Type: 0      Not Allowed
         1      HMAC-MD5
         2-255  Not Allowed

12.18. potential-expiration-time

   The potential expiration time is the time that one server tells
   another server that it may wish to grant in a lease to a DHCP
   client. It is an absolute time. See section 6.2.

      Code        Len              Time
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  18 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.19. receive-timer

   The number of seconds (an interval) within which the server must
   receive a message from its partner, or it will assume that
   communication with the partner is not OK. An unsigned 32 bit
   integer in network byte order.

      Code        Len          Receive Timer
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  19 |  0  |  4  | s1 |  s2 |  s3 |  s4 |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.20. protocol-version

   The protocol version being used by the server. It is only sent in
   the CONNECT and CONNECTACK messages. The current value for the
   version is 1.

      Code        Len     Version
   +-----+-----+-----+-----+-----+
   |  0  |  20 |  0  |  1  |  1  |
   +-----+-----+-----+-----+-----+

12.21. reject-reason

   This option is used to selectively reject binding updates. It MAY
   be used in a BNDACK message or a CONNECTACK message, always
   associated with an assigned-IP-address option, which contains the
   IP address of the update being rejected.

      Code        Len    Reason Code
   +-----+-----+-----+-----+-----+
   |  0  |  21 |  0  |  1  |  R1 |
   +-----+-----+-----+-----+-----+

   Reason codes (section where referenced in parentheses):

      0  Reserved
      1  Illegal IP address (not part of any address pool). (7.1.3)
      2  Fatal conflict exists: address in use by other client. (7.1.3)
      3  Missing binding information. (7.1.3)
      4  Connection rejected, time mismatch too great. (7.8.2)
      5  Connection rejected, invalid MCLT. (7.8.2)
      6  Connection rejected, unknown reason.
         (not specifically referenced)
      7  Connection rejected, duplicate connection. (unused)
      8  Connection rejected, invalid failover partner. (7.8.2)
      9  TLS not supported. (7.8.2)
     10  TLS supported but not configured. (7.8.2)
     11  TLS required but not supported by partner. (7.8.2)
     12  Message digest not supported. (11.1)
     13  Message digest not configured. (11.1)
     14  Protocol version mismatch. (7.8.2)
     15  Outdated binding information. (7.1.3)
     16  Less critical binding information. (7.1.3)
     17  No traffic within sufficient time. (8.6)
     18  Hash bucket assignment conflict. (7.8.2)
     19  IP not reserved on this server. (7.1.3)
     20  Message digest failed to compare. (7.8.2)
     21  Missing message digest. (7.1.3)
     22-253  Reserved.
    254  Unknown: Error occurred but does not match any reason code.
    255  Reserved for code expansion.

12.22. relationship-name

   A string which is a unique identifier for the failover relationship.

      Code        Len     Relationship Name
   +-----+-----+-----+-----+----+-----+---
   |  0  |  22 |  0  |  n  | c1 |  c2 | ...
   +-----+-----+-----+-----+----+-----+---

12.23. server-flags

   This option is used to convey the current flags of the failover
   endpoint in the sending server.

      Code        Len     Server Flags
   +-----+-----+-----+-----+-------+
   |  0  |  23 |  0  |  1  | flags |
   +-----+-----+-----+-----+-------+

   The flags field is an 8-bit field; one bit position is specified
   here.

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |S|     MBZ     |
   +-+-+-+-+-+-+-+-+

   The bits (numbered from the least-significant bit in network
   byte-order) are used as follows:

   0 (S): STARTUP,
      Bit 0 MUST be set to 1 whenever the server is in STARTUP state,
      and set to 0 otherwise. (Note that when in STARTUP state, the
      state transmitted in the server-state option is usually the last
      recorded state from stable storage, but see section 9.3 for
      details.)
   1-7  : Must be zero

12.24. server-state

   This option is used to convey the current state of the failover
   endpoint in the sending server.

      Code        Len    Server State
   +-----+-----+-----+-----+-----+
   |  0  |  24 |  0  |  1  | 1-9 |
   +-----+-----+-----+-----+-----+

   Legal values for this option are:

   Value  Server State
   -----  ------------------------------------------------------------
     0    reserved
     1    STARTUP                     Startup state (1)
     2    NORMAL                      Normal state
     3    COMMUNICATIONS-INTERRUPTED  Communication interrupted (safe)
     4    PARTNER-DOWN                Partner down (unsafe mode)
     5    POTENTIAL-CONFLICT          Synchronizing
     6    RECOVER                     Recovering bindings from partner
     7    PAUSED                      Shutting down for a short period.
     8    SHUTDOWN                    Shutting down for an extended
                                      period.
     9    RECOVER-DONE                Interlock state prior to NORMAL
    10    RESOLUTION-INTERRUPTED      Comm. failed during resolution
    11    CONFLICT-DONE               Primary has resolved its conflicts

   (1) The STARTUP state is never sent to the partner server; it is
   indicated by the STARTUP bit in the server-flags option (see
   section 12.23).

12.25. start-time-of-state

   This option is used for different states in different messages. In
   a BNDUPD message it represents the start time of the state of the
   lease in the BNDUPD message. In a STATE message, it represents the
   start time of the partner server's failover state. In all cases it
   is an absolute time.

      Code        Len       Start Time of State
   +-----+-----+-----+-----+----+-----+-----+-----+
   |  0  |  25 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+----+-----+-----+-----+

12.26. TLS-reply

   This option contains information relating to TLS security
   negotiation. It is sent in a CONNECTACK message.

   A t1 value of 0 indicates no TLS operation, a value of 1 indicates
   that TLS operation is required.
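
   The option formats in this section are all drawn with a two-byte
   code and a two-byte length ahead of the data. The following is a
   minimal sketch, not a definitive implementation, of walking a
   buffer of such options and interpreting the server-state and
   TLS-reply values. It assumes that both header fields are unsigned
   and in network byte order, as the diagrams suggest. The names
   iter_options, describe, STATE_NAMES and sample are illustrative
   only and do not appear in this specification.

```python
import struct

# Option codes and state names are copied from the server-state and
# TLS-reply definitions; everything else here is a hypothetical helper.
SERVER_STATE = 24
TLS_REPLY = 26

STATE_NAMES = {
    1: "STARTUP", 2: "NORMAL", 3: "COMMUNICATIONS-INTERRUPTED",
    4: "PARTNER-DOWN", 5: "POTENTIAL-CONFLICT", 6: "RECOVER",
    7: "PAUSED", 8: "SHUTDOWN", 9: "RECOVER-DONE",
    10: "RESOLUTION-INTERRUPTED", 11: "CONFLICT-DONE",
}


def iter_options(buf):
    """Yield (code, data) for each code/length/data option in buf."""
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < 4:
            raise ValueError("truncated option header")
        # Assumption: 16-bit code and 16-bit length, network byte order.
        code, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        if len(buf) - offset < length:
            raise ValueError("truncated option data")
        yield code, buf[offset:offset + length]
        offset += length


def describe(code, data):
    """Return a short human-readable description of one option."""
    if code == SERVER_STATE and len(data) == 1:
        return "server-state: " + STATE_NAMES.get(data[0], "reserved/unknown")
    if code == TLS_REPLY and len(data) == 1:
        # 0 means no TLS operation, 1 means TLS operation is required.
        meaning = {0: "no TLS operation", 1: "TLS operation required"}
        return "TLS-reply: " + meaning.get(data[0], "unspecified value")
    return "option %d (%d bytes)" % (code, len(data))


# A server-state option (NORMAL) followed by a TLS-reply option (1).
sample = bytes([0, 24, 0, 1, 2, 0, 26, 0, 1, 1])
for code, data in iter_options(sample):
    print(describe(code, data))
```

   Unknown codes fall through to a generic description rather than
   raising an error, which is a choice made for this sketch and is
   not taken from the specification.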

      Code        Len      TLS
   +-----+-----+-----+-----+-----+
   |  0  |  26 |  0  |  1  |  t1 |
   +-----+-----+-----+-----+-----+

12.27. TLS-request

   This option contains information relating to TLS security
   negotiation. It is sent in a CONNECT message.

   The t1 byte is the TLS request from the primary server. A value of
   0 indicates no TLS operation (to communicate, the secondary server
   MUST NOT require TLS), a value of 1 indicates that TLS operation is
   desired but not required (to communicate, the secondary server MAY
   utilize TLS), and a value of 2 indicates that TLS operation is
   required (to communicate, the secondary server MUST utilize TLS) to
   establish communications with the primary server.

      Code        Len      TLS
   +-----+-----+-----+-----+-----+
   |  0  |  27 |  0  |  1  |  t1 |
   +-----+-----+-----+-----+-----+

12.28. vendor-class-identifier

   A string which identifies the vendor of the failover protocol
   implementation.

      Code        Len     vendor class string
   +-----+-----+-----+-----+----+-----+---
   |  0  |  28 |  0  |  n  | c1 |  c2 | ...
   +-----+-----+-----+-----+----+-----+---

12.29. vendor-specific-options

   This option is used to convey options specific to a particular
   vendor's implementation. The vendor class identifier is used to
   specify which option space the embedded options are drawn from.
   Every message that uses vendor specific options MUST have a
   vendor-class-identifier option in it.

   It functions similarly to the vendor class identifier and vendor
   specific options in the DHCP protocol.

   This option contains other options in the same two byte code, two
   byte length format. If this option appears in a message without a
   corresponding vendor class identifier, it MUST be ignored.

      Code        Len     Embedded options
   +-----+-----+-----+-----+----+-----+---
   |  0  |  29 |  0  |  n  | c1 |  c2 | ...
   +-----+-----+-----+-----+----+-----+---

13. IANA Considerations

   This document defines several number spaces (failover options,
   failover message types, message digest types, and failover reject
   reason codes). For all of these number spaces, certain values are
   defined in this specification. New values may only be defined by
   IETF Consensus, as described in [RFC 2434]. Basically, this means
   that they are defined by RFCs approved by the IESG.

14. Acknowledgments

   Ralph Droms started it all, by sketching out an initial interserver
   draft that embodied ideas from several past IETF meetings. In that
   draft, he acknowledged contributions by Jeff Mogul, Greg Minshall,
   Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.

   Kim Kinnear and Bob Cole each extended that draft, separately and
   then together, until they created an interserver draft that
   supported any number of servers. The complexity of that approach
   was just too great, and that draft wasn't greeted with enthusiasm
   by many, including its authors.

   It did however lead to a much simpler approach embodied in the
   first Failover draft by Greg Rabil, Mike Dooley, Arun Kapur and
   Ralph Droms. This draft posited only two servers -- a primary and a
   secondary.

   Kim Kinnear then wrote the Safe Failover draft to layer on top of
   the Failover Draft and increase its robustness in the face of
   certain rare network failures.

   At the spring 1998 IETF meeting in LA, the DHC working group said
   that they wanted a merged Failover and Safe Failover draft. Steve
   Gonczi and Bernie Volz stepped up and produced the raw material for
   such a merged draft, along with a new message format designed
   around DHCP options and other extensions and clarifications. Kim
   Kinnear edited their work into draft format and made other changes
   in time for the Summer Chicago IETF meeting.

   Many people have reviewed the various earlier drafts that went into
   this result. At American Internet, ideas were contributed by Brad
   Parker. At Cisco Systems, Paul Fox and Ellen Garvey contributed to
   the design of the protocol.

   During the summer and fall of 1998, two groups worked on separate
   implementations of the UDP failover draft. Bernie Volz and Steve
   Gonczi constituted one group, and Kim Kinnear, Mark Stapp and Paul
   Fox made up the other. These two groups worked together to produce
   considerable changes and simplifications of the protocol during
   that period, and Steve Gonczi and Kim Kinnear edited those changes
   into the -03 draft in time for submission to the December 1998
   Orlando IETF meeting.

   In February of 1999 Kim Kinnear and Mark Stapp hosted a meeting of
   people interested in the failover draft. During that meeting a
   general agreement was reached to recast the failover protocol to
   use TCP instead of UDP. In addition, the group together
   brainstormed a workable load-balancing technique. Kim Kinnear
   rewrote the entire draft to include the changes made at that
   meeting as well as to restructure the draft along guidelines
   suggested by Thomas Narten. The result was the -04 draft, submitted
   prior to the Oslo IETF meeting.

   The initial idea for a hash-based load balancing approach was
   offered by Ted Lemon, and the determination of an algorithm and its
   integration into the draft was done by Steve Gonczi. The security
   section was spearheaded by Bernie Volz. Both contributed
   considerably to the ideas and text in the rest of the draft with
   several reviews.

   In early October of 1999, three conference calls were held to
   discuss the -04 draft. The -05 draft includes changes as a result
   of those calls, perhaps the largest of which was to move the load
   balancing approach into a separate draft.
   Thanks to all of the many people who participated in the conference
   calls. Changes were made because of contributions by: Ted Lemon,
   David Erdmann, Richard Jones, Rob Stevens, Thomas Narten, Diana
   Lane, and Andre Kostur.

   Another conference call was held in mid-January of 2000, and the
   -06 draft was produced to tighten up the -05 draft both technically
   and editorially.

   The -07 draft was edited by Kim Kinnear and was based in part on
   reviews by Richard Jones, Bernie Volz, and Steve Gonczi. It
   embodies several technical updates as well as numerous editorial
   revisions that enhanced both correctness and clarity.

   The -08 draft was edited by Kim Kinnear and was based on the
   results of two conference calls held in October and November of
   2000. It includes the correct second port number, a new state to
   synchronize conflict resolution with load balancing, a generally
   accepted approach to secondary pool allocation, and many other
   updates based on both operational and implementation experience.

   This, the -09 draft, was edited by Kim Kinnear based on discussions
   held at the Minneapolis IETF in December of 2000, as well as issues
   raised by Ted Lemon based on implementation and deployment. The
   specific changes were mailed to the dhcp-v4 list.

   These most recent changes have not been widely circulated among the
   other authors prior to submission to the IETF.

   Glenn Waters of Nortel Networks contributed ideas and enthusiasm to
   make a Failover protocol that was both "safe" and "lazy".

15. References

   [DHCID] Stapp, M., Lemon, T., Gustafsson, A.,
      "draft-ietf-dnsext-dhcid-rr-02.txt", March, 2001.

   [DNSRES] Stapp, M., "draft-ietf-dhc-dns-resolution-01.txt", March,
      2001.

   [FQDN] Rekhter, Y., Stapp, M., "draft-ietf-dhc-fqdn-option-01.txt",
      March, 2001.

   [RFC 1035] Mockapetris, P., "Domain Names - Implementation and
      Specification", RFC 1035, November 1987.

   [RFC 1534] Droms, R., "Interoperation between DHCP and BOOTP", RFC
      1534, October 1993.

   [RFC 2104] Krawczyk, H., Bellare, M., and Canetti, R., "HMAC: Keyed
      Hashing for Message Authentication", RFC 2104, IBM T.J. Watson
      Research Center, University of California at San Diego, February
      1997.

   [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
      Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC 2131] Droms, R., "Dynamic Host Configuration Protocol", RFC
      2131, March 1997.

   [RFC 2132] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor
      Extensions", RFC 2132, March 1997.

   [RFC 2136] Vixie, P., Thomson, S., Rekhter, Y., Bound, J., "Dynamic
      Updates in the Domain Name System (DNS UPDATE)", RFC 2136, April
      1997.

   [RFC 2139] Rigney, C., "Radius Accounting", RFC 2139, Livingston
      Enterprises, April 1997.

   [RFC 2246] Dierks, T., "The TLS Protocol, Version 1.0", RFC 2246,
      January 1999.

   [RFC 2401] Kent, S., Atkinson, R., "Security Architecture for the
      Internet Protocol", RFC 2401, November 1998.

   [RFC 2434] Alvestrand, H. and T. Narten, "Guidelines for Writing an
      IANA Considerations Section in RFCs", BCP 26, RFC 2434, October
      1998.

   [RFC 2487] Hoffman, P., "SMTP Service Extension for Secure SMTP over
      TLS", RFC 2487, January 1999.

   [RFC 2595] Newman, C., "Using TLS with IMAP, POP3, and ACAP", RFC
      2595, June 1999.

   [RFC 3004] Stump, G., Droms, R., Gu, Y., Vyaghrapuri, R., Demirtjis,
      A., Privat, J., "The User Class Option for DHCP", RFC 3004,
      November 2000.

   [RFC 3011] Waters, G., "The IPv4 Subnet Selection Option for DHCP",
      RFC 3011, November 2000.

   [RFC 3046] Patrick, M., "DHCP Relay Agent Information Option", RFC
      3046, January 2001.

   [RFC 3074] Volz, B., Gonczi, S., Lemon, T., Stevens, R., "DHC
      Load-balancing Algorithm", February, 2001.

16. Author's information

   Ralph Droms
   Kim Kinnear
   Mark Stapp
   Cisco Systems
   250 Apollo Drive
   Chelmsford, MA 01824

   Phone: (978) 244-8000

   EMail: rdroms@cisco.com
          kkinnear@cisco.com
          mjs@cisco.com

   Bernie Volz
   Ericsson
   959 Concord St.
   Framingham, MA 01701

   Phone: +1-617-513-9060

   EMail: bernie.volz@ericsson.com

   Steve Gonczi
   Network Engines, Inc.
   25 Dan Road
   Canton, MA 02021-2817

   Phone: (781) 332-1165

   EMail: steve.gonczi@networkengines.com

   Greg Rabil, Mike Dooley, Arun Kapur
   Lucent Technologies
   400 Lapp Road
   Malvern, PA 19355

   Phone: (800) 208-2747

   EMail: grabil@lucent.com
          mdooley@lucent.com
          akapur@lucent.com

17. Full Copyright Statement

   Copyright (C) The Internet Society (2000). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.