Network Working Group                           Ralph Droms
INTERNET DRAFT                                  Bucknell University

                                                Kim Kinnear
                                                Mark Stapp
                                                Cisco Systems

                                                Bernie Volz
                                                Steve Gonczi
                                                Process Software

                                                Greg Rabil
                                                Mike Dooley
                                                Arun Kapur
                                                Quadritek Systems

                                                June 1999
                                                Expires December 1999

                         DHCP Failover Protocol

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (1999). All Rights Reserved.

Abstract

DHCP [RFC 2131] allows for multiple servers to be operating on a single network. Some sites are interested in running multiple servers in such a way as to provide redundancy in case of server failure. In order for this to work reliably, the cooperating primary and secondary servers must maintain a consistent database of the lease information. This implies that servers will need to coordinate any and all lease activity so that this information is synchronized in case of failover.

This document defines a protocol to provide this synchronization between two servers. One server is designated the "primary" server; the other is the "secondary" server. Additionally, this document describes a protocol which allows each server to determine which DHCP clients it should serve, both while the two servers are operating (to support load balancing) and after one server has failed (to support increased DHCP service availability).

This document is a complete rewrite of draft-ietf-dhc-failover-03.txt.
That earlier draft described a UDP-based failover protocol; this draft describes a closely related protocol which uses TCP as a transport and includes new load-balancing and security capabilities.

Table of Contents

1. Introduction
2. Terminology
2.1. Requirements terminology
2.2. DHCP and failover terminology
3. Background and External Requirements
3.1. Key aspects of the DHCP protocol
3.2. BOOTP relay agent implementation
3.3. What does it mean if a server can't communicate with its partner?
3.4. Challenging scenarios for a Failover protocol
3.5. Using TCP to detect partner server failure
4. Design Goals
4.1. Design requirements for this protocol
4.2. Goals for this protocol
4.3. Limitations of this Protocol
5. Protocol Overview
5.1. Messages and States
5.2. Fundamental restrictions
5.3. Load balancing
5.4. Operating in NORMAL state
5.5. Operating in COMMUNICATIONS-INTERRUPTED state
5.6. Operating in PARTNER-DOWN state
5.7. Operating in RECOVER state
6. Packet Formats
6.1. Common message format
6.2. Common option format
6.3. BNDUPD message format
6.4. BNDACK message format
6.5. Bulking for BNDUPD and BNDACK messages
6.6. UPDREQ message format
6.7. UPDREQALL message format
6.8. UPDDONE message format
6.9. POOLREQ message format
6.10. POOLRESP message format
6.11. CONNECT message format
6.12. CONNECTACK message format
6.13. STATE message format
6.14. CONTACT message format
7. Protocol Messages
7.1. BNDUPD message
7.2. BNDACK message
7.3. UPDREQ message
7.4. UPDREQALL message
7.5. UPDDONE message
7.6. POOLREQ message
7.7. POOLRESP message
7.8. CONNECT message
7.9. CONNECTACK message
7.10. STATE message
7.11. CONTACT message
8. Connection Management
8.1. Connection granularity
8.2. Creating the TCP connection
8.3. Using the TCP connection for determining communications status
8.4. Using the TCP connection for binding data
8.5. Using the TCP connection for control messages
8.6. Losing the TCP connection
9. Protocol States
9.1. Server Initialization
9.2. Server State Transitions
9.3. STARTUP state
9.4. PARTNER-DOWN state
9.5. RECOVER state
9.6. NORMAL state
9.7. COMMUNICATIONS-INTERRUPTED State
9.8. POTENTIAL-CONFLICT state
9.9. RECOVER-DONE state
9.10. PAUSED state
9.11. SHUTDOWN state
10. Safe Period
11. Security
11.1. Simple shared secret
11.2. TLS
12. Hash algorithm for load balancing
13. Acknowledgments
14. References
15. Author's information
16. Full Copyright Statement

1. Introduction

DHCP [RFC 2131] allows for multiple servers to be operating on a single network.
Some sites are interested in running multiple servers in such a way as to provide redundancy in case of server failure, since the DHCP subsystem is in many cases a critical part of the network infrastructure.

This document defines a protocol to provide synchronization between two servers so that each can take over for the other should either one fail or become unreachable.

One server is designated the "primary" server, the other is the "secondary" server, and all DHCP client requests are sent to each server.

In order to provide a high-availability DHCP service, these cooperating primary and secondary servers must maintain a consistent database of lease information. This implies that servers will need to coordinate any and all lease activity so that this information is synchronized in case failover is required. The protocol messages and processing techniques required to maintain a consistent database are specified in the protocol described here.

The failover protocol also contains an algorithm which allows each server to determine which DHCP clients it should serve when both servers are operating normally, and this capability can be used to support load balancing.

2. Terminology

This section discusses both the generic requirements terminology common to many IETF protocol specifications and the specialized DHCP and failover protocol terminology.

2.1. Requirements terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC 2119].

2.2. DHCP and failover terminology

This document uses the following terms:

o "DHCP client" or "client"

   A DHCP client is an Internet host using DHCP to obtain configuration parameters such as a network address.
o "DHCP server" or "server"

   A DHCP server is an Internet host that returns configuration parameters to DHCP clients.

o "binding"

   A binding is a collection of configuration parameters, including at least an IP address, associated with or "bound to" a DHCP client. Bindings are managed by DHCP servers.

o "binding database"

   The collection of bindings managed by a primary and secondary server.

o "failover endpoint"

   The failover protocol allows for a unique failover endpoint per partner per role (where role is primary or secondary). This failover endpoint can take actions and hold unique states. There is thus a maximum of two failover endpoints per server per partner (one for each partner as a primary and one for that same partner as a secondary).

o "lazy update"

   Lazy update refers to the requirement placed on a server implementing a failover protocol to update its failover partner whenever the binding database changes. A failover protocol which didn't support lazy update would require the failover partner update to be complete before a DHCP server could respond to a DHCP client request with a DHCPACK. A failover protocol which does support lazy update places no such restriction on the update of the failover partner server, and so a server can allocate an IP address or extend a lease on an IP address and then update its failover partner as time permits. A failover protocol which supports lazy update not only removes the requirement to update the failover partner prior to responding to a DHCP client with a DHCPACK, but also allows batches of updates to be gathered up and sent from one failover server to its partner.

o "subnet address pool"

   A subnet address pool is the set of IP addresses that is associated with a particular network number and subnet mask.
   In the simple case, there is a single network number and subnet mask and a set of IP addresses. In the more complex case (sometimes called "secondary subnets", sometimes "superscopes"), several (apparently unrelated) network number and subnet mask combinations with their associated IP addresses may all be configured together into one subnet address pool.

o "Primary server" or "Primary"

   A DHCP server configured to provide primary service to a set of DHCP clients for a particular set of subnet address pools.

o "Secondary server" or "Secondary"

   A DHCP server configured to act as backup to a primary server for a particular set of subnet address pools.

o "stable storage"

   Every DHCP server is assumed to have some form of what is called "stable storage". Stable storage is used to hold information concerning IP address bindings (among other things) so that this information is not lost in the event of a server failure which requires restart of the server.

o "MCLT"

   MCLT stands for maximum client lead time. This time is configured on the primary server and transmitted from the primary to the secondary server in the CONNECT message. It is the maximum amount of time that one server can give to a client for a binding beyond that known and ACKed by the partner server. See section 5.2.1 for details.

3. Background and External Requirements

This section highlights key aspects of the DHCP protocol on which the failover protocol depends. It also discusses the requirements that the failover protocol places on other aspects of the network infrastructure, and some general issues surrounding server failure detection. Some failure scenarios that provide particular challenges to a failover protocol are discussed. Finally, the challenges inherent in using a TCP connection as a means to detect failure of a partner server are elaborated.

3.1. Key aspects of the DHCP protocol

The failover protocol is designed to augment the DHCP protocol as described in RFC 2131 [RFC 2131]. There are several key aspects of the DHCP protocol which are required by the failover protocol in order to successfully meet its design goals.

3.1.1. Broadcast behavior

There are two aspects of the broadcast behavior of the DHCP protocol which are key to making the failover protocol operate successfully. The first is simply that the DHCP protocol requires a DHCP client to broadcast all DHCPDISCOVER and DHCPREQUEST/INIT-REBOOT messages. Because of this requirement, a DHCP client that was communicating with one server will automatically be able to communicate with another server if one is available.

The second aspect of broadcast behavior is similar to the first, but involves the distinction between a DHCPREQUEST/RENEW and a DHCPREQUEST/REBINDING. A DHCPREQUEST/RENEW is the message that a DHCP client uses to extend its lease. It is unicast to the DHCP server from which it acquired the lease. However, the DHCP protocol (in a farsighted move) was explicitly designed so that in the event that a DHCP client cannot contact the server from which it received a lease on an IP address using a DHCPREQUEST/RENEW, the client is required to broadcast its renewal using a DHCPREQUEST/REBINDING to any available DHCP server. Since all DHCP clients were required to implement this algorithm, the failover protocol can allow a server other than the one that initially granted a lease to renew that lease. Thus, one server can take over for another with no interruption in the service as experienced by the DHCP client or its associated application software.

3.1.2. Client responsibility

In the DHCP protocol the DHCP clients are entrusted with a considerable responsibility.
In particular, after they are granted a lease on an IP address, they are enjoined to use that IP address only while their lease is valid. Every DHCP client is expected to stop using an IP address if the expiration time on the lease has passed and if it cannot get an extension on the lease for that IP address from some DHCP server. Thus, the correct behavior of every DHCP client in this regard is required to ensure the integrity of the DHCP service. On the other hand, incorrect behavior by a client in this area will tend to adversely affect at most one other DHCP client.

Furthermore, any DHCP client which sends a DHCPREQUEST/RENEW or DHCPREQUEST/REBINDING to a DHCP server (either unicast for a RENEW or broadcast for a REBINDING) MUST still have time to run on the lease for that IP address. The DHCP server sends the DHCPACK back unicast to the IP address from which the RENEW or REBINDING originated.

Given the existing responsibility placed on the client to use an IP address only when the lease is valid, and to send a RENEW or REBINDING only if the lease is valid, the failover protocol relies on DHCP clients to perform responsibly and will, in the absence of conflicting information, believe that a DHCP client that is attempting to RENEW or REBIND a lease on an IP address is the legitimate owner of that IP address.

One troublesome issue is that of the DHCP client responsibility when sending DHCPREQUEST/INIT-REBOOT requests. While the original DHCP RFC was written to require a DHCP client to have time left to run on the lease for an IP address if the client is sending an INIT-REBOOT request, it was sufficiently unclear that some client vendors didn't realize this until recently.
Since the INIT-REBOOT request was sent with the IP address in the dhcp-requested-address option and not in the ciaddr (for perfectly good reasons), the similarity to the RENEW and REBINDING case was lost on many people.

At present, the failover protocol does not assume that a client sending an INIT-REBOOT request necessarily has a valid lease on the IP address appearing in the dhcp-requested-address option in the INIT-REBOOT request.

The implications of this are as follows: Assume that there is a DHCP client that gets a lease from one server while that server is unable to communicate with its failover partner. Then, assume that after that client reboots it is able to communicate only with the other failover server. If the failover servers have not been able to communicate with each other during this process, then the DHCP client will get a new IP address instead of being able to continue to use its existing IP address. This will affect no applications on the DHCP client, since it is rebooting. However, it will use up an additional IP address in this marginal case.

3.1.3. Stable storage update before DHCPACK

The DHCP protocol allocates resources, and in order to operate correctly it requires that a DHCP server update some form of stable storage prior to sending a DHCPACK to a DHCP client in order to grant that client a lease on an IP address.

One of the goals of the failover protocol is that it not add significant additional time to this already time-consuming requirement to update stable storage prior to a DHCPACK. In particular, adding a requirement to communicate with another server prior to sending a DHCPACK would simplify the failover protocol, but it would unacceptably limit the potential scalability of any DHCP server which employed the failover protocol.

3.2. BOOTP relay agent implementation

Many DHCP clients are not resident on the same network segment as a DHCP server. In order to support this form of network architecture, most contemporary routers implement something known as a BOOTP Relay Agent. This capability inside of a router listens for all broadcasts at the DHCP port, port 67, and will relay any broadcasts that it receives to a DHCP server. The IP address of the DHCP server must have been previously configured into the router. As part of the relay process, the relay agent will place the address of the interface on which it received the broadcast into the giaddr field of the DHCP packet.

Since the failover protocol requires two DHCP servers to receive any broadcast DHCP messages, in order to work with DHCP clients which are not local to the DHCP server, the BOOTP relay agent on the router closest to the DHCP client must be configured to point at more than one DHCP server.

Most BOOTP relay agent implementations allow this duplication of packets.

If this is not possible, an administrator might be able to configure the relay agent with a subnet broadcast address, but in this case the primary and secondary DHCP servers in a failover pair must both reside on the same subnet. While this is a realistic configuration, it is not the one that most people will use.

3.3. What does it mean if a server can't communicate with its partner?

In any protocol designed to allow one server to take over some responsibilities from a partner server in the event of "failure" of that partner server, there is an inherent difficulty in determining when that partner server has failed.

In fact, it is fundamentally impossible for one server to distinguish a network communications failure from the outright failure of the server with which it is trying to communicate.
In the case where each server is handing out resources (in this case IP addresses) to a client community, mistaking an inability to communicate with a partner server for failure of that partner server could easily cause both servers to hand out the same IP addresses to different clients.

One way that this is sometimes handled is for there to be more than two servers. In the case of an odd number of servers, the servers that can still communicate with a majority of other servers will consider themselves operational, and any server which can't communicate with a majority of other servers must immediately cease operations.

While this technique works in some domains, having the only server with which a DHCP client can communicate voluntarily shut itself down seems like something worth avoiding.

The failover protocol will operate correctly while both servers are unable to communicate, whether they are both running or not. At some point there may be resource contention, and if one of the servers is actually down, then the operator can inform the other server and the operational server will be able to use all of the downed server's resources.

The protocol also allows detection of an orderly shutdown of a participating server.

3.4. Challenging scenarios for a Failover protocol

There exist two failure scenarios which provide particular challenges to the correctness guarantees of a failover protocol.

3.4.1. Primary Server crash before "lazy" update:

In the case where the primary server sends a DHCPACK to a client for a newly allocated IP address and then crashes prior to sending the corresponding update to the secondary server, the secondary server will have no record of the IP address allocation. When the secondary server takes over, it may well try to allocate that IP address to a different client.
In the case where the first client to receive the IP address is not on the net at the time (while its lease still has time to run), an ICMP echo (i.e., ping) will not prevent the secondary server from allocating that IP address to a different client.

The failover protocol deals with this situation by having the primary and secondary servers allocate addresses for new clients from disjoint address pools. See section 5.4 for details.

A more likely (in that DHCPREQUEST/RENEW messages are presumably more common than DHCPDISCOVER messages) and more subtle version of this problem is where the primary server crashes after extending a client's lease time, and before updating the secondary with a new time using a lazy update. After the secondary takes over, if the client is not connected to the network the secondary will believe the client's lease has expired when, in fact, it has not. In this case as well, the IP address might be reallocated to a different client while the first client is still using it.

This scenario is handled by the failover protocol through control of the lease time and the use of the maximum client lead time (MCLT). See section 5.2.1 for details.

3.4.2. Network partition where DHCP servers can't communicate but each can talk to clients:

Several conditions are required for this situation to occur. First, due to a network failure, the primary and secondary servers cannot communicate. In addition, some of the DHCP clients must be able to communicate with the primary server, and some of the clients must now be able to communicate only with the secondary server. When this condition occurs, both primary and secondary servers could attempt to allocate IP addresses for new clients from the same pool of available addresses. At some point, then, two clients will end up being allocated the same IP address.
This will cause problems when the network failure that created this situation is corrected.

The failover protocol deals with this situation by having the primary and secondary servers allocate addresses for new clients from disjoint address pools. See section 5.4 for details.

3.5. Using TCP to detect partner server failure

There are several characteristics of TCP that are important to the functioning of the failover protocol, which uses one TCP connection both for bulk data transfer and to assess communications integrity with the other server. Reliable and ordered message delivery are chief among these important characteristics.

It would be convenient to use the capabilities built into TCP to determine whether communications integrity exists with the failover partner, but this strategy contains some problems which require analysis. There are three fundamental cases for an open TCP connection that must be examined.

1. When no data is being sent, no messages are traveling across the TCP connection.

2. When data is queued to be sent, and the receiver has not blocked the sending of additional data, messages containing the application data are flowing across the TCP connection.

3. When data is queued to be sent, and the receiver has blocked the transmission of additional data (by advertising a zero window), persist (window probe) messages flow from the sender to the receiver, and acknowledgments flow back, to ensure that the sender doesn't miss the receiver opening the window for further transmissions.

The first case can be turned into the second case by sending application-level keep-alive messages periodically when there is no other data queued to be sent. Note that TCP keep-alive messages might be used as well, but they present additional problems.

Thus, we can fairly easily ensure that messages flow periodically across the TCP connection.
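The application-level keep-alive idea above can be sketched in a few lines. This is a minimal, non-authoritative Python illustration, not the mechanism this protocol defines (see section 8.3 for that): the class name PartnerMonitor, its fields, and the 10 and 30 second values are hypothetical. It records when a server last sent and last received anything on the failover connection, sends a keep-alive when it has been idle for too long, and decides for itself that the partner is unreachable instead of waiting for TCP to time out a retransmission.

```python
from dataclasses import dataclass


@dataclass
class PartnerMonitor:
    """Hypothetical application-level liveness tracker for one failover connection.

    All times are in seconds, e.g. readings of time.monotonic().
    """

    keepalive_interval: float  # send a keep-alive after being silent this long
    partner_timeout: float     # treat the partner as unreachable after this much silence
    last_sent: float = 0.0
    last_received: float = 0.0

    def on_send(self, now: float) -> None:
        # Any message we send (binding update or keep-alive) restarts the send timer.
        self.last_sent = now

    def on_receive(self, now: float) -> None:
        # Any message from the partner shows it was reachable when the message was sent.
        self.last_received = now

    def need_keepalive(self, now: float) -> bool:
        # Turns case 1 (idle connection) into case 2 by forcing some traffic.
        return now - self.last_sent >= self.keepalive_interval

    def partner_unreachable(self, now: float) -> bool:
        # Decided by the application, independent of TCP's retransmission timeout.
        return now - self.last_received > self.partner_timeout


# Example: keep-alives every 10 s; partner unreachable after more than 30 s of silence.
monitor = PartnerMonitor(keepalive_interval=10.0, partner_timeout=30.0)
monitor.on_send(0.0)
monitor.on_receive(0.0)
print(monitor.need_keepalive(5.0))        # False: we sent something 5 s ago
print(monitor.need_keepalive(10.0))       # True: idle for a full interval
print(monitor.partner_unreachable(30.0))  # False: silence not yet longer than 30 s
print(monitor.partner_unreachable(31.0))  # True
```

Because both timers live in the application rather than in the TCP stack, their values can be chosen for the failover connection alone without affecting other applications on the host. Setting the timeout to a few multiples of the partner's keep-alive interval tolerates an occasional delayed message.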
The question 548 remains as to what TCP will do if the other end of the connection 549 fails to respond (either because of network partition or because the 550 receiving server crashes). TCP will attempt to retransmit a message 551 with an exponential backoff, and will eventually timeout that 552 retransmission. However, the length of that timeout cannot, in gen- 553 eral, be set on a per-connection basis, and is frequently as long as 554 nine minutes, though in some cases it may be as short as two minutes. 555 One some systems it can be set system-wide, while on some systems it 556 cannot be changed at all. 558 A value for this timeout that would be appropriate for the failover 559 protocol, say less than 1 minute, could have unpleasant side-effects 560 on other applications running on the same server, assuming that it 561 could be changed at all on the host operating system. 563 Nine minutes is a long time for the DHCP service to be unavailable to 564 any new clients that were being served by the server which has 565 crashed, when there is another server running that could respond to 566 them immediately as soon as it determines that its partner is not 567 operational. 569 The conclusion drawn from this analysis is that TCP provides very 570 useful support for the failover protocol in the areas of reliable and 571 ordered message delivery, but cannot by itself be relied upon to 572 detect partner server failure in a fashion acceptable to the needs of 573 the failover protocol. Additional failover protocol capabilities 574 will need to be created to support timely detection of partner server 575 failure. See section 8.3 for details on this mechanism. 577 4. Design Goals 579 This section lists the design requirements, the design goals, and the 580 limitations of the failover protocol. 582 4.1. Design requirements for this protocol 584 The following list of requirements must be (and are) met by this pro- 585 tocol. They are listed in priority order. 587 1. 
Implementations of this protocol must work with existing DHCP 588 client implementations based on the DHCP protocol [1]. 590 2. Implementations of the protocol must work with existing BOOTP 591 relay agent implementations. 593 3. The protocol must provide failover redundancy between servers 594 that are not located on the same subnet. 596 4.2. Goals for this protocol 598 The following goals are met by this protocol as well, though they are 599 less important than the requirements listed above. These goals are 600 listed in priority order. 602 1. Provide for continued service to DHCP clients through an 603 automated mechanism in the event of failure of the primary 604 server. 606 2. Avoid binding an IP address to a client while that binding is 607 currently valid for another client. In other words, do not 608 allocate the same IP address to two clients. 610 3. Minimize any need for manual administrative intervention. 612 4. Introduce no additional delays in server response time as a 613 result of the network communications required to implement the 614 failover protocol, i.e., don't require communications with the 615 partner between the receipt of a DHCPREQUEST and the 616 corresponding DHCPACK. 618 5. Share IP address ranges between primary and secondary servers; 619 i.e., impose no requirement that the pool of available 620 addresses be divided between servers. 622 6. Continue to meet the goals and objectives of this protocol in 623 the event of server failure or network partition. 625 7. Provide graceful reintegration of full protocol service after 626 server failure or network partition. 628 8. Allow for one computer to act as a secondary server for multi- 629 ple primary servers. Other topologies (e.g.: mesh) are also 630 possible. primary and secondary servers SHOULD be viewed as 631 "logical" servers and not necessarily physical computers. 633 9. 
Ensure that an existing client can keep its existing IP 634 address binding if it can communicate with either the primary 635 or secondary DHCP server implementing this protocol - not just 636 whichever server that originally offered it the binding. 638 10. Ensure that a new client can get an IP address from some 639 server. Ensure that in the face of partition, where servers 640 continue to run but cannot communicate with each other, the 641 above goals and requirements may be met. In addition, when the 642 partition condition is removed, allow graceful automatic re- 643 integration without requiring human intervention. 645 11. If either primary or secondary server loses all of the infor- 646 mation that is has stored in stable storage, it should be able 647 to refresh its stable storage from the other server. 649 12. Support load balancing between the primary and secondary 650 servers, and allow configuration of the percentage of the 651 client population served by each with a moderately fine granu- 652 larity. 654 4.3. Limitations of this Protocol 656 The following are explicit limitations of this protocol. 658 1. This protocol provides only one level of redundancy through a 659 single secondary server for each primary server. 661 2. A subset of the address pool is reserved for secondary server 662 use. In order to handle the failure case where both servers 663 are able to communicate with DHCP clients, but unable to com- 664 municate with each other, a subset of the IP address pool must 665 be set aside as a private address pool for the secondary 666 server. The secondary can use these to service newly arrived 667 DHCP clients during such a period. The size of this private 668 pool SHOULD be based only on the arrival rate of new DHCP 669 clients and the length of expected downtime, and is not influ- 670 enced in any way by the total number of DHCP clients supported 671 by the server pair. 673 3. 
The primary and secondary servers do not respond to client 674 requests at all while recovering from a failure that could 675 have resulted in duplicate IP assignments. (When synchroniz- 676 ing in POTENTIAL-CONFLICT state). 678 5. Protocol Overview 680 This section will discuss the failover protocol at a relatively high 681 level level of detail. In the event that a description in this sec- 682 tion conflicts (or appears to conflict due to the overview nature of 683 this section) with information in later sections of this draft, the 684 information in the later sections should be considered authoritative. 686 5.1. Messages and States 688 This protocol is centered around the message exchange used by one 689 server to update the other server of binding database changes result- 690 ing from DHCP client activity: 692 o Communication of binding database changes 694 The binding update (BNDUPD) message is used to send the binding 695 database changes to the partner server, and the partner server 696 responds with a binding acknowledgement (BNDACK) message when it 697 has successfully committed those changes to its own stable 698 storage. 700 All of the other messages are involve ancillary issues: 702 o Management of available IP addresses 704 The pool request (POOLREQ) is used by the secondary server to 705 request an allocation of IP addresses from the primary server. 707 The pool response (POOLRESP) is used by the primary server to 708 inform the secondary server how many IP addresses it was allo- 709 cated as the result of a pool request. 711 o Synchronization of the binding databases between the servers 712 after they've been out of communications 714 The update request (UPDREQ) message is used by one server to 715 request that its partner send it all binding database informa- 716 tion that it has not already seen. 
      The update request all (UPDREQALL) message is used by one server
      to request that all binding database information be sent in order
      to recover from a total loss of its lease state database by the
      requesting server. The update done (UPDDONE) message is used by
      the responding server to indicate that all requested updates have
      been sent by the responding server and acknowledged by the
      requesting server.

   o Connection establishment

      The connect (CONNECT) message is used by either server to
      establish a high level connection with the other server, and to
      transmit several important configuration data items between the
      servers. The connect acknowledgement (CONNECTACK) message is used
      to respond to a CONNECT message from the other server.

   o Server synchronization

      The state change (STATE) message is used by either server to
      inform the other server of a change of failover state.

   o Connection integrity management

      The contact (CONTACT) message is used by either server to ensure
      that the other server continues to see the connection as
      operational. It MUST be transmitted periodically over every
      established connection if other message traffic is not flowing,
      and it MAY be sent at any time.

5.1.1. Failover endpoints

   The proper operation of the failover protocol requires more than the
   transmission of messages between one server and the other. Each
   endpoint might seem to be a single DHCP server, but in fact there are
   many situations where additional flexibility in configuration is
   useful.

   For instance, there might be several servers which are each primary
   for a distinct set of address pools, and one server which is
   secondary for all of those address pools.
   The situation with the primaries is straightforward, but the
   secondary will need to maintain a separate failover state, partner
   state, and communications up/down status for each of the separate
   primary servers for which it is acting as a secondary.

   The failover protocol calls for there to be a unique failover
   endpoint per partner per role (where role is primary or secondary).
   This failover endpoint can take actions and hold unique states.
   There is thus a maximum of two failover endpoints per partner (one
   for the partner as a primary and one for that same partner as a
   secondary).

   Thus, in the case where there are two primary servers A and B, each
   backed up by a single common secondary server C, there is one
   failover endpoint on each of A and B, and two different failover
   endpoints on C. The two failover endpoints on C each have their own
   state and an independent TCP connection.

   This document describes the behavior of the protocol in terms of
   primary and secondary servers, not primary and secondary failover
   endpoints. However, it is important to remember that every 'server'
   described in this document is in reality a failover endpoint that
   resides in a particular process, and that many failover endpoints
   may reside in the same process.

   It is not the case that there is a unique failover endpoint for each
   subnet that participates in a failover relationship. On one server,
   there is one failover endpoint per partner per role, regardless of
   how many subnets or address pools are managed by that combination of
   partner and role. Conversely, any given subnet or pool will be
   associated with exactly one failover endpoint on a single server.
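   As an illustration only, the one-endpoint-per-partner-per-role rule
   can be modeled as a table keyed by the partner's address and the
   local role, with subnets and pools deliberately left out of the key.
   The names below (enum fo_role, struct fo_endpoint,
   fo_find_endpoint(), fo_local_role()) are hypothetical and are not
   defined by this protocol; fo_local_role() assumes that a set
   SECONDARY flag from the partner means the partner is acting as
   secondary, so the local endpoint plays the primary role.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the role this server plays toward a given partner. */
enum fo_role { FO_ROLE_PRIMARY, FO_ROLE_SECONDARY };

/* Hypothetical failover endpoint: at most one per (partner, role). */
struct fo_endpoint {
    uint32_t partner_addr;  /* partner IPv4 address, host byte order */
    enum fo_role role;      /* local role toward that partner */
    int state;              /* this endpoint's own failover state */
};

/* Assumption: the partner's SECONDARY flag gives the partner's role,
   so the local endpoint takes the opposite role. */
enum fo_role fo_local_role(int partner_secondary_flag)
{
    return partner_secondary_flag ? FO_ROLE_PRIMARY : FO_ROLE_SECONDARY;
}

/* Return the unique endpoint for (partner_addr, role), or NULL if none
   is configured. Every subnet or pool shared with this partner in this
   role maps to the same endpoint, so pools are not part of the key. */
struct fo_endpoint *fo_find_endpoint(struct fo_endpoint *eps, size_t n,
                                     uint32_t partner_addr, enum fo_role role)
{
    assert(eps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (eps[i].partner_addr == partner_addr && eps[i].role == role)
            return &eps[i];
    }
    return NULL;
}
```

   In the A/B/C example above, server C would hold two entries, one for
   partner A and one for partner B, both with the local role
   FO_ROLE_SECONDARY. A linear scan suffices because the table holds at
   most two entries per partner.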
   When a connection is received from the partner, the unique failover
   endpoint to which the message is directed is determined solely by the
   IP address of the partner and the setting of the SECONDARY bit in the
   'flags' field of the contact message.

   Throughout this document, the states and actions taken by "servers"
   are described. The terms "server", "primary server", and "secondary
   server" are commonly used to describe the failover endpoint taking
   these states and performing these actions. This description is
   wholly accurate only for the simplest case, where all of the address
   pools on one server are backed up by all of the address pools on
   another server. In this case, there is a single failover endpoint in
   each server. In all other cases, the term "server" is used to
   describe one of the two possible failover endpoints per partner.

5.2. Fundamental restrictions

   There are several fundamental restrictions this protocol places on
   what one server can do in the absence of knowledge of the other
   server, and these restrictions are key to the correct operation of
   the protocol.

5.2.1. Control of lease time

   The key problem with lazy update is that when a server fails after
   updating a client with a particular lease time and before updating
   its partner, the partner will believe that a lease has expired even
   though the client still retains a valid lease on that IP address.

   In order to handle this problem, a period of time known as the
   "Maximum Client Lead Time" (MCLT) is defined and must be known to
   both the primary and secondary servers. Proper use of this time
   interval places an upper bound on the difference allowed between the
   lease time provided to a DHCP client by a server and the lease time
   known by that server's partner.
   However, the MCLT is typically much less than the lease time that a
   server has been configured to offer a client, and so some strategy
   must exist to allow a server to offer the configured lease time to a
   client. During a lazy update the updating server typically updates
   its partner with a potential expiration time which is longer than
   the lease time previously given to the client and which is longer
   than the lease time that the server has been configured to give a
   client. This allows that server to give a longer lease time to the
   client the next time the client renews its lease, since the time
   that it will give to the client will not exceed the MCLT beyond the
   potential expiration time acknowledged by the partner.

   When moving to the PARTNER-DOWN state (where a server is allowed to
   reallocate the partner's IP addresses), a server will wait the
   Maximum Client Lead Time before allocating any IP addresses from its
   partner's pool to any new DHCP clients. Thus, any clients which have
   a lease on an IP address with a lease time greater than that known
   by the server moving into PARTNER-DOWN state will either have
   contacted that server during the MCLT period or their leases will
   have expired.

   When a server has transitioned to PARTNER-DOWN state, it MUST NOT
   reallocate an IP address from one client to another client until an
   additional maximum client lead time interval after the lease by the
   original client expires. (More precisely, until the maximum client
   lead time after what it believes to be the lease expiration time of
   the first client.)

   One optimization exists for this restriction: it applies only to
   leases that were issued BEFORE entering PARTNER-DOWN. Once a server
   has entered PARTNER-DOWN and it leases out an address, it need not
   wait this time as long as it has not communicated with the partner
   since the lease was given out.
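   The PARTNER-DOWN waiting rules above can be summarized as a single
   computation of the earliest time at which an address may be given to
   a different client. The following sketch is illustrative only: the
   function fo_earliest_realloc() and its parameters are hypothetical,
   all times are absolute seconds in the time_t representation used by
   the failover time options, and issued_in_pd is assumed to be set
   only for a lease granted after entering PARTNER-DOWN with no partner
   contact since.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: earliest time an address may be reallocated to
   a different client while in PARTNER-DOWN state.
     entered_pd   - time this server entered PARTNER-DOWN
     lease_expiry - what this server believes is the lease expiration
                    time (in the past if the address is already free)
     issued_in_pd - nonzero if this server granted the lease after
                    entering PARTNER-DOWN and has not communicated with
                    the partner since
     mclt         - the maximum client lead time, in seconds */
time_t fo_earliest_realloc(time_t entered_pd, time_t lease_expiry,
                           int issued_in_pd, time_t mclt)
{
    assert(mclt >= 0);
    if (issued_in_pd)
        return lease_expiry;   /* the partner never saw this lease */

    /* Otherwise wait one MCLT past whichever is later: entry into
       PARTNER-DOWN, or the believed expiration of the lease. */
    time_t base = lease_expiry > entered_pd ? lease_expiry : entered_pd;
    return base + mclt;
}
```

   For example, with an MCLT of 3600 seconds and PARTNER-DOWN entered
   at time 1000, an address whose lease expired at time 500 becomes
   reusable at 4600, while one whose lease runs until 5000 must wait
   until 8600.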
   The fundamental relationship on which much of the correctness of
   this protocol depends is that the lease expiration time known to a
   DHCP client MUST NOT exceed the potential expiration time known to a
   server's partner by more than the maximum client lead time.

   The remainder of this section makes the above fundamental
   relationship more explicit.

   This protocol requires a DHCP server to deal with several different
   lease intervals and places specific restrictions on their
   relationships. The purpose of these restrictions is to allow the
   other server in the pair to make certain assumptions when the two
   servers are unable to communicate.

   The different lease times are:

   o desired lease interval

      The desired lease interval is the lease interval that a DHCP
      server would like to give to a DHCP client in the absence of any
      restrictions imposed by the failover protocol. Its determination
      is outside of the scope of this protocol. Typically this is the
      result of external configuration of a DHCP server.

   o actual lease interval

      The actual lease interval is the lease interval that a DHCP
      server gives out to a DHCP client in the dhcp-lease-time option
      of a DHCPACK packet. It may be shorter than the desired client
      lease interval (as explained below).

   o potential lease interval

      The potential lease interval is the lease expiration interval the
      local server tells to its partner in the potential-expiration-time
      option of a BNDUPD message.

   o acknowledged potential lease interval

      The acknowledged potential lease interval is the potential lease
      interval the partner server has most recently acknowledged in the
      potential-expiration-time option of a BNDACK message.
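   The fundamental relationship can be checked mechanically if each
   binding keeps the absolute expiration times that correspond to the
   actual and acknowledged potential lease intervals. The sketch below
   is illustrative; struct fo_lease_times and fo_relationship_holds()
   are hypothetical names, and it assumes that an acknowledged
   potential expiration time of 0 means no BNDACK has been received
   yet. A missing or already-passed acknowledgement is treated as a
   remainder of zero, so the client's expiration is then capped at the
   current time plus the MCLT.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-binding record; all values are absolute times in
   seconds since the epoch, as carried by the failover time options. */
struct fo_lease_times {
    time_t client_expiry;     /* now + actual lease interval (DHCPACK) */
    time_t potential_expiry;  /* potential-expiration-time last sent in a BNDUPD */
    time_t acked_potential;   /* potential-expiration-time from the last BNDACK, 0 if none */
};

/* Nonzero if the client's expiration time exceeds the expiration time
   acknowledged by the partner by no more than the MCLT. */
int fo_relationship_holds(const struct fo_lease_times *lt, time_t now,
                          time_t mclt)
{
    assert(lt != NULL && mclt >= 0);
    /* A missing or expired acknowledgement leaves a remainder of zero. */
    time_t acked = lt->acked_potential > now ? lt->acked_potential : now;
    return lt->client_expiry <= acked + mclt;
}
```

   A server would evaluate such a check before sending each DHCPACK; a
   failing check means the lease offered to the client must be
   shortened.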
   The key restriction (and guarantee) that any server makes with
   respect to lease intervals is that the actual client lease interval
   never exceeds the acknowledged potential lease interval (if any) by
   more than a fixed amount. This fixed amount is called the "Maximum
   Client Lead Time" (MCLT).

   The MCLT MAY be configurable on the primary server, but for correct
   server operation it MUST be the same and known to both the primary
   and secondary servers. The secondary server determines the MCLT from
   the MCLT option sent from the primary server to the secondary server
   in the CONNECT or CONNECTACK message.

   A server MUST record in its stable storage both the actual lease
   interval and the most recently acknowledged potential lease interval
   for each IP address binding. It is assumed that the desired client
   lease interval can be determined through techniques outside of the
   scope of this protocol.

   Again, the fundamental relationship among these times which MUST be
   maintained is:

      actual lease interval <=
          ( acknowledged potential lease interval + MCLT )

   Figure 5.1-1 illustrates an initial lease to a client using the
   rules discussed in the example which follows it.

          DHCP                   Primary                 Secondary
   time  Client                  Server                   Server

            |  (time in intervals)  |   (absolute time)      |
            |                       |                        |
            |  >-DHCPDISCOVER->     |                        |
            |      <---DHCPOFFER-<  |                        |
            |                       |                        |
            |  >-DHCPREQUEST->      |                        |
            |   (selecting)         |                        |
            |                       |                        |
   t        |  <--------DHCPACK-<   |                        |
            |    lease-time=MCLT    |                        |
            |                       |  >-BNDUPD-->           |
            |                       | lease-expiration=t+MCLT
            |                       | potential-expiration=t+(MCLT/2)+X
            |                       |                        |
            |                       |            <-BNDACK-<  |
            |                       | potential-expiration=t+(MCLT/2)+X
           ...                     ...                      ...
            |                       |                        |
   t+MCLT/2 |  >-DHCPREQUEST->      |                        |
            |   (renew)             |                        |
            |                       |                        |
   t1       |  <--------DHCPACK-<   |                        |
            |    lease-time=X       |                        |
            |                       |  >-BNDUPD-->           |
            |                       | lease-expiration=t1+X
            |                       | potential-expiration=t1+(X/2)+X
            |                       |                        |
            |                       |            <-BNDACK-<  |
            |                       | potential-expiration=t1+(X/2)+X
           ...                     ...                      ...

              Figure 5.1-1: Lazy Update Message Traffic
                     X = Desired Lease Interval

   DISCUSSION:

      This protocol mandates no algorithm concerning these lease
      intervals, as long as the above fundamental relationship is
      preserved.

      In the interests of clarity, however, let's examine a specific
      example. The MCLT in this case is 1 hour. The desired lease
      interval is 3 days, and its renewal time is half the lease
      interval.

      The rules for this example are:

      o What to tell the client:

         Take the remainder of the acknowledged potential lease
         interval. If this is a new lease, then this value will be
         zero. If this remainder plus the MCLT is greater than the
         desired lease interval, give the client the desired lease
         interval; otherwise, give the client the remainder plus the
         MCLT.

      o What to tell the failover partner server:

         Take the renewal interval (typically half of the actual client
         lease interval), add the desired lease interval to it, and add
         the sum to the current time to yield the value that goes into
         the potential-expiration-time option.

         Also tell the failover partner the actual lease interval by
         adding it to the current time to yield the value that goes
         into the lease-expiration option.

      In operation this might work as follows:

      When a server makes an offer for a new lease on an IP address to
      a DHCP client, it determines the desired lease interval (in this
      case, 3 days). It then examines the acknowledged potential lease
      interval (which in this case is zero) and determines the
      remainder of the time left to run, which is also zero. To this it
      adds the MCLT.
      Since the actual lease interval cannot be allowed to exceed the
      remainder of the current acknowledged potential lease interval
      plus the MCLT, the offer made to the client is for the remainder
      of the current acknowledged potential lease interval (i.e., zero)
      plus the MCLT. Thus, the actual lease interval is 1 hour.

      Once the server has sent the DHCPACK to the DHCP client, it will
      update the secondary server with the lease information. However,
      the desired potential lease interval will be composed of one half
      of the current actual lease interval added to the desired lease
      interval. Thus, the secondary server is updated with a BNDUPD
      carrying a potential lease interval of 3 days + 1/2 hour,
      expressed as an absolute time in the potential-expiration-time
      option.

      When the primary server receives a BNDACK for its update of the
      secondary server's (partner's) potential lease interval, it
      records that as the acknowledged potential lease interval. A
      server MUST NOT send a BNDACK in response to a BNDUPD message
      until it is sure that the information in the BNDUPD message
      resides in its stable storage. Thus, the primary server in this
      case can be sure that the secondary server has recorded the
      potential lease interval in its stable storage when the primary
      server receives a BNDACK message from the secondary server.

      When the DHCP client attempts to renew at T1 (approximately one
      half hour from the start of the lease), the primary server again
      determines the desired lease interval, which is still 3 days. It
      then compares this with the remaining acknowledged potential
      lease interval (3 days + 1/2 hour) and adjusts for the time
      passed since the secondary was last updated (1/2 hour). Thus the
      time remaining of the acknowledged potential lease interval is 3
      days.
      Adding the MCLT to this yields 3 days plus 1 hour, which is more
      than the desired lease interval of 3 days. So the client is
      renewed for the desired lease interval of 3 days.

      When the primary DHCP server updates the secondary DHCP server
      after the DHCP client's renewal ACK is complete, it will calculate
      the desired potential lease interval as the T1 fraction of the
      actual client lease interval (1/2 of 3 days this time = 1.5
      days). To this it will add the desired client lease interval of 3
      days, yielding a total desired partner server lease interval of
      4.5 days. In this way, the primary attempts to have the secondary
      always "lead" the client in its understanding of the client's
      lease interval so as to be able to always offer the client the
      desired client lease interval.

      Once the initial actual client lease interval of the MCLT is past,
      the protocol operates effectively like the DHCP protocol does
      today in its behavior concerning lease intervals. However, the
      guarantee that the actual client lease interval will never exceed
      the remaining acknowledged partner server lease interval by more
      than the MCLT allows full recovery from a variety of failures.

5.2.2. Controlled re-allocation of IP addresses

   When in PARTNER-DOWN state there is a waiting period after which an
   IP address can be re-allocated to another client. For leases which
   are available when the server enters PARTNER-DOWN state, the period
   is the MCLT from entry into PARTNER-DOWN state. For IP addresses
   which are not available when the server enters PARTNER-DOWN state,
   the period is the MCLT after the lease becomes available. See
   section 9.4.2 for more details.
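   The example rules given in the DISCUSSION of section 5.2.1 can be
   written out as two small functions. This is a sketch of that
   example only (the protocol mandates no particular algorithm); the
   function names are hypothetical, intervals are in seconds, and the
   renewal interval is assumed to be half of the actual lease
   interval, as in the example.

```c
#include <assert.h>
#include <time.h>

/* Lease interval to place in the DHCPACK: the remainder of the
   acknowledged potential lease (zero for a new or expired one) plus
   the MCLT, but never more than the desired interval. */
time_t fo_client_lease_interval(time_t now, time_t acked_potential_expiry,
                                time_t desired, time_t mclt)
{
    assert(desired >= 0 && mclt >= 0);
    time_t remaining = acked_potential_expiry > now
                           ? acked_potential_expiry - now : 0;
    time_t allowed = remaining + mclt;
    return allowed < desired ? allowed : desired;
}

/* Value for the potential-expiration-time option of the BNDUPD:
   now + renewal interval (half the actual lease) + desired interval. */
time_t fo_potential_expiration(time_t now, time_t actual, time_t desired)
{
    return now + actual / 2 + desired;
}
```

   With an MCLT of 3600 seconds and a desired interval of 259200
   seconds (3 days), a new lease at time t is granted 3600 seconds and
   a potential expiration of t + 1800 + 259200 is sent to the partner.
   A renewal at t + 1800 then receives the full 259200 seconds, and the
   next potential expiration lies 388800 seconds (4.5 days) after the
   renewal, matching the walk-through in the DISCUSSION.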
   In any other state, a server cannot reallocate an address from one
   client to another without first notifying its partner (through a
   BNDUPD message) and receiving acknowledgement (through a BNDACK
   message) that its partner is aware that the first client is not
   using the address.

   This could be modeled in the following way. Though this specific
   implementation is in no way required, it may serve to better
   illustrate the concept.

   An "available" IP address on a server may be allocated to any
   client. An IP address which was leased to a client and which expired
   or was released by that client would take on a new state, EXPIRED
   or RELEASED respectively. The partner server would then be notified
   through a BNDUPD that this IP address was EXPIRED or RELEASED. When
   the sending server received the BNDACK for that IP address showing
   it was FREE, it would move the IP address from EXPIRED or RELEASED
   to FREE, and it would be available for allocation by the primary
   server to any client.

   A server MAY reallocate an IP address in the EXPIRED or RELEASED
   state to the same client with no restrictions.

5.3. Load balancing

   In order to implement load balancing between a primary and secondary
   server pair, each server must respond to DHCPDISCOVER requests from
   some clients and not from other clients. In order to do this
   successfully, each server must be able to determine immediately upon
   receipt of a DHCP client request whether it is to service this
   request or to ignore it in order to allow the other server to
   service the request.

   In addition, it should be possible to configure the percentage of
   clients which will be serviced by either the primary or secondary
   server.
   This configuration should be more or less continuous, from all
   serviced by the primary, through an even split with half serviced by
   each, to all serviced by the secondary.

   The technique chosen to support these goals is to define a hash
   function which must be applied to the client-identifier, or to the
   htype concatenated with the chaddr if no client-identifier is
   specified. The result of this hash function is a number between 0
   and 255 which maps into one of 256 "hash buckets". Each hash bucket
   is assigned to one server or the other by the primary server
   whenever a connection is established, through use of the
   hash-bucket-assignment option.

   The hash-bucket-assignment option uses a 32 octet value field
   (containing 256 bits), with one bit associated with each possible
   hash bucket. If the bit corresponding to a hash bucket is a 1 in the
   hash-bucket-assignment option, then the secondary server is required
   to service all DHCP client requests that map into that hash bucket
   when in NORMAL state.

   For example, if the primary server sends a hash-bucket-assignment
   option to the secondary with the following 32 octets:

                                         buckets
      FF FF FF FF FF FF FF FF          (   0 -  63 )
      FF FF FF FF FF FF FF FF          (  64 - 127 )
      00 00 00 00 00 00 00 00          ( 128 - 191 )
      00 00 00 00 00 00 00 00          ( 192 - 255 )

   then the secondary MUST service any DHCP client requests where the
   client-identifier or htype concatenated with the chaddr hashes into
   the bucket values of 0 through 127.

   See section 12 for the code to implement the hash bucket algorithm.
   Each server MUST implement this same algorithm in order for all
   clients to get service.

5.4. Operating in NORMAL state

   When in NORMAL state, each server services DHCPDISCOVERs and all
   other DHCP requests other than DHCPREQUEST/RENEWAL or
   DHCPREQUEST/REBINDING from the client set defined by the load
   balancing algorithm. Each server services DHCPREQUEST/RENEWAL or
   DHCPREQUEST/REBINDING requests from any client.

   In general, whenever the binding database is changed in stable
   storage, a BNDUPD message is sent with the contents of that change to
   the partner server. The partner server then writes the information
   about that binding in its bindings database in stable storage and
   replies with a BNDACK message.

5.5. Operating in COMMUNICATIONS-INTERRUPTED state

   When operating in COMMUNICATIONS-INTERRUPTED state, each server is
   operating independently, but does not assume that its partner is not
   operating. The partner server might be operating and simply unable
   to communicate with this server, or might not be operating.

   Each server responds to the full range of DHCP client messages that
   it receives, but in such a way that graceful reintegration is always
   possible when its partner comes back into contact with it.

5.6. Operating in PARTNER-DOWN state

   When operating in PARTNER-DOWN state, a server assumes that its
   partner is not currently operating, but does make allowances for the
   possibility that its partner was operating in the past. It responds
   to all DHCP client requests in PARTNER-DOWN state.

   Any transactions that the partner server may have had with DHCP
   clients but been unable to communicate to this server are allowed
   for in the algorithms that are used to gradually take over full
   control of all of the addresses configured into the server.

5.7. Operating in RECOVER state

   A server operating in RECOVER state assumes that it is reintegrating
   with a server that has been operating in PARTNER-DOWN state, and that
   it needs to update its bindings database before it services DHCP
   client requests.

   A server may also operate in RECOVER state in order to fully recover
   its bindings database from its partner server.

6. Packet Formats

   This section describes the message format that all failover messages
   have in common, and then defines the options used in the failover
   protocol.

6.1. Common message format

   All failover protocol messages are sent over the TCP connection
   between failover endpoints and encoded using a packet format
   specific to the failover protocol.

   There exists a common message format for all failover messages,
   which uses options in a way similar to the DHCP protocol. For each
   message type, some options are required and some are optional. In
   addition, when a message is received, any options that are not
   understood by the receiving server MUST be ignored.

   All of the fields in the fixed portion of the packet MUST be filled
   with correct data in every message sent.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       packet length (2)       |  msg type (1) |payload off (1)|
   +---------------+---------------+---------------+---------------+
   |                            xid (4)                            |
   +---------------------------------------------------------------+
   |         0 or more additional header bytes (variable)          |
   +---------------------------------------------------------------+
   |                    payload data (variable)                    |
   |                                                               |
   |                formatted as DHCP-style options                |
   |           using a unique option number space in the           |
   |                 format defined by [NAMESPACE]                 |
   +---------------------------------------------------------------+

   packet length - 2 bytes, network byte order

      This is the length of the packet. It includes the two-byte packet
      length field itself.

   msg type - 1 byte

      The message type field is used to distinguish between messages.

      The following message types are defined:

      Value  Message Type  Description
      -----  ------------  -------------------------------------------
        0    reserved      not used
        1    POOLREQ       request allocation of addresses
        2    POOLRESP      respond with allocation count
        3    BNDUPD        update partner with binding info
        4    BNDACK        acknowledge receipt of binding update
        5    CONNECT       establish connection with partner
        6    CONNECTACK    respond to connection attempt from partner
        7    UPDREQALL     request full transfer of binding info
        8    UPDDONE       all requested binding info sent and acked
        9    UPDREQ        request transfer of un-acked binding info
       10    STATE         inform partner of current state or change
       11    CONTACT       probe communications integrity with partner

      New message types should be defined in one of two ranges, 0-127
      or 128-255. The range of 0-127 is used for messages that MUST be
      supported by every server, and if a server receives a message in
      the range of 0-127 that it doesn't understand, it MUST drop the
      TCP connection. The range of 128-255 is used for messages which
      MAY be supported but are not required, and if a server receives a
      message in this range that it does not understand, it SHOULD
      ignore the message.

   payload offset - 1 byte

      The byte offset of the payload data from the beginning of the
      failover packet header. The value for the current protocol
      version is 8.

   xid - 4 bytes, network byte order

      This is the transaction id of the failover packet.
      The sender of a failover protocol packet is responsible for setting
      this number, and the receiver of the packet copies the number into
      any response packet, treating it as opaque data. The sender SHOULD
      ensure that every packet sent from a particular failover endpoint
      over the associated TCP connection has a unique transaction id
      unless that packet is a re-transmission.

   payload data - variable length

      The options are placed after the header, starting payload offset
      bytes from the beginning of the packet. The payload data options
      are not preceded by a "cookie" value.

      The payload data is formatted as DHCP-style options using the two
      byte option number and two byte option length format as specified
      in the recommendations of the DHCP panel in [NAMESPACE].

      The maximum length of the payload data in octets is 2048 less the
      size of the header, i.e., the maximum packet length is 2048 octets.

6.2. Common option format

   The options contained in the payload data section of the failover
   packet all use the two byte option number and two byte length format
   as specified by the recommendations of the DHCP panel in [NAMESPACE].

   The option numbers are drawn from an option number space unique to
   the failover protocol. All of the message types share a common
   option number space and common option definitions, though not all
   options are required or meaningful for every message.

   In contrast to the options which appear in DHCP client and server
   packets, the options in failover messages are ordered. That is, for
   some messages the order in which the options appear in the payload
   data area is significant. The definitions of those messages spell
   this out in detail.

   All options which refer to time use an absolute time in GMT.
   Time synchronization has already been achieved between the source and
   the target server using the CONNECT message. All time fields in the
   options defined below use a time represented as seconds elapsed since
   Jan 1, 1970 (i.e., ANSI C time_t time value representation). Note
   that this is (at present) a signed field.

   Additional options can be defined for intervendor or vendor-specific
   use with limited difficulty due to the large number of option numbers
   available.

6.2.1. binding-status

   This option is used to convey the current state of a binding.

         Code        Len      Type
   +-----+-----+-----+-----+-----+
   |  0  |  1  |  0  |  1  | 1-9 |
   +-----+-----+-----+-----+-----+

   Legal values for this option are:

   Value  Binding Status
   -----  ------------------------------------------------------------
     1    FREE            Lease has never been used
     2    ACTIVE          Lease is assigned to a client
     3    EXPIRED         Lease has expired
     4    RELEASED        Lease has been released by client
     5    ABANDONED       A server or client flagged the address as
                          unusable
     6    RESET           Lease was freed by some external agent
     7    BACKUP          Lease belongs to secondary's private address
                          pool
     8    EXPIRED-GRACE   Lease will become available after this period
     9    RELEASED-GRACE  Lease will become available after this period

6.2.2. assigned-IP-address

   The IP address to which this message refers.

         Code        Len             Address
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  |  2  |  0  |  4  |  a1 |  a2 |  a3 |  a4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.3. sending-server-IP-address

   The IP address of the server sending this message.

         Code        Len             Address
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  |  3  |  0  |  4  |  a1 |  a2 |  a3 |  a4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.4. addresses-transferred

   A 32-bit unsigned integer in network byte order.
   Reports the number of addresses transferred by the primary to the
   secondary server (addresses to be used for the secondary server's
   private address pool).

         Code        Len       Number of Addresses
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  |  4  |  0  |  4  |  n1 |  n2 |  n3 |  n4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.5. client-identifier

   The format, code and conventions used are identical to DHCP option
   61.

         Code        Len    Client Identifier
   +-----+-----+-----+-----+-----+-----+---
   |  0  |  5  |  0  |  n  |  i1 |  i2 | ...
   +-----+-----+-----+-----+-----+-----+---

6.2.6. client-hardware-address

   The format is similar to DHCP option 61. Byte t1 (type) MUST be set
   to the proper ARP hardware address code, as defined in the ARP
   section of RFC 1700 (it MUST NOT be zero).

         Code        Len        MAC address
   +-----+-----+-----+-----+-----+-----+-----+---
   |  0  |  6  |  0  |  n  |  t1 |  m1 |  m2 | ...
   +-----+-----+-----+-----+-----+-----+-----+---

   Either the client-identifier, the client-hardware-address, or both
   MAY be present in binding update transactions. At least one of them
   MUST be present. If both are present, the client-identifier MUST be
   used to uniquely identify the owner of the binding (exactly as in
   RFC 2131).

6.2.7. client-FQDN

   If an implementation supports dynamic DNS updates, this option can be
   used to communicate the DNS name that was set. It uses the format of
   the Client FQDN option (81) as described in [DDNS], extended to fit
   the two byte code and two byte length format recommended by the DHCP
   panel.

         Code        Len   Flags Rcode1 Rcode2  Domain Name
   +-----+-----+-----+-----+-----+------+------+-----+------
   |  0  |  7  |  0  |  n  |  f  |  r1  |  r2  |  d1 | d2...
   +-----+-----+-----+-----+-----+------+------+-----+------

6.2.8. reject-reason

   This option is used to selectively reject binding updates.
   It MAY be used in a BNDACK message, always associated with an
   assigned-IP-address option, which contains the IP address of the
   update being rejected.

         Code        Len   Reason Code
   +-----+-----+-----+-----+-----+
   |  0  |  8  |  0  |  1  |  R1 |
   +-----+-----+-----+-----+-----+

   Reason codes:

        0      Reserved
        1      Illegal IP address (not part of any address pool)
        2      Fatal conflict exists: address in use by other client
        3      Missing binding information
        4      Connection rejected, time mismatch too great
        5      Connection rejected, invalid MCLT
        6      Connection rejected, unknown reason
        7      Connection rejected, duplicate connection
        8      Connection rejected, invalid failover partner
        9      TLS not supported
       10      TLS supported but not configured
       11      TLS required but not supported by partner
       12      Message digest not supported
       13      Message digest not configured
       14      Protocol version mismatch
       15      Missing binding information
       16      Outdated binding information
       17      Less critical binding information
       18-253  Reserved
      254      Unknown: error occurred but does not match any reason
               code
      255      Reserved for code expansion

6.2.9. message

   This option is used to supply a human readable message. It may be
   used in association with the reject-reason option to provide a human
   readable error message for the reject.

         Code        Len        Text
   +-----+-----+-----+-----+-----+-----+---
   |  0  |  9  |  0  |  n  |  c1 |  c2 | ...
   +-----+-----+-----+-----+-----+-----+---

6.2.10. MCLT

   Maximum Client Lead Time, in seconds. A 32-bit integer value, in
   network byte order.

         Code        Len               Time
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  | 10  |  0  |  4  |  t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.11. vendor-class-identifier

   A string which identifies the vendor of the failover protocol
   implementation.
   The code for this option is 11 (its DHCP counterpart is option 60),
   and its minimum length is 1.

         Code        Len   vendor class string
   +-----+-----+-----+-----+-----+-----+---
   |  0  | 11  |  0  |  n  |  c1 |  c2 | ...
   +-----+-----+-----+-----+-----+-----+---

6.2.12. current-time

   The current time expressed as an absolute time in GMT represented as
   seconds elapsed since Jan 1, 1970 (i.e., ANSI C time_t time value
   representation).

         Code        Len           Current Time
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  | 12  |  0  |  4  |  t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.13. lease-expiration-time

   The lease expiration time expressed as an absolute time in GMT
   represented as seconds elapsed since Jan 1, 1970 (i.e., ANSI C time_t
   time value representation).

   The lease expiration time is the time that a server has ACKed to a
   DHCP client.

         Code        Len               Time
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  | 13  |  0  |  4  |  t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.14. potential-expiration-time

   The potential expiration time expressed as an absolute time in GMT
   represented as seconds elapsed since Jan 1, 1970 (i.e., ANSI C time_t
   time value representation).

   The potential expiration time is the time that one server tells
   another server that it may ACK to a client.

         Code        Len               Time
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  | 14  |  0  |  4  |  t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.15. grace-expiration-time

   The grace expiration time expressed as an absolute time in GMT
   represented as seconds elapsed since Jan 1, 1970 (i.e., ANSI C time_t
   time value representation).

   The grace expiration time is the time at which a grace period will
   expire.
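   The time-valued options (current-time, lease-expiration-time,
   potential-expiration-time, grace-expiration-time,
   client-last-transaction-time, and start-time-of-state) all share one
   layout: a two byte code, a two byte length of 4, and a four byte
   signed count of seconds since Jan 1, 1970, all in network byte order.
   A minimal Python sketch of encoding and decoding such an option
   follows; the helper names are hypothetical and are not defined by
   this protocol.

```python
import struct

# Option codes for the time-valued options (sections 6.2.12-6.2.17).
CURRENT_TIME = 12
LEASE_EXPIRATION_TIME = 13
POTENTIAL_EXPIRATION_TIME = 14
GRACE_EXPIRATION_TIME = 15
CLIENT_LAST_TRANSACTION_TIME = 16
START_TIME_OF_STATE = 17

# Network byte order: 2-byte code, 2-byte length, 4-byte *signed*
# seconds since Jan 1, 1970 (the time_t field is signed at present).
TIME_OPTION = struct.Struct("!HHi")


def encode_time_option(code: int, seconds: int) -> bytes:
    """Encode a time-valued failover option (hypothetical helper)."""
    return TIME_OPTION.pack(code, 4, seconds)


def decode_time_option(data: bytes) -> tuple[int, int]:
    """Decode a time-valued option, returning (code, seconds)."""
    code, length, seconds = TIME_OPTION.unpack_from(data)
    if length != 4:
        raise ValueError(f"option {code}: length {length}, expected 4")
    return code, seconds


if __name__ == "__main__":
    # 946684800 is 2000-01-01T00:00:00Z.
    wire = encode_time_option(GRACE_EXPIRATION_TIME, 946684800)
    print(wire.hex())                # 000f0004386d4380
    print(decode_time_option(wire))  # (15, 946684800)
```

   The signed "i" format is used rather than the unsigned "I" format
   because the time field is, at present, signed.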
         Code        Len               Time
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  | 15  |  0  |  4  |  t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.16. client-last-transaction-time

   The time at which this server last received a DHCP request from a
   particular client, expressed as an absolute time in GMT represented
   as seconds elapsed since Jan 1, 1970 (i.e., ANSI C time_t time value
   representation).

         Code        Len      Last Transaction Time
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  | 16  |  0  |  4  |  t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.17. start-time-of-state

   The time at which the state contained in this message began,
   expressed as an absolute time in GMT represented as seconds elapsed
   since Jan 1, 1970 (i.e., ANSI C time_t time value representation).

   This option is used for different states in different messages. In a
   BNDUPD message it represents the start time of the state of the lease
   in the BNDUPD message. In a STATE message, it represents the start
   time of the partner server's failover state.

         Code        Len       Start Time of State
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  | 17  |  0  |  4  |  t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.18. server-state

   This option is used to convey the current state of the failover
   endpoint in the sending server.
         Code        Len  Server State
   +-----+-----+-----+-----+-----+
   |  0  | 18  |  0  |  1  | 1-9 |
   +-----+-----+-----+-----+-----+

   Legal values for this option are:

   Value  Server State
   -----  -------------------------------------------------------------
     0    reserved
     1    STARTUP                     Startup state (1)
     2    NORMAL                      Normal state
     3    COMMUNICATIONS-INTERRUPTED  Communication interrupted (safe)
     4    PARTNER-DOWN                Partner down (unsafe mode)
     5    POTENTIAL-CONFLICT          Synchronizing
     6    RECOVER                     Recovering bindings from partner
     7    PAUSED                      Shutting down for a short period
     8    SHUTDOWN                    Shutting down for an extended
                                      period
     9    RECOVER-DONE                Interlock state prior to NORMAL

6.2.19. server-flags

   This option is used to convey the current flags of the failover
   endpoint in the sending server.

         Code        Len  Server Flags
   +-----+-----+-----+-----+-------+
   |  0  | 19  |  0  |  1  | flags |
   +-----+-----+-----+-----+-------+

   Currently, only bit 5 is defined. All other bits are reserved, and
   must be set to 0.

   o STARTUP

      Bit 5 is the STARTUP flag. Bit 5 MUST be set to 1 whenever the
      server is in STARTUP state, and set to 0 otherwise. (Note that
      when in STARTUP state, the state transmitted in the server-state
      option is usually the last recorded state from stable storage,
      but see section 9.3 for details.)

6.2.20. vendor-specific-options

   This option is used to convey options specific to a particular
   vendor's implementation. The vendor-class-identifier option is used
   to specify which option space the embedded options are drawn from.

   It functions similarly to the vendor class identifier and vendor
   specific options in the DHCP protocol.

   This option contains other options in the same two byte code, two
   byte length format. If this option appears in a message without a
   corresponding vendor-class-identifier option, it MUST be ignored.
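   Because vendor-specific-options nests options in the same two byte
   code, two byte length format used for the payload data itself, one
   option walker can serve both levels. The following Python sketch is
   illustrative only (the helper names and the vendor string "acme" are
   hypothetical): it walks the payload, then expands any
   vendor-specific-options only when a vendor-class-identifier option is
   also present, ignoring them otherwise.

```python
import struct
from typing import Iterator, List, Optional, Tuple

VENDOR_CLASS_IDENTIFIER = 11  # section 6.2.11
VENDOR_SPECIFIC_OPTIONS = 20  # section 6.2.20


def iter_options(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (code, value) for each 2-byte-code/2-byte-length option."""
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated option header")
        code, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        end = offset + length
        if end > len(data):
            raise ValueError(f"option {code} overruns the buffer")
        yield code, data[offset:end]
        offset = end


def vendor_options(
    payload: bytes,
) -> Tuple[Optional[bytes], List[Tuple[int, bytes]]]:
    """Return (vendor class string, embedded options) from a payload."""
    options = list(iter_options(payload))
    vendor = next((v for c, v in options if c == VENDOR_CLASS_IDENTIFIER), None)
    if vendor is None:
        # No vendor-class-identifier: vendor-specific-options MUST be ignored.
        return None, []
    embedded: List[Tuple[int, bytes]] = []
    for code, value in options:
        if code == VENDOR_SPECIFIC_OPTIONS:
            embedded.extend(iter_options(value))
    return vendor, embedded


if __name__ == "__main__":
    inner = struct.pack("!HH", 1, 2) + b"hi"  # one embedded option
    payload = (struct.pack("!HH", VENDOR_CLASS_IDENTIFIER, 4) + b"acme"
               + struct.pack("!HH", VENDOR_SPECIFIC_OPTIONS, len(inner))
               + inner)
    print(vendor_options(payload))  # (b'acme', [(1, b'hi')])
```

   The embedded option codes are meaningful only within the vendor's own
   option space, so the sketch returns them raw rather than interpreting
   them.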
         Code        Len   Embedded options
   +-----+-----+-----+-----+-----+-----+---
   |  0  | 20  |  0  |  n  |  c1 |  c2 | ...
   +-----+-----+-----+-----+-----+-----+---

6.2.21. max-unacked-bndupd

   The maximum number of BNDUPD messages that this server is prepared to
   accept over the TCP connection without causing the TCP connection to
   block.

         Code        Len      Maximum Unacked BNDUPD
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  | 21  |  0  |  4  |  n1 |  n2 |  n3 |  n4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.22. server-role

   This option is used to convey the role of the failover endpoint in
   the sending server.

         Code        Len    Role
   +-----+-----+-----+-----+-----+
   |  0  | 22  |  0  |  1  |  r1 |
   +-----+-----+-----+-----+-----+

   A value of 0 indicates that the failover endpoint is a primary server
   and a value of 1 indicates that it is a secondary server.

6.2.23. receive-timer

   The number of seconds within which the server must receive a packet
   from its partner, or it will assume that the partner is down or the
   communication path to the partner has failed.

         Code        Len          Receive Timer
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  | 23  |  0  |  4  |  s1 |  s2 |  s3 |  s4 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.24. hash-bucket-assignment

   The set of hash values to which the receiving server MUST respond.
   See section 5.3 for more information on how this option is used.

   This option consists of a set of 32 bytes, in network byte order,
   where each bit corresponds to one of 256 possible hash bucket values.
   If a bit is set to 1, the recipient is required to service the
   requests whose client-identifier (or, if no client-identifier exists,
   htype concatenated with chaddr) maps into the corresponding hash
   bucket.
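   A minimal Python sketch of testing and building the 32-byte bucket
   bitmap follows. This section does not say which bit within a byte
   stands for which bucket, so the sketch assumes that bucket b is bit
   (7 - b mod 8) of byte (b div 8), i.e., most significant bit first.
   The hash function that produces b is described in section 5.3 and is
   not reproduced here; the function names are hypothetical.

```python
from typing import Iterable

BITMAP_LEN = 32  # 256 buckets, one bit each


def bucket_assigned(bitmap: bytes, bucket: int) -> bool:
    """True if this server must service requests hashing to `bucket`.

    Assumes MSB-first bit order within each byte (an assumption; the
    bit order is not specified in this section).
    """
    if len(bitmap) != BITMAP_LEN:
        raise ValueError("hash-bucket-assignment must be 32 bytes")
    if not 0 <= bucket < 256:
        raise ValueError("bucket must be in 0..255")
    return bool(bitmap[bucket // 8] & (0x80 >> (bucket % 8)))


def make_bitmap(buckets: Iterable[int]) -> bytes:
    """Build a hash-bucket-assignment value with the given buckets set."""
    out = bytearray(BITMAP_LEN)
    for b in buckets:
        if not 0 <= b < 256:
            raise ValueError("bucket must be in 0..255")
        out[b // 8] |= 0x80 >> (b % 8)
    return bytes(out)


def complement(bitmap: bytes) -> bytes:
    """Return the buckets NOT set in `bitmap`."""
    return bytes(~x & 0xFF for x in bitmap)


if __name__ == "__main__":
    first_half = make_bitmap(range(128))
    print(bucket_assigned(first_half, 5), bucket_assigned(first_half, 200))
    # True False
```

   Giving the two servers complementary bitmaps, as complement()
   produces, is one way to have every bucket serviced by exactly one
   server; see section 5.3 for how the assignment is actually used.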
         Code        Len         Hash Buckets
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |  0  | 24  |  0  | 32  |  b1 |  b2 | ... | b32 |
   +-----+-----+-----+-----+-----+-----+-----+-----+

6.2.25. message-digest

   The message digest for this message.

   This option consists of a variable number of bytes which contain the
   message digest of the message prior to the inclusion of this option.

   When this option appears in a message, it MUST appear as the last
   option in the message.

         Code        Len    Message Digest
   +-----+-----+-----+-----+-----+-----+-----
   |  0  | 25  |  0  |  n  |  d1 |  d2 | ...
   +-----+-----+-----+-----+-----+-----+-----

6.2.26. protocol-version

   The protocol version being used by the server. It is only sent in the
   CONNECT and CONNECTACK messages.

         Code        Len  Version
   +-----+-----+-----+-----+-----+
   |  0  | 26  |  0  |  1  |  v1 |
   +-----+-----+-----+-----+-----+

6.2.27. TLS-request

   This option contains information relating to TLS security
   negotiation. It is sent in a CONNECT message.

   The first byte, req, is the TLS request from this server. A value of
   0 indicates no TLS operation, a value of 1 indicates that TLS
   operation is desired, and a value of 2 indicates that TLS operation
   is required to establish communications with this server.

   The second byte, acc, is what this server will accept for TLS
   operation. A value of 0 means that this server will not accept TLS
   connections. A value of 1 means that this server will accept TLS
   connections.

   If req is not zero, then acc MUST be 1.

   This allows a server which is not configured for TLS support to
   inform its partner that it will accept a TLS connection although it
   does not desire one, for instance.

         Code        Len  request accept
   +-----+-----+-----+-----+-----+-----+
   |  0  | 27  |  0  |  2  | req | acc |
   +-----+-----+-----+-----+-----+-----+

6.2.28. TLS-reply

   This option contains information relating to TLS security
   negotiation. It is sent in a CONNECTACK message.

   A value of 0 indicates no TLS operation, and a value of 1 indicates
   that TLS operation is required.

         Code        Len    TLS
   +-----+-----+-----+-----+-----+
   |  0  | 28  |  0  |  1  |  t1 |
   +-----+-----+-----+-----+-----+

6.3. BNDUPD message format

   The binding update (BNDUPD) message is used to send binding database
   changes to the partner server.

   The message type for the BNDUPD message is 3.

   The xid of the BNDUPD MUST be unique with respect to other failover
   messages transmitted from this failover endpoint.

   The following table summarizes the various options for the BNDUPD
   message.

                                        binding-status

   Option                     ACTIVE    EXPIRED   RELEASED  FREE
   ------                     ------    -------   --------  ----
   assigned-IP-address        MUST      MUST      MUST      MUST
   binding-status             MUST      MUST      MUST      MUST
   client-identifier          MAY       MAY       MAY       MAY
   client-hardware-address    MUST      MUST      MUST      MAY
   lease-expiration-time      MUST      MUST NOT  MUST NOT  MUST NOT
   potential-expiration-time  MUST      MUST NOT  MUST NOT  MUST NOT
   grace-expiration-time      MUST NOT  MUST NOT  MUST NOT  MUST NOT
   start-time-of-state        SHOULD    SHOULD    SHOULD    SHOULD
   client-last-trans.-time    SHOULD    SHOULD    SHOULD    MAY
   client-FQDN(1)             SHOULD    SHOULD    SHOULD    SHOULD
   all others                 MAY       MAY       MAY       MAY

                                   binding-status

                                                  BACKUP
                              EXPIRED-  RELEASED- RESET
   Option                     GRACE     GRACE     ABANDONED
   ------                     --------  --------- ---------
   assigned-IP-address        MUST      MUST      MUST
   binding-status             MUST      MUST      MUST
   client-identifier          MAY       MAY       MAY(2)
   client-hardware-address    MAY       MAY       MAY(2)
   lease-expiration-time      MUST NOT  MUST NOT  MUST NOT
   potential-expiration-time  MUST NOT  MUST NOT  MUST NOT
   grace-expiration-time      MUST      MUST      MUST NOT
   start-time-of-state        SHOULD    SHOULD    SHOULD
   client-last-trans.-time    SHOULD    SHOULD    MAY
   client-FQDN(1)             SHOULD    SHOULD    SHOULD
   all others                 MAY       MAY       MAY

   (1) SHOULD appear only if the client supplies a host name and dynamic
       DNS is used.

   (2) MUST NOT if binding-status is ABANDONED.

               Table 6.3-1: Options used in a BNDUPD message

6.4. BNDACK message format

   A server sends a binding acknowledgement (BNDACK) message when it has
   successfully committed to its own stable storage the binding database
   changes received from a failover partner in a BNDUPD message.

   The message type for the BNDACK message is 4.

   The xid in a BNDACK MUST be the same as the xid of the corresponding
   BNDUPD.

   The following table summarizes the options for the BNDACK message.

                                        binding-status

   Option                     ACTIVE    EXPIRED   RELEASED  FREE
   ------                     ------    -------   --------  ----
   assigned-IP-address        MUST      MUST      MUST      MUST
   binding-status             MUST      MUST      MUST      MUST
   client-identifier          MAY       MAY       MAY       MAY
   client-hardware-address    MUST      MUST      MUST      MAY
   reject-reason              MAY       MAY       MAY       MAY
   message                    MAY       MAY       MAY       MAY
   lease-expiration-time      MUST      MUST NOT  MUST NOT  MUST NOT
   potential-expiration-time  MUST      MUST NOT  MUST NOT  MUST NOT
   grace-expiration-time      MUST NOT  MUST NOT  MUST NOT  MUST NOT
   start-time-of-state        SHOULD    SHOULD    SHOULD    SHOULD
   client-last-trans.-time    SHOULD    SHOULD    SHOULD    MAY
   client-FQDN(1)             SHOULD    SHOULD    SHOULD    SHOULD
   all others                 MAY       MAY       MAY       MAY

                                   binding-status

                                                  BACKUP
                              EXPIRED-  RELEASED- RESET
   Option                     GRACE     GRACE     ABANDONED
   ------                     --------  --------- ---------
   assigned-IP-address        MUST      MUST      MUST
   binding-status             MUST      MUST      MUST
   client-identifier          MAY       MAY       MAY
   client-hardware-address    MAY       MAY       MAY(2)
   reject-reason              MAY       MAY       MAY
   message                    MAY       MAY       MAY
   lease-expiration-time      MUST NOT  MUST NOT  MUST NOT
   potential-expiration-time  MUST NOT  MUST NOT  MUST NOT
   grace-expiration-time      MUST      MUST      MUST NOT
   start-time-of-state        SHOULD    SHOULD    SHOULD
   client-last-trans.-time    SHOULD    SHOULD    MAY
   client-FQDN(1)             SHOULD    SHOULD    SHOULD
   all others                 MAY       MAY       MAY

   (1) SHOULD appear only if the client supplies a host name and dynamic
       DNS is used.

   (2) MUST NOT if binding-status is ABANDONED.

               Table 6.4-1: Options used in a BNDACK message

6.5. Bulking for BNDUPD and BNDACK messages

   DISCUSSION:

      Bulking is planned for this protocol, but it hasn't been specified
      in this revision of the draft. Once the draft settles down, we
      will specify the bulking approach in detail.

6.6. UPDREQ message format

   The update request (UPDREQ) message is used by one server to request
   that its partner send it all binding database information that it has
   not already seen.

   The message type for the UPDREQ message is 9.

   The xid in an UPDREQ message MUST be unique among messages
   transmitted from this failover endpoint during the life of this
   connection.

   There are no options that MUST appear in an UPDREQ message. Any
   option MAY appear.

6.7. UPDREQALL message format

   The update request all (UPDREQALL) message is used by one server to
   request that all binding database information be sent in order to
   recover from a total loss of its lease state database by the
   requesting server.

   The message type for the UPDREQALL message is 7.

   The xid in an UPDREQALL message MUST be unique among messages
   transmitted from this failover endpoint during the life of this
   connection.

   There are no options that MUST appear in an UPDREQALL message. Any
   option MAY appear.

6.8. UPDDONE message format

   The update done (UPDDONE) message is used by the responding server to
   indicate that all requested updates have been sent by the responding
   server as BNDUPD messages and acked by the requesting server using
   BNDACK messages.
   While a BNDACK message MUST have been received for each IP address
   that was sent in a BNDUPD message, the BNDACK message could have
   contained a reject-reason in order to NAK that specific update.

   Thus, this message confirms that the requesting server has received
   and responded to a BNDUPD message for all of the requested updates,
   but it does not require the requesting server to accept all of the
   offered updates.

   The message type for the UPDDONE message is 8.

   The xid in an UPDDONE message MUST be identical to the xid in the
   UPDREQ or UPDREQALL message that initiated the update process.

   There are no options that MUST appear in an UPDDONE message. Any
   option MAY appear.

6.9. POOLREQ message format

   The pool request (POOLREQ) is used by the secondary server to request
   an allocation of IP addresses from the primary server.

   The message type for the POOLREQ message is 1.

   The xid in a POOLREQ message MUST be unique among messages
   transmitted from this failover endpoint during the life of this
   connection.

   There are no options that MUST appear in a POOLREQ message. Any
   option MAY appear.

6.10. POOLRESP message format

   The pool response (POOLRESP) is used by the primary server to inform
   the secondary server how many IP addresses it was allocated as the
   result of a pool request.

   The message type for the POOLRESP message is 2.

   The xid in the POOLRESP message MUST be identical to the xid in the
   POOLREQ message for which this POOLRESP is a response.

   The following table shows the options that MUST appear in a POOLRESP
   message:

   Option
   ------
   addresses-transferred      MUST

              Table 6.10-1: Options used in a POOLRESP message

6.11. CONNECT message format

   The connect (CONNECT) message is used by either server to establish a
   high level connection with the other server, and to transmit several
   important configuration data items between the servers.

   The message type for the CONNECT message is 5.

   The xid in a CONNECT message MUST be unique among messages
   transmitted from this failover endpoint during the life of this
   connection.

   The CONNECT message MUST be the first message sent down a newly
   established connection.

   The following table summarizes the options that are associated with
   the CONNECT message:

                                   role

   Option                     primary   secondary
   ------                     -------   ---------
   sending-server-IP-address  MUST      MUST
   server-role                MUST      MUST
   max-unacked-bndupd         MUST      MUST
   receive-timer              MUST      MUST
   current-time               MUST      MUST
   vendor-class-identifier    MUST      MUST
   protocol-version           MUST      MUST
   TLS-request                MUST(1)   MUST(1)
   MCLT                       MUST      MUST NOT
   hash-bucket-assignment     MUST      MUST NOT
   all others                 MAY       MAY

   (1) If the CONNECT message is being sent on a TLS secured connection,
       then there MUST NOT be a TLS-request option.

              Table 6.11-1: Options used in a CONNECT message

6.12. CONNECTACK message format

   The connect response (CONNECTACK) message is used by a server to
   respond to the receipt of a CONNECT message.

   The message type for the CONNECTACK message is 6.

   The xid in the CONNECTACK message MUST be identical to the xid in the
   CONNECT message for which this CONNECTACK is a response.
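   The xid rules in sections 6.3 through 6.12 (a fresh, unique xid for
   each message sent on a connection, and a response that echoes the
   xid of the request it answers) can be enforced with a small
   per-connection table. The following Python sketch is illustrative
   only: the class and method names, and the choice to start counting
   at 1, are assumptions. It pairs each response type with the request
   types it answers and does not model re-transmissions, which reuse an
   existing xid.

```python
import itertools
from typing import Dict

# Message types from section 6.1.
POOLREQ, POOLRESP, BNDUPD, BNDACK = 1, 2, 3, 4
CONNECT, CONNECTACK, UPDREQALL, UPDDONE, UPDREQ = 5, 6, 7, 8, 9

# Response type -> request types whose xid it must echo.
ANSWERS = {
    POOLRESP: {POOLREQ},
    BNDACK: {BNDUPD},
    CONNECTACK: {CONNECT},
    UPDDONE: {UPDREQ, UPDREQALL},
}
REQUEST_TYPES = set().union(*ANSWERS.values())


class TransactionTable:
    """Allocates xids and matches responses on one TCP connection."""

    def __init__(self, first_xid: int = 1) -> None:
        self._counter = itertools.count(first_xid)
        self._pending: Dict[int, int] = {}  # xid -> request message type

    def next_xid(self) -> int:
        """Allocate an untracked xid, e.g. for STATE or CONTACT."""
        return next(self._counter) & 0xFFFFFFFF  # xid is a 4-byte field

    def new_request(self, msg_type: int) -> int:
        """Allocate an xid for a message that expects a response."""
        if msg_type not in REQUEST_TYPES:
            raise ValueError(f"message type {msg_type} expects no response")
        xid = self.next_xid()
        if xid in self._pending:
            raise RuntimeError(f"xid {xid} is still outstanding")
        self._pending[xid] = msg_type
        return xid

    def match_response(self, msg_type: int, xid: int) -> int:
        """Consume and return the pending request this response answers."""
        request_type = self._pending.get(xid)
        if request_type is None or request_type not in ANSWERS.get(msg_type, set()):
            raise ValueError(f"no pending request for type {msg_type}, xid {xid}")
        del self._pending[xid]
        return request_type


if __name__ == "__main__":
    table = TransactionTable()
    xid = table.new_request(CONNECT)
    print(table.match_response(CONNECTACK, xid) == CONNECT)  # True
```

   Drawing tracked and untracked xids from one counter keeps them unique
   across all message types on the connection, which is what the
   per-message xid rules require.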
   The following table summarizes the options associated with the
   CONNECTACK message:

   Option
   ------
   sending-server-IP-address  MUST
   server-role                MUST
   max-unacked-bndupd         MUST
   receive-timer              MUST
   current-time               MUST
   vendor-class-identifier    MUST
   protocol-version           MUST
   TLS-reply                  MUST(1)
   reject-reason              MAY(2)
   message                    MAY

   (1) If the CONNECTACK is being sent over an already TLS secured
       connection, then the TLS-reply option MUST NOT appear.

   (2) Indicates a rejection of the CONNECT message.

             Table 6.12-1: Options used in a CONNECTACK message

6.13. STATE message format

   The state (STATE) message is used by either server to communicate the
   current state of its failover endpoint to the other server. It MUST
   be sent immediately after a connection is established with another
   server, and it MUST be sent whenever the server's state changes.

   The message type for the STATE message is 10.

   The xid in a STATE message MUST be unique among messages transmitted
   from this failover endpoint during the life of this connection.

   The following table shows the options that MUST appear in a STATE
   message:

   Option
   ------
   server-state               MUST
   server-flags               MUST
   start-time-of-state        MUST

               Table 6.13-1: Options used in a STATE message

6.14. CONTACT message format

   The contact (CONTACT) message is used by either server to verify that
   the connection to the other server is operational.

   The message type for the CONTACT message is 11.

   The xid in a CONTACT message MUST be unique among messages
   transmitted from this failover endpoint during the life of this
   connection.

   The following table shows the options that MUST appear in a CONTACT
   message:

   Option
   ------
   current-time               MUST

              Table 6.14-1: Options used in a CONTACT message

7. Protocol Messages

   This section contains the detailed definition of the protocol
   messages, including the information to include when sending each
   message, as well as the actions to take upon receiving it.

7.1. BNDUPD message

   The binding update (BNDUPD) message is used to send binding database
   changes to the partner server, and the partner server responds with a
   binding acknowledgement (BNDACK) message when it has successfully
   committed those changes to its own stable storage.

   The rest of the failover protocol exists to determine whether the
   partner server is able to communicate or not, and to enable the
   partners to exchange BNDUPD/BNDACK messages in order to keep their
   binding databases in stable storage synchronized.

7.1.1. Sending the BNDUPD message

   A BNDUPD message SHOULD be generated whenever any binding changes. A
   change might be in the binding-status, the lease-expiration-time, or
   even just the client-last-transaction-time. In general, any time a
   DHCP client sends a packet that results in a DHCP server writing to
   its stable storage, a BNDUPD message SHOULD be generated.

   The BNDUPD (and BNDACK) messages refer to the binding-status of the
   IP address, and this protocol defines a series of binding-statuses,
   discussed in more detail below. Some servers may not support all of
   these binding-statuses; such servers will not send the unsupported
   values, and upon receiving them should make a reasonable
   interpretation.

   All BNDUPD messages MUST contain the assigned-IP-address option,
   which carries the IP address to which the BNDUPD message refers.

   All BNDUPD messages MUST contain the binding-status option, and it
   will have one of the values in the following list.
   This list discusses the meanings of the various binding-statuses and
   the information that should go into the BNDUPD message because of
   them.

   o ACTIVE

      Indicates that the IP address is currently leased to a DHCP
      client.

      client-hardware-address

         The client-hardware-address option MUST appear, and be set from
         the MAC address of the DHCP client to which this IP address is
         leased.

      client-identifier

         If the DHCP client to which this IP address is leased used a
         client-identifier option to identify itself, then the
         client-identifier MUST appear in the BNDUPD message, else it
         MUST NOT appear.

      lease-expiration-time

         The lease-expiration-time option MUST appear, and be set to the
         expiration time most recently ACKed to the DHCP client. Note
         that the time ACKed to a DHCP client is a lease duration in
         seconds, while the lease-expiration-time option in a BNDUPD
         message is an absolute time value.

      potential-expiration-time

         The potential-expiration-time option MUST appear, and be set to
         a value beyond that of the lease-expiration-time. This is the
         value that is ACKed by the BNDACK message. A server sending a
         BNDUPD message MUST be able to recover the
         potential-expiration-time sent in every BNDUPD, not just those
         that receive a corresponding BNDACK, in order to be able to
         protect against possible duplicate allocation of IP addresses
         after transitioning to PARTNER-DOWN state. See section 5.2.1
         for details as to why the potential-expiration-time exists and
         guidelines for how to decide the value.

   o EXPIRED

      A binding-status of EXPIRED is used when a client's binding on an
      IP address has expired and the server does not wish to implement
      an expired-grace period. When the partner server ACKs the BNDUPD
      of an EXPIRED IP address, the server sets its internal state to
      FREE.
      It is then available for allocation to any client of the primary
      server.

      client-hardware-address

         There SHOULD be a DHCP client associated with the IP address
         whose binding has expired. If there is, then the
         client-hardware-address option MUST appear, and be set from the
         MAC address of the DHCP client to which this IP address was
         leased.

      client-identifier

         There SHOULD be a DHCP client associated with the IP address
         whose binding has expired. If there is, then if the DHCP client
         to which this IP address was leased used a client-identifier
         option to identify itself, then the client-identifier MUST
         appear in the BNDUPD message, else it MUST NOT appear.

   o RELEASED

      A binding-status of RELEASED is used when a DHCP client sends a
      DHCPRELEASE message and the server does not wish to implement a
      released-grace period. When the partner server ACKs the BNDUPD of
      a RELEASED IP address, the server sets its internal state to FREE,
      and it is available for allocation by the primary server to any
      DHCP client.

      client-hardware-address

         There SHOULD be a DHCP client associated with the IP address
         whose binding has been released. If there is, then the
         client-hardware-address option MUST appear, and be set from the
         MAC address of the DHCP client which released this IP address.

      client-identifier

         There SHOULD be a DHCP client associated with the IP address
         whose binding has been released. If there is, then if the DHCP
         client which released this IP address used a client-identifier
         option to identify itself, then the client-identifier MUST
         appear in the BNDUPD message, else it MUST NOT appear.

   o FREE

      A binding-status of FREE is used when a DHCP server needs to
      communicate that an IP address is available for allocation to
      another server, but it was not just released, expired, or reset by
      a network administrator.
      When the partner server ACKs the BNDUPD of a FREE IP address, the server sets its internal state such that the IP address is available for allocation to any DHCP client.

      client-hardware-address

         There MAY be a DHCP client associated with the IP address whose binding is now desired to be FREE. If there is, then the client-hardware-address option MUST appear, and be set from the MAC address of the DHCP client which released this IP address.

      client-identifier

         There MAY be a DHCP client associated with the IP address whose binding is now desired to be FREE. If there is, and if the DHCP client which released this IP address used a client-identifier option to identify itself, then the client-identifier MUST appear in the BNDUPD message, else it MUST NOT appear.

   o EXPIRED-GRACE

      Some servers support a grace period after lease expiration, to handle clock speed differences between clients and servers as well as to limit the number of times names are removed and subsequently added to dynamic DNS.

      client-hardware-address

         There MAY be a DHCP client associated with the IP address whose binding has now expired. If there is, then the client-hardware-address option MUST appear, and be set from the MAC address of the DHCP client which most recently leased this IP address.

      client-identifier

         There MAY be a DHCP client associated with the IP address whose binding has now expired. If there is, and if the DHCP client which most recently leased this IP address used a client-identifier option to identify itself, then the client-identifier MUST appear in the BNDUPD message, else it MUST NOT appear.

      grace-expiration-time

         The grace-expiration-time option MUST appear, and is the length of time that this server will wait before trying to make the IP address available after the lease has expired for this IP address.

   o RELEASED-GRACE

      Some servers support a grace period after lease release by a DHCP client, to handle clock speed differences between clients and servers as well as to limit the number of times names are removed and subsequently added to dynamic DNS.

      client-hardware-address

         There MAY be a DHCP client associated with the IP address whose binding has now been released by sending a DHCPRELEASE. If there is, then the client-hardware-address option MUST appear, and be set from the MAC address of the DHCP client which released this IP address.

      client-identifier

         There MAY be a DHCP client associated with the IP address whose binding has been released. If there is, and if the DHCP client which most recently leased this IP address used a client-identifier option to identify itself, then the client-identifier MUST appear in the BNDUPD message, else it MUST NOT appear.

      grace-expiration-time

         The grace-expiration-time option MUST appear, and is the length of time that this server will wait before trying to make the IP address available after the lease was released for this IP address.

   o ABANDONED

      An ABANDONED IP address is one that has been deemed unusable by the DHCP subsystem.
      An IP address for which a valid PING response was received SHOULD be set to ABANDONED.

      client-hardware-address

         There SHOULD NOT be a DHCP client associated with an ABANDONED IP address. The client-hardware-address option MUST NOT appear in the BNDUPD message.

      client-identifier

         There SHOULD NOT be a DHCP client associated with the IP address whose binding has now been ABANDONED. The client-identifier option MUST NOT appear in the BNDUPD message.

   o RESET

      The RESET value of the binding-status is used to indicate that this IP address was made available by operator command.

   o BACKUP

      The BACKUP value of binding-status indicates that this IP address belongs to the secondary server, and can be allocated by that server to a DHCP client at any time.

      client-hardware-address

         There MAY be a DHCP client associated with a BACKUP IP address. If there is, the client-hardware-address option MUST appear, and be set from the MAC address of the DHCP client with which this IP address was most recently associated.

      client-identifier

         There MAY be a DHCP client associated with this IP address. If the DHCP client to which this IP address is leased used a client-identifier option to identify itself, then the client-identifier MUST appear in the BNDUPD message, else it MUST NOT appear.

   The following option information is generic to all BNDUPD messages, regardless of the value of the binding-status.

   o start-time-of-state

      The start-time-of-state SHOULD appear. It is set to the time at which this IP address first took on the state that corresponds to the current value of binding-status.

   o last-transaction-time

      The last-transaction-time value SHOULD appear.
      This is the time at which this DHCP server last received a packet from the DHCP client referenced by the client-identifier or client-hardware-address that was associated with the IP address referenced by the assigned-IP-address option.

   o client-FQDN

      If the DHCP server is performing dynamic DNS operations on behalf of the DHCP client represented by the client-identifier or client-hardware-address, then it should include a client-FQDN option containing the host name, domain name, and status of any dynamic DNS operations enabled.

   The BNDUPD message SHOULD be sent as soon as possible after the DHCP client has received a response and the lease binding database has been written to stable storage.

7.1.2. Receiving the BNDUPD message

   When a server receives a BNDUPD message, it needs to decide how to process the message and whether the message represents a conflict of any sort. The conflict resolution process is used on the receipt of every BNDUPD message, not just those that are received while in POTENTIAL-CONFLICT state, in order to increase the robustness of the protocol.

   There are two sorts of conflict. The first, more serious, conflict occurs when a server receives a BNDUPD message from its partner for an ACTIVE IP address and finds that the client specified in the BNDUPD message is different from the client associated with this ACTIVE IP address in this server's bindings database.

   The second sort of conflict occurs when the receiving server has, in its bindings database, the client specified in the BNDUPD message associated with a different IP address.

   Both conflicts can occur together for the same BNDUPD message.
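
   The two conflict checks can be sketched as follows. This is a minimal, hypothetical Python sketch, not an implementation defined by this protocol: the Binding record, the client_key() and detect_conflicts() helpers, and the modelling of a bindings database as a dictionary keyed by IP address are all assumptions. It follows the comparison procedure given in the next paragraphs of this section: a client is identified by its client-identifier if one is present and otherwise by its client-hardware-address, binding-status is not examined, and the caller is assumed to supply the subnet that contains the IP address in the BNDUPD.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Binding:
    """Hypothetical, simplified bindings-database entry (or BNDUPD content)."""
    ip: IPv4Address
    client_identifier: Optional[bytes] = None
    client_hardware_address: Optional[bytes] = None


def client_key(b: Binding) -> Optional[bytes]:
    """Identify the client: client-identifier if present, else hardware address."""
    if b.client_identifier is not None:
        return b"id:" + b.client_identifier
    if b.client_hardware_address is not None:
        return b"hw:" + b.client_hardware_address
    return None  # no client associated with this IP address


def detect_conflicts(update: Binding, db: Dict[IPv4Address, Binding],
                     subnet: IPv4Network) -> Set[str]:
    """Return which conflicts ("client", "ip-address") a BNDUPD raises."""
    conflicts: Set[str] = set()
    incoming = client_key(update)
    if incoming is None:
        return conflicts  # the BNDUPD names no client, so nothing to compare

    # Client conflict: a different client holds this IP address locally.
    local = db.get(update.ip)
    if local is not None:
        local_client = client_key(local)
        if local_client is not None and local_client != incoming:
            conflicts.add("client")

    # IP address conflict: the same client holds another address in the subnet.
    for ip, binding in db.items():
        if ip != update.ip and ip in subnet and client_key(binding) == incoming:
            conflicts.add("ip-address")
            break
    return conflicts
```

   Because client_key() tags each value with its source, a client known by client-identifier on one server and only by hardware address on the other is treated as two different clients; a real server might need to correlate the two.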

   When receiving a BNDUPD message, the server first determines the IP address from the assigned-IP-address option, and then determines whether there is any client associated with this IP address by looking for the client-identifier option. If there is no client-identifier option, the server looks for a client-hardware-address option, and so determines the identity of the client specified in the BNDUPD.

   The client specified in the BNDUPD message is compared to the client currently associated with the IP address in this server's bindings database. If they are the same, continue. If there is no client in this server's bindings database, continue. If there is a client in this server's bindings database, and it is different from the one specified in the BNDUPD message, a 'client conflict' exists; see section 7.1.3 on conflict resolution. If the client specified in the BNDUPD message is associated with a different IP address in the same subnet in this server's bindings database, then an 'IP address conflict' exists. This does not refer to the case where a single client has addresses in multiple different subnets or administrative domains, but rather to the case where, in the same subnet, the client has a lease on one IP address on one server and on a different IP address on the other server; see section 7.1.3 on conflict resolution.

   If neither of the conflicts mentioned above exists, then derive a time for both the BNDUPD message and the server's information.

   The times for the BNDUPD and for the server's information are derived independently in the following way: If there is a client-last-transaction time, use that. If there isn't, but there is a start-time-of-state, use that. If there isn't, but there is a client-expiration-time, use that.
   If there isn't, then use the time the BNDUPD message was received for the BNDUPD message, and the current time for the server's information.

   The server then determines the binding-status in the BNDUPD, and takes the following actions based on it.

   (In the following list, to "accept" a BNDUPD means to update the server's bindings database with the information contained in the BNDUPD and, once that update is complete, send a BNDACK message corresponding to the BNDUPD message.)

   o ACTIVE in BNDUPD

      If the BNDUPD is LATER than the server's information, accept it, else reject it.

   o EXPIRED or EXPIRED-GRACE in BNDUPD

      If the binding-status in the receiving server's bindings database is ACTIVE, then reject the BNDUPD. Otherwise, accept the BNDUPD.

      If the binding-status in the BNDUPD is EXPIRED-GRACE and the server receiving the BNDUPD does not implement a grace period for expired leases, then the server MUST set its lease expiration to the value held in the grace-expiration-time option in the BNDUPD.

   o RELEASED or RELEASED-GRACE in BNDUPD

      If the BNDUPD is LATER than the server's information, accept it, else reject it.

      If the binding-status in the BNDUPD is RELEASED-GRACE and the server receiving the BNDUPD does not implement a grace period for released leases, then the server MUST set its lease expiration to the value held in the grace-expiration-time option in the BNDUPD.

   o FREE or BACKUP in BNDUPD

      If the binding-status in the receiving server's database is ACTIVE and the lease-expiration-time has not yet been reached, reject it, else accept it.

   o RESET or ABANDONED in BNDUPD

      Accept it under all circumstances.

7.1.3. Conflict resolution when receiving the BNDUPD message

   When either of the following conflicts exists between the information in a BNDUPD message and the information held in the receiving server's bindings database, it should be resolved in the following manner:

   o client conflict

      This is the duplicate IP address allocation conflict: two different clients are each allocated the same IP address.

      If times for both exist, use the LATER update, else use the information from the primary server.

   o IP address conflict

      An IP address conflict exists when a client is associated with one IP address on one server, and with a different IP address in the same or a related subnet on the other server. If one binding-status is ACTIVE and the other is anything but ACTIVE, then the information in the ACTIVE binding SHOULD be used. Otherwise, if times exist, then the LATER one SHOULD be used. Otherwise, if times do not exist, then the information from the primary server should be used.

7.2. BNDACK message

   Every BNDUPD message that is received by a server MUST be responded to with a corresponding BNDACK message. The receiving server SHOULD respond quickly to every BNDUPD message, but it MAY choose to respond preferentially to DHCP client requests instead of BNDUPD messages, since there is no absolute time period within which a BNDACK must be sent in response to a BNDUPD message, while DHCP clients frequently do have time constraints that must be met.

7.2.1. Sending the BNDACK message

   The BNDACK message MUST contain the same xid as the corresponding BNDUPD message.

   All of the options which appear in the BNDUPD message MUST be included in the BNDACK message. The values in the options MAY be updated to reflect current information on the server sending the BNDACK.
   Note that this updated information may be used for informational purposes, but MUST NOT be assumed to have been recorded in the stable storage of the server that sent the BNDUPD message, because there is no corresponding acknowledgement of the BNDACK message. Any information that SHOULD be recorded in the partner server's stable storage MUST be transmitted in a subsequent BNDUPD.

   If the server is accepting the BNDUPD, the BNDACK message includes only those options that appear in the BNDUPD message. If the server is rejecting the BNDUPD, the additional option reject-reason MUST appear in the BNDACK message, and the message option SHOULD appear, containing a human-readable error message describing in some detail the reason for the rejection of the BNDUPD message.

7.2.2. Receiving the BNDACK message

   When a server receives a BNDACK message that does not contain a reject-reason option, the BNDUPD message was accepted, and the server which sent the BNDUPD MUST update its stable storage with the potential-expiration-time value sent in the BNDUPD message and returned in the BNDACK message. Other values sent in the BNDUPD message MAY be used as desired.

7.3. UPDREQ message

   The update request (UPDREQ) message is used by one server to request that its partner send it all of the binding database information that it has not already seen. Since each server is required to keep track at all times of the binding information the other server has received and ACKed, one server can request transmission of all un-ACKed binding database information held by the other server by using the UPDREQ message.
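
   The tracking described above can be sketched as follows, together with the BNDACK handling of section 7.2.2 and the max-unacked-bndupd limit that section 7.3.2 places on the response to an UPDREQ. This is a hypothetical Python sketch: PartnerTracker and its methods are illustrative names, messages are modelled as tuples rather than packets, xids come from a simple counter, IP addresses are plain strings, and the write to stable storage is only marked by a comment. For simplicity it sends UPDDONE only once no BNDUPD remains awaiting a BNDACK.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class PartnerTracker:
    """Hypothetical record of which binding updates the partner has ACKed."""
    max_unacked_bndupd: int  # from the partner's CONNECT or CONNECTACK
    # ip -> newest potential-expiration-time not yet ACKed by the partner
    unacked: Dict[str, int] = field(default_factory=dict)
    # ip -> potential-expiration-time the partner has ACKed
    acked: Dict[str, int] = field(default_factory=dict)
    # xid -> (ip, potential-expiration-time carried by that BNDUPD)
    in_flight: Dict[int, Tuple[str, int]] = field(default_factory=dict)
    queue: Deque[str] = field(default_factory=deque)
    next_xid: int = 1
    updreq_open: bool = False

    def record_change(self, ip: str, potential_expiration_time: int) -> None:
        self.unacked[ip] = potential_expiration_time

    def handle_updreq(self) -> List[tuple]:
        # Queue one BNDUPD per address whose latest change is not yet ACKed.
        busy = {ip for ip, _ in self.in_flight.values()}
        for ip in self.unacked:
            if ip not in busy and ip not in self.queue:
                self.queue.append(ip)
        self.updreq_open = True
        return self._pump()

    def handle_bndack(self, xid: int, reject_reason: Optional[int] = None) -> List[tuple]:
        ip, pet = self.in_flight.pop(xid)
        if reject_reason is None:
            # Accepted: a real server MUST now write pet to stable storage.
            self.acked[ip] = pet
            if self.unacked.get(ip) == pet:
                del self.unacked[ip]
            elif self.updreq_open and ip in self.unacked:
                # The binding changed again while this BNDUPD was in flight.
                self.queue.append(ip)
        # A rejected update simply stays in `unacked` for a later UPDREQ.
        return self._pump()

    def _pump(self) -> List[tuple]:
        out: List[tuple] = []
        # Never have more than max_unacked_bndupd BNDUPDs awaiting a BNDACK.
        while self.queue and len(self.in_flight) < self.max_unacked_bndupd:
            ip = self.queue.popleft()
            xid, self.next_xid = self.next_xid, self.next_xid + 1
            self.in_flight[xid] = (ip, self.unacked[ip])
            out.append(("BNDUPD", xid, ip))
        if self.updreq_open and not self.queue and not self.in_flight:
            self.updreq_open = False
            out.append(("UPDDONE",))
        return out
```

   With max_unacked_bndupd set to 2 and three pending changes, handle_updreq() returns two BNDUPDs; each BNDACK releases the next one, and the final BNDACK yields the UPDDONE.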

   The UPDREQ message is used whenever the sending server cannot proceed before it has processed all previously un-ACKed binding update information, since the UPDREQ message should yield a corresponding UPDDONE message. The UPDDONE message is not sent until the server that sent the UPDREQ message has responded with BNDACK messages to all of the BNDUPD messages generated by the UPDREQ message. Thus, upon receipt of an UPDDONE message, the sender of the UPDREQ message can be sure that it has received and committed to stable storage all outstanding binding database updates.

   See section 9, Protocol state transitions, for the details of when the UPDREQ message is sent.

7.3.1. Sending the UPDREQ message

   There are no options for the UPDREQ message.

   The UPDREQ message is sent with a unique xid.

7.3.2. Receiving the UPDREQ message

   A server receiving an UPDREQ message MUST send all binding database changes that have not yet been ACKed by the sending server. These changes are sent as ordinary BNDUPD messages.

   However, the server which received and is processing the UPDREQ message MUST track the BNDACK messages that correspond to the BNDUPD messages triggered by the UPDREQ message and, when they have all been received, the server MUST send an UPDDONE message.

   When queuing up the BNDUPD messages for transmission to the sender of the UPDREQ message, the receiving server MUST honor the value returned in the max-unacked-bndupd option in the CONNECT or CONNECTACK message that set up the connection with the sending server. It MUST NOT have more BNDUPD messages awaiting corresponding BNDACKs than the value returned in max-unacked-bndupd.

7.4. UPDREQALL message

   The update request all (UPDREQALL) message is used by one server to request that its partner send it all of the binding database information.
   This message is used to allow one server to recover from a failure of stable storage and to restore its binding database in its entirety from the other server.

   A server which sends an UPDREQALL message cannot proceed until all of its binding update information is restored, and it knows that all of that information is restored when an UPDDONE message is received.

   See section 9, Protocol state transitions, for the details of when the UPDREQALL message is sent.

7.4.1. Sending the UPDREQALL message

   There are no options for the UPDREQALL message.

   The UPDREQALL message is sent with a unique xid.

7.4.2. Receiving the UPDREQALL message

   A server receiving an UPDREQALL message MUST send all binding database information to the sending server. This information is sent as ordinary BNDUPD messages.

   However, the server receiving the UPDREQALL message MUST track the BNDACK messages that correspond to the BNDUPD messages triggered by the UPDREQALL message and, when they have all been received, the server MUST send an UPDDONE message.

   When queuing up the BNDUPD messages for transmission to the sender of the UPDREQALL message, the receiving server MUST honor the value returned in the max-unacked-bndupd option in the CONNECT or CONNECTACK message that set up the connection with the sending server. It MUST NOT have more BNDUPD messages awaiting corresponding BNDACKs than the value returned in max-unacked-bndupd.

7.5. UPDDONE message

   The update done (UPDDONE) message is used by a server receiving an UPDREQ or UPDREQALL message to signify that it has sent all of the BNDUPD messages requested by the UPDREQ or UPDREQALL request and that it has received a BNDACK for each of those messages.

7.5.1. Sending the UPDDONE message

   The UPDDONE message SHOULD be sent as soon as the last BNDACK message corresponding to a BNDUPD message requested by the UPDREQ or UPDREQALL is received from the server which sent the UPDREQ or UPDREQALL.

7.5.2. Receiving the UPDDONE message

   A server receiving the UPDDONE message knows that all of the information that it requested by sending an UPDREQ or UPDREQALL message has now been sent and that it has recorded this information in its stable storage. It typically uses the receipt of an UPDDONE message to move to a different failover state. See sections 9.5.2 and 9.8.3 for details.

7.6. POOLREQ message

   The pool request (POOLREQ) message is used by the secondary server to request an allocation of IP addresses from the primary server. It MUST be sent by a secondary server to a primary server to request IP address allocation by the primary. The IP addresses allocated are transmitted using normal BNDUPD messages from the primary to the secondary.

   The POOLREQ message SHOULD be sent from the secondary to the primary whenever the secondary transitions into NORMAL state. It SHOULD be resent periodically so that any change in the number of available IP addresses on the primary is reflected in the pool on the secondary.

7.6.1. Sending the POOLREQ message

   The POOLREQ message has no options. It must be sent with a unique xid.

7.6.2. Receiving the POOLREQ message

   When a primary server receives a POOLREQ message it SHOULD examine the binding database, determine how many IP addresses the secondary server should have, and set these IP addresses to BACKUP state. It SHOULD then send BNDUPD messages concerning all of these IP addresses to the secondary server.

   Servers frequently have several kinds of IP addresses available on a particular network segment.
   The failover protocol assumes that both primary and secondary servers are configured in such a way that each knows the type and number of IP addresses on every network segment participating in the failover protocol. The primary server is responsible for allocating the secondary server the correct proportion of available IP addresses of each kind, and the secondary server is responsible for being configured in such a way that it can tell the kind of every IP address based solely on the IP address itself.

   A primary server MUST keep track of how many IP addresses were allocated as a result of processing the POOLREQ message, and send that number in the POOLRESP message.

   A primary server MAY choose to defer processing a POOLREQ message until a more convenient time, but in that case it should not depend on the secondary server to retransmit the POOLREQ message.

   If a secondary server receives a POOLREQ message it SHOULD report an error.

7.7. POOLRESP message

   A primary server sends a POOLRESP message to a secondary server after the allocation process for available addresses to the secondary server is complete. Typically this message will precede some of the BNDUPD messages that the primary uses to send the actual allocated IP addresses to the secondary.

7.7.1. Sending the POOLRESP message

   The POOLRESP message MUST contain the same xid as the corresponding POOLREQ message.

   The only option which MUST appear in a POOLRESP message is:

   o addresses-transferred

      The number of addresses allocated to the secondary server by the primary server as a result of a POOLREQ is contained in the addresses-transferred option in a POOLRESP message.
      Note that this is the number of addresses that are transferred to the secondary in the primary's binding database as a result of the corresponding POOLREQ message, and that it may be some time before they can all be transmitted to the secondary server through the use of BNDUPD messages.

7.7.2. Receiving the POOLRESP message

   When a secondary server receives a POOLRESP message, it SHOULD send another POOLREQ message if the value of the addresses-transferred option is non-zero.

   Typically, no other action is taken on the reception of a POOLRESP message.

7.8. CONNECT message

   The CONNECT message is used to establish an application-level connection over a newly created TCP connection. It gives the source information for the connection, and some important configuration information. It may be sent by either the primary or the secondary server. It is sent by the initiator of a TCP connection.

7.8.1. Sending the CONNECT message

   The CONNECT message MUST be the first message sent by the initiator of a TCP connection after the establishment of a new TCP connection with another server participating in the failover protocol.

   The xid of the CONNECT message must be unique.

   The IP address of the sending server MUST be placed in the sending-server-IP-address option. This information is placed in an option inside of the packet in order to allow the identity of the sender to be covered by a shared secret.

   The role of the sending failover endpoint (i.e., either primary or secondary) MUST be placed in the server-role option.

   The current time MUST be placed in the current-time option.

   The number of BNDUPD messages the server can accept without blocking the TCP connection MUST be placed in the max-unacked-bndupd option.
   This MUST be a number equal to or greater than 1, SHOULD be greater than 10, and SHOULD be less than 100.

   The length of the receive timer (tReceive, see section 8.3) MUST be placed in the receive-timer option.

   If the sending server is a primary server, then the MCLT MUST be placed in the MCLT option.

   If the sending server is a primary server, then the hash-bucket-assignment option MUST be included in the CONNECT message. The value of the hash-bucket-assignment option is determined from the specific buckets that the primary server has determined that the secondary server MUST service as part of the load-balancing algorithm. The way in which the primary server determines this information is outside the scope of this protocol definition. The primary server SHOULD be configurable with a percentage of clients that the secondary server will be instructed to service, and the primary server SHOULD convert that percentage value into a corresponding set of bits in the hash-bucket-assignment option that are set to 1, indicating that the secondary server MUST service clients which map to those hash buckets.

   The vendor class identifier MUST be placed in the vendor-class-identifier option.

   The protocol-version option MUST be included in every CONNECT message. The current value of the protocol version is 1.

   The TLS-request option MUST be sent and contains the desired TLS connection request as well as information concerning whether TLS is supported. If this CONNECT message is being sent over an already established TLS connection, the TLS-request option MUST NOT appear.

7.8.2. Receiving the CONNECT message

   When a server receives a TCP connection on the failover port, it should wait for a CONNECT message.

   When a server receives a CONNECT message it should:

   1. Record the time at which the message was received.

   2. Examine the protocol-version option, and decide whether this server is capable of interoperating with another server running that protocol version. If not, then send the CONNECTACK message with the appropriate reject-reason. The server MUST include its protocol-version in the CONNECTACK message.

   3. Examine the TLS-request option. Determine the TLS-reply value based on the capabilities and configuration of this server, and save it for the CONNECTACK message. If the TLS negotiation results in a connection rejection, then go immediately to sending the CONNECTACK message.

      The possibilities are:

           CONNECT          CONNECTACK
          TLS-request        TLS-reply     Reject
          req    acc            t1         Reason     Comments
          ---    ---            --         ------     --------
           0      0             0
           0      0             1            11       receiver requires TLS
           0      1             0
           0      1             1
           1      0             -                     request doesn't make sense
           1      1             0
           1      1             1
           2      0             -                     request doesn't make sense
           2      1             0          9 or 10    receiver won't do TLS
           2      1             1

   4. Check whether there is a message-digest option in the CONNECT message. If there is, and the server does not support message digests, then reject the connection with the appropriate reject-reason in the CONNECTACK.

   5. Determine whether the sender (from the sending-server-IP-address option) and the role of the sender (from the server-role option) represent a server with which the receiver was configured to engage in failover activity.

      If not, then the receiving server should reject the CONNECT request by sending a CONNECTACK message with a reject-reason value of 8, invalid failover partner.

      If so, then the receiving failover endpoint should be determined.

   6. Decide whether the time delta between the sending of the packet, in the current-time option, and the receipt of the packet, recorded in step 1 above, is acceptable. A server MAY require
A server MAY require 2831 an arbitrarily small delta in time values in order to set up a 2832 failover connection with another server. 2834 If the delta between the time values is too great, the server 2835 should reject the CONNECT request by sending a CONNECTACK mes- 2836 sage with a reject-reason of 4, time mismatch too great. 2838 If the time mismatch is not considered too great then the 2839 receiving server MUST record the delta between the servers. 2840 The receiving server MUST use this delta to correct all of the 2841 absolute times received from the other server in all time- 2842 valued options. Note that server's can participate in fail- 2843 over with arbitrarily great time mismatches, as long as it is 2844 more or less constant. 2846 7. If the receiving server is a secondary server, it MUST examine 2847 the MCLT option in the CONNECT request and use the value of 2848 the MCLT as the MCLT for this failover endpoint. 2850 A receiving secondary server SHOULD be able to operate with 2851 any MCLT sent by the primary, but if it cannot, then it 2852 should send a CONNECTACK with a reject-reason of 5, MCLT 2853 mismatch. 2855 8. The receiving server MAY use the vendor-class-identifier to do 2856 vendor specific processing. 2858 7.9. CONNECTACK message 2860 The CONNECTACK message is sent to accept or reject a CONNECT message. 2861 It is sent by the server which accepted the TCP connection and 2862 received a CONNECT message. 2864 7.9.1. Sending the CONNECTACK message 2866 The xid of the CONNECTACK message must be that of the corresponding 2867 CONNECT message. 2869 The IP address of the sending server MUST be placed in the sending- 2870 server-IP-address option. This information is placed in an option 2871 inside of the packet in order to allow the identity of the sender to 2872 be covered by a shared secret. 2874 The role of the sending failover endpoint (i.e., either primary or 2875 secondary) MUST be placed in the server-role option. 

   The current time MUST be placed in the current-time option.

   The protocol-version option MUST be included in every CONNECTACK message. The current value of the protocol version is 1.

   If the connection has been rejected, the reject-reason option MUST be placed in the CONNECTACK message with an appropriate reason, and a message option SHOULD be included with a human-readable error message describing the reason for the rejection in some detail. If the reject-reason option appears, then the remaining options listed below do not appear.

   The results of the TLS negotiation MUST be placed in the TLS-reply option. If this CONNECTACK message is being sent over an already TLS-secured connection, then there MUST NOT be a TLS-reply option.

   If there was a message-digest option in the CONNECT message, then there MUST be a message-digest option in the CONNECTACK message if it does not contain a reject-reason.

   The number of BNDUPD messages the server can accept without blocking the TCP connection MUST be placed in the max-unacked-bndupd option. This SHOULD be a number greater than 10, and SHOULD be a number less than 100.

   The length of the receive timer (tReceive, see section 8.3) MUST be placed in the receive-timer option.

   If the sending server is a primary server, then the MCLT MUST be placed in the MCLT option.

   The vendor class identifier MUST be placed in the vendor-class-identifier option.

   If the server is rejecting the CONNECT message, then the reject-reason option MUST appear. A message option MAY appear to give a human-readable version of the rejection reason.

   After sending a CONNECTACK message, the server MUST send a STATE message.

   After sending a CONNECTACK message, the server MUST start two timers for the connection: tSend and tReceive.
   The tSend timer SHOULD be approximately 20 percent of the time in
   the receive-timer option in the corresponding CONNECT message. The
   tReceive timer SHOULD be the time sent in the receive-timer option
   in the CONNECTACK message.

   The tReceive timer is reset whenever a message is received on this
   TCP connection. If it ever expires, the TCP connection is dropped
   and communications with this partner is considered not ok.

   The tSend timer is reset whenever a packet is sent over this
   connection. When it expires, a CONTACT message MUST be sent.

7.9.2. Receiving the CONNECTACK message

   When a CONNECTACK message is received, the following actions should
   be taken:

   1.  Record the time the packet was received.

   2.  Check to see if there is a reject-reason option in the
       CONNECTACK message. If not, continue with step 3. If there is a
       reject-reason option, the server SHOULD report the error code.
       If a message option appears, a server SHOULD display the string
       from the message option in a user-visible way. The server MUST
       close the connection if a reject-reason option appears.

   3.  Check to see if the xid on the CONNECTACK matches an
       outstanding CONNECT message on this TCP connection.

       If it does not, a server SHOULD report an error.

   4.  Check the value of the TLS-reply option, and if it is 1, then
       skip processing of the rest of the CONNECTACK message, and
       immediately enter into TLS connection setup.

   5.  Examine the value of the protocol-version option. If this
       server is able to establish connections with another server
       running this protocol version, then continue; otherwise, close
       the connection.

   6.  Check to see if the sending-server-IP-address and server-role
       in the CONNECTACK message correspond to the failover endpoint
       for which this TCP connection was created.
       If they do not, the server MUST drop the TCP connection and
       SHOULD report an error.

   7.  Decide if the time delta between the sending of the packet, in
       the current-time option, and the receipt of the packet,
       recorded in step 1 above, is acceptable. A server MAY require
       an arbitrarily small delta in time values in order to set up a
       failover connection with another server.

       If the delta between the time values is too great, the server
       should drop the TCP connection.

       If the time mismatch is not considered too great, then the
       receiving server MUST record the delta between the servers.
       The receiving server MUST use this delta to correct all of the
       absolute times received from the other server in all
       time-valued options. Note that the failover protocol is
       constructed so that two servers can be failover partners with
       arbitrarily great time mismatches.

   8.  If the receiving server is a secondary server, it MUST examine
       the MCLT option in the CONNECTACK message and use the value of
       the MCLT as the MCLT for this failover endpoint.

       A receiving secondary server SHOULD be able to operate with
       any MCLT sent by the primary, but if it cannot, then it MUST
       drop the TCP connection.

   9.  The receiving server MAY use the vendor-class-identifier to do
       vendor-specific processing.

   10. After accepting a CONNECTACK message, the server MUST send a
       STATE message.

       After receiving a CONNECTACK message, the server MUST start
       two timers for the connection: tSend and tReceive. The tSend
       timer SHOULD be approximately 20 percent of the time in the
       receive-timer option in the corresponding CONNECTACK message.
       The tReceive timer SHOULD be set to the time sent in the
       receive-timer option in the CONNECT message.

       The tReceive timer is reset whenever a message is received
       on this TCP connection.
       If it ever expires, the TCP connection is dropped and
       communications with this partner is considered not ok.

       The tSend timer is reset whenever a packet is sent over this
       connection. When it expires, a CONTACT message MUST be sent.

7.10. STATE message

   The state (STATE) message is used to communicate the current failover
   state to the partner server.

   The STATE message MUST be sent after sending a CONNECTACK message
   that didn't contain a reject-reason option, and MUST be sent after
   receiving a CONNECTACK message without a reject-reason option.

   A STATE message MUST be sent whenever the failover endpoint changes
   its failover state and a connection exists to the partner.

   The STATE message requires no response from the failover partner.

7.10.1. Sending the STATE message

   The current failover state is placed in the server-state option and
   the current state of the STARTUP flag is placed in the server-flags
   option.

   The message is sent with a unique xid.

   A server SHOULD only send the STATE message either when the
   connection is created (i.e., after sending or receiving a CONNECTACK
   message with no reject-reason option), or when there is a change
   from the values sent in a previous STATE message.

7.10.2. Receiving the STATE message

   Every STATE message SHOULD indicate a change in state or a change in
   the flags.

   When a STATE message is received, any state transitions specified in
   section 9 are taken.

   No response to a STATE message is required.

7.11. CONTACT message

   The contact (CONTACT) message is sent to verify communications
   integrity with a failover partner. The CONTACT message is sent when
   no messages have been sent to the failover partner for a specified
   period of time. This is determined by the tSend timer expiring (see
   section 8.3).

7.11.1. Sending the CONTACT message

   The current time is placed in the current-time option, and the
   CONTACT message is sent.

7.11.2. Receiving the CONTACT message

   When a CONTACT message is received, the tReceive timer is reset (as
   it is with any message that is received).

   A server MAY use the time in the current-time option and the time at
   which the CONTACT message was received to refine the delta time
   calculations between the servers.

8. Connection Management

   Servers participating in the failover protocol communicate over TCP
   connections. These TCP connections are used both to transmit binding
   information from one server to another and to allow each server to
   determine whether communications is possible with the other server.

   Central to the operation of the failover protocol is a notion of
   "communications okay" or "communications failed". Failover state
   transitions are taken in many cases when the status of
   communications with the partner changes, and the existence or
   non-existence of a TCP connection between failover endpoints is used
   to determine if communications is "okay" or "failed".

   A single TCP connection exists which connects two failover endpoints.

8.1. Connection granularity

   There exists one TCP connection between each pair of failover
   endpoints. See section 5.1.1 for an explanation of failover endpoint.

   There is a maximum of two TCP connections between any two servers
   implementing the failover protocol, one for each of the possible
   failover endpoints between these two servers. There is a minimum of
   one TCP connection between one server and every other failover server
   with which it implements the failover protocol.

8.2. Creating the TCP connection

   Every server implementing the failover protocol MUST listen on port
   647 for incoming failover TCP connections. The source port of the
   TCP connection is unimportant.
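   The tSend and tReceive rules of sections 7.9 and 7.11 (tSend at
   roughly 20 percent of the partner's receive-timer value and reset on
   every send; tReceive equal to this server's own receive-timer value
   and reset on every receive) can be sketched as a small bookkeeping
   class. This is a non-normative illustration: the class and method
   names are hypothetical, and the caller is assumed to poll it
   periodically.

```python
import time
from typing import Callable, Optional


class FailoverTimers:
    """Hypothetical tSend/tReceive bookkeeping for one failover connection."""

    def __init__(self, own_receive_timer: float, partner_receive_timer: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # tReceive: the value this server advertised in its receive-timer option.
        self.t_receive = own_receive_timer
        # tSend: roughly 20 percent of the partner's advertised receive-timer.
        self.t_send = partner_receive_timer / 5
        self.clock = clock
        now = clock()
        self.last_sent = now
        self.last_received = now

    def on_send(self) -> None:
        """Call after any message (including CONTACT) is sent; resets tSend."""
        self.last_sent = self.clock()

    def on_receive(self) -> None:
        """Call after any message is received; resets tReceive."""
        self.last_received = self.clock()

    def poll(self) -> Optional[str]:
        """Return "drop", "contact", or None; tReceive expiry is checked first."""
        now = self.clock()
        if now - self.last_received >= self.t_receive:
            return "drop"     # communications with this partner is not ok
        if now - self.last_sent >= self.t_send:
            return "contact"  # caller sends CONTACT, then calls on_send()
        return None
```

   Using time.monotonic as the default clock keeps wall-clock
   adjustments from disturbing the timers; the injectable clock only
   makes the sketch easy to exercise.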
   Every server implementing the failover protocol SHOULD attempt to
   connect to all of its partners periodically, where the period is
   implementation dependent and SHOULD be configurable. In the event
   that a connection has been rejected by a CONNECTACK message with a
   reject-reason option contained in it, a server SHOULD reduce the
   frequency with which it attempts to connect to that server, but it
   SHOULD continue to attempt to connect periodically.

   Once a connection is established, the first message sent across the
   connection MUST be a CONNECT message. This message establishes the
   identity of the failover endpoint making the connection.

   Every CONNECT message includes a TLS-request option, and if the
   CONNECTACK message does not reject the CONNECT message and the
   TLS-reply option says TLS MUST be used, then the servers will enter
   into TLS negotiation.

   Once that negotiation is complete, the server MUST resend the
   CONNECT message on the newly secured TLS connection and then wait for
   the CONNECTACK message in response. The TLS-request and TLS-reply
   options MUST have the same values in this second CONNECT and
   CONNECTACK message as they had in the first ones.

   After the CONNECT and CONNECTACK exchange, the next message sent in
   each direction over a new connection is a STATE message. Upon the
   receipt of this message, the receiver can consider communications
   up.

   It is entirely possible that two servers will attempt to make
   connections to each other essentially simultaneously, and then each
   will send a CONNECT message down the new connection. In this case,
   each server will receive a CONNECT message on one connection having
   already sent a CONNECT message on the other connection.
   In the event that the primary server receives a CONNECT message from
   the secondary server either while waiting for a CONNECTACK message
   from that secondary server or when it has a valid connection open to
   that secondary server, it will close the connection on which the
   CONNECT message was received.

8.3. Using the TCP connection for determining communications status

   The TCP connection is used to determine the communications status of
   the other server, i.e., communications-ok or
   communications-interrupted.

   Three things must happen for a server to consider that communications
   are ok with respect to another server:

   1.  A TCP connection must be established to the other server.

   2.  A CONNECT message must be received and a CONNECTACK message
       sent in response. The CONNECT message is used to determine
       the identity of the failover endpoint at the other end of the
       TCP connection; without it, the failover endpoint cannot be
       uniquely determined. Without knowledge of the failover
       endpoint, the entity with which communications is ok is
       undetermined.

   3.  A STATE message must be received from the other server over
       the connection. This STATE message initializes important
       information necessary to the operation of the state machine
       that governs the behavior of this failover endpoint.

   There are two ways that a server can determine that communications
   has failed:

   1.  The TCP connection can go down, yielding an error when
       attempting to send a message. Because a message is sent at
       least once per tSend period, such an error will be noticed at
       least that often.

   2.  The tReceive timer can expire.

   In either of these cases, communications is considered interrupted.

   Several difficulties arise when trying to use one TCP connection
   both for bulk data transfer and to sense the communications status
   of the other server.
   One aspect of the problem stems from the different requirements of
   the two uses. The bulk data transfer is of course critically
   important to the protocol, but the speed with which it is processed
   is not terribly significant. It might well be minutes before a
   BNDUPD message is processed, and while not optimal, such an
   occasional delay doesn't compromise the correctness of the protocol.
   However, the speed with which one server detects that the other
   server is up (or, more importantly, down) is more highly constrained.
   Generally, one server should be able to detect that the other server
   is not communicating within a minute or less.

   These differing time constraints make it difficult to use the same
   TCP connection for data transfer as well as to sense communications
   integrity. See section 3.5 for additional details on TCP.

   The solution to this problem is to require that some message be
   received by each end of the connection within a limited time, or
   else the connection will be considered down. If no messages have
   been sent recently, then a CONTACT message is sent.

   In the case where there is no data queued to be sent, this is not a
   problem, but in the case where there is data queued to be sent to
   the partner, the CONTACT message will not actually be transmitted
   until the queued data is sent. Section 3.5 explains why waiting for
   TCP to determine that the connection is down is not acceptable; this
   leads to a requirement that the receiving server never block the
   sending server from sending CONTACT packets.

   In order to meet this requirement, each server tells the other server
   the number of outstanding BNDUPD messages that it will accept.
   The receiving server is required to always be able to accept that
   many BNDUPD messages off the connection's input queue even if it
   cannot process them immediately, and to accept all other messages
   immediately.

   Thus, the sending server's TCP is never blocked from sending a
   message except for very short periods, less than a few seconds,
   unless the network connection itself has problems. In that case, if
   the CONTACT messages don't make it to the partner, then the partner
   will close the connection.

8.4. Using the TCP connection for binding data

   Binding data, in the form of BNDUPD messages and BNDACK messages to
   respond to them, are sent across the TCP connection.

   In order to support timely detection of any failure in the partner
   server, the TCP connection MUST NOT block for more than a very short
   time, on the order of a few seconds. Therefore, a server that is
   sending BNDUPD messages MUST send only a limited number of them
   before receiving BNDACK messages for those previously sent.

   The number of outstanding BNDUPD messages that each server will
   accept without causing TCP to block transmission of additional data
   (i.e., CONTACT messages) is sent by each server in the CONNECT and
   CONNECTACK messages in the max-unacked-bndupd option.

8.5. Using the TCP connection for control messages

   The TCP connection is used for control messages: POOLREQ, UPDREQ,
   STATE, UPDREQALL and the corresponding reply messages: POOLRESP,
   UPDDONE. A server MUST immediately accept all of these messages from
   the TCP connection. A server MUST immediately accept any BNDACK
   which is received as well.

8.6. Losing the TCP connection

   When the TCP connection is lost, communications is not ok with the
   other server.
   A server which has lost communications SHOULD immediately attempt
   to reconnect to the other server, and should retry these connection
   attempts periodically.

   Any BNDUPD or other messages that have been received from the
   partner but not yet processed SHOULD be processed as soon as
   possible.

9. Protocol States

   This section discusses the various states that a failover endpoint
   may be in, and the server actions required when entering the state,
   operating in the state, and leaving the state, as well as the events
   that cause transitions out of the state into another state.

   The state transition diagram in Figure 9.2-1 is relevant for this
   section. In the event that the textual description of a state
   differs from the state transition diagram, the textual description
   is to be considered authoritative. This is the common state
   transition diagram for both servers in a failover pair.

9.1. Server Initialization

   When a server starts, it enters STARTUP state. See section 9.3
   below for details.

9.2. Server State Transitions

   Whenever a server transitions into a new state, it MUST record the
   state and the time at which it entered that state in stable storage.
   If communications is "ok", it MUST also send a STATE message to its
   failover partner.

   Figure 9.2-1 is the diagram of the server state transitions. The
   remainder of this section contains information important to the
   understanding of that diagram.

   The server stays in the current state until all of the actions
   specified on the state transition are complete. If communications
   fails during one of the actions, the server simply stays in the
   current state and attempts a transition whenever the conditions for
   a transition are later fulfilled.
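   The requirement to record each new state and its entry time in
   stable storage, and to send a STATE message only when communications
   is "ok", can be sketched as follows. This is a minimal illustration
   under stated assumptions: the JSON file layout, the function names
   and the send_state callback are hypothetical, and durability is
   approximated with fsync followed by an atomic os.replace.

```python
import json
import os
import tempfile
import time
from typing import Callable, Optional


def record_state_transition(path: str, new_state: str, comm_ok: bool,
                            send_state: Optional[Callable[[str, float], None]] = None,
                            now: Optional[float] = None) -> float:
    """Durably record a new failover state and its entry time (hypothetical helper)."""
    entered = time.time() if now is None else now
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"state": new_state, "start_time_of_state": entered}, f)
            f.flush()
            os.fsync(f.fileno())    # force the record to stable storage
        os.replace(tmp_path, path)  # atomic swap: readers see old or new record
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if comm_ok and send_state is not None:
        send_state(new_state, entered)  # STATE message only when communications is ok
    return entered


def load_recorded_state(path: str) -> Optional[dict]:
    """Return the last recorded state after a restart, or None if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

   A production implementation may also need to fsync the containing
   directory so that the rename itself survives a crash.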
   In the state transition diagram below, the "+" or "-" in the upper
   right corner of each state indicates whether communication with the
   other server is ongoing.

   The legend "responsive", "balanced", or "unresponsive" in each state
   indicates whether the server is responsive to all DHCP client
   requests, running in load balanced mode, or totally unresponsive in
   the respective state. The terms "responsive" and "unresponsive" have
   the obvious meanings, while "balanced" means that a DHCP server may
   respond to all DHCPREQUEST messages that are RENEWAL or REBINDING,
   and to all other messages from clients for which the load balancing
   algorithm indicates that it MUST respond. See sections 5.3 and
   9.6.2 for details on load balancing.

   In the state transition diagram below, when communication is
   reestablished between the two servers, each must record the state of
   the partner when communication was restored. State transitions on
   one server in some cases imply state transitions on the partner
   server, so a record of the current state of the partner server must
   be kept by each server.

   If the state of the partner changes while the servers are
   communicating, a server moves through the communications-failed
   transition and into whatever state results. It then immediately
   moves through whatever state transition is appropriate given the
   current state of the partner server. A server performing this
   operation SHOULD NOT drop the TCP connection to its partner.

   DISCUSSION:

      The point of this technique is simplicity, both in explanation of
      the protocol and in its implementation. The alternative to this
      technique of memory of partner state and automatic state
      transition on change of partner state is to have every state in
      the following diagram have a state transition for every possible
      state of the partner.
      With the approach adopted, only the states in which
      communications are reestablished require a state transition for
      each possible partner state.

   The current state of a server MUST be recorded in stable storage and
   thus be available to the server after a server restart.

   [Figure 9.2-1: states shown, with communications marker and
   responsiveness: STARTUP - (unresponsive); RECOVER - (unresponsive);
   RECOVER-DONE + (unresponsive); NORMAL + (balanced);
   COMMUNICATIONS-INTERRUPTED - (responsive); PARTNER-DOWN -
   (responsive); POTENTIAL-CONFLICT + (unresponsive).]

                  Figure 9.2-1: Server state diagram.

9.3. STARTUP state

   The STARTUP state affords an opportunity for a server to probe its
   partner server before starting to service DHCP clients.

   DISCUSSION:

      Without the STARTUP state, a server would likely start in a state
      derived from its previously stored state (held in stable
      storage), if any. However, this may be inconsistent with the
      current state of the partner. The STARTUP state affords the
      opportunity for a server to potentially learn the partner's state
      and determine if that state is consistent with its derived
      starting state or whether some significant state change has
      occurred at the partner that forces the server to start in
      another state. This is especially critical if significant time
      has elapsed while the server was down.

9.3.1. Operation while in STARTUP state

   Whenever a server is in STARTUP state, it MUST be unresponsive to
   DHCP client requests, and so the time spent in the STARTUP state is
   necessarily short, typically on the order of a few seconds to a few
   tens of seconds. The exact time spent in the STARTUP state is
   implementation dependent, and the primary and secondary server are
   not required to spend the same amount of time in the STARTUP state.

   Whenever a STATE message is sent to the partner while in STARTUP
   state, the STARTUP bit MUST be set in the server-flags option and the
   previously recorded failover state MUST be placed in the server-state
   option.

9.3.2. Transition out of STARTUP state

   Each server starts out in STARTUP state every time it initializes
   itself, and performs the following algorithm as part of its
   initialization:

   1.  Do not send any messages until step 5.

   2.  Is there any record in stable storage of a previous failover
       state? If yes, set previous-state to the last recorded state
       in stable storage, and continue with step 3.
       If not, is there any configuration information that indicates
       that this server was previously running but lost its stable
       storage? Such information must typically come from some
       administrative intervention, since it is difficult for a
       server to distinguish first startup from a startup after it
       has lost its stable storage. If yes, then set the
       previous-state to RECOVER, set the time-of-failure to whatever
       time was configured, and go on to step 3. This time-of-failure
       will be used in the transition out of the RECOVER state into
       the RECOVER-DONE state, below.

       If there is no record of any previous failover state in stable
       storage nor of any previous operational activity for this
       server, then set the previous-state to PARTNER-DOWN if this
       server is a primary and RECOVER if this server is a secondary,
       and set the time-of-failure to a time at least the
       maximum-client-lead-time before now. If using standard POSIX
       times, 0 would typically do quite well.

   3.  Is the previous-state NORMAL? If yes, set the previous-state
       to COMMUNICATIONS-INTERRUPTED.

   4.  Start the STARTUP state timer. The time that a server remains
       in the STARTUP state (absent any communications with its
       partner) is implementation dependent and SHOULD be
       configurable. It SHOULD be long enough for a TCP connection to
       be created to a heavily loaded partner across a slow network.

   5.  Attempt to create a TCP connection to the failover partner.
       See section 8.2.

   6.  Wait for "communications okay", i.e., the process discussed in
       section 8.2 "Creating the TCP Connection", to complete,
       including the receipt of a STATE message from the partner.

       When and if communications become "okay", clear the STARTUP
       flag, and set the current state to the previous-state.
       If the partner is in PARTNER-DOWN state, and if the time at
       which it entered PARTNER-DOWN state (as received in the
       start-time-of-state option in the STATE message) is later than
       the last recorded time of operation of this server, then set
       the current state to RECOVER.

       Then, transition to the current state and take the
       "communications okay" state transition based on the current
       state of this server and the partner.

   7.  If the STARTUP state timer expires, take an implementation
       dependent action: the server MAY go to the previous-state, or
       the server MAY wait.

       Reasons to go to previous-state and begin processing:

       If the current server is the only operational server, then if
       it waits, there will be no operational DHCP servers. This
       situation could occur very easily, for example when one server
       fails and then the other crashes and reboots. If the rebooting
       server doesn't start processing DHCP client requests without
       first being in communication with the other server, then the
       level of DHCP redundancy is not particularly high. This is an
       appropriate approach if the possibility of partition is low,
       or if the safe period expiration time is well beyond the time
       at which an operator would notice and react to a partition
       situation. It is also quite appropriate if the safe period
       will never expire.

       Reasons to wait:

       If the current server has been down for longer than the
       maximum-client-lead-time, and it is partitioned from the other
       server, then when it returns it will attempt to use its own
       available addresses to allocate to new DHCP clients, and the
       other server may well be in PARTNER-DOWN state and may have
       already allocated some of those available addresses to DHCP
       clients.
       In cases where the possibility of partition is high,
       and the safe period expiration time is less than the likely
       operator reaction time, this is a good approach to use.

9.4. PARTNER-DOWN state

   PARTNER-DOWN state is a state either server can enter. When in this
   state, the server does not assume that the other server could still
   be operating and servicing a different set of clients, but instead
   assumes that it is the only server operating. For this reason, only
   one server should be operating in this state at a time.

9.4.1. Upon entry to PARTNER-DOWN state

   No special actions are required when entering PARTNER-DOWN state.

   The server should continue to attempt to connect to the partner
   periodically.

9.4.2. Operation while in PARTNER-DOWN state

   A server in PARTNER-DOWN state MUST respond to DHCP client requests.
   It will allow renewal of all outstanding leases on IP addresses and
   will allocate IP addresses from its own pool; after a fixed period
   of time (the MCLT interval) has elapsed from entry into PARTNER-DOWN
   state, it will allocate IP addresses from the set of all available
   IP addresses.

   Once a server has entered NORMAL state, the PARTNER-DOWN state is
   entered only on command of an external agency (typically an
   administrator of some sort) or after the expiration of an externally
   configured minimum safe-time after the beginning of
   COMMUNICATIONS-INTERRUPTED state.

   Any available IP address tagged as belonging to the other server (at
   entry to PARTNER-DOWN state) MUST NOT be used until the
   maximum-client-lead-time beyond the entry into PARTNER-DOWN state
   has elapsed.
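   The MCLT restriction on addresses from the partner's pool can be
   sketched as a filter over the available addresses. The function name
   and the (address, owned_by_partner) representation are hypothetical;
   ownership is assumed to be the tagging in effect at entry to
   PARTNER-DOWN state, and all times are local seconds.

```python
from typing import Iterable, List, Tuple


def allocatable_in_partner_down(available: Iterable[Tuple[str, bool]],
                                partner_down_entry: float, mclt: float,
                                now: float) -> List[str]:
    """Filter the available addresses a PARTNER-DOWN server may allocate now.

    available: (address, owned_by_partner) pairs (hypothetical encoding).
    """
    # The partner's addresses open up only once one MCLT has elapsed
    # since this server entered PARTNER-DOWN state.
    partner_pool_open = now >= partner_down_entry + mclt
    return [address for address, owned_by_partner in available
            if not owned_by_partner or partner_pool_open]
```

   For example, with an MCLT of 3600 seconds, a partner-owned address
   stays excluded until 3600 seconds after entry into PARTNER-DOWN
   state, while the server's own addresses are usable immediately.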
   A server in PARTNER-DOWN state MUST NOT allocate an IP address to a
   DHCP client different from that to which it was allocated at the
   entrance to PARTNER-DOWN state until the maximum-client-lead-time
   beyond the expiration time of its lease has elapsed. If this time
   would be earlier than the current time plus the
   maximum-client-lead-time, then the current time plus the
   maximum-client-lead-time is used.

   Two options exist for lease times given out while in PARTNER-DOWN
   state, with different ramifications flowing from each.

   If the server wishes the Failover protocol to protect it from loss of
   stable storage in PARTNER-DOWN state, then it should ensure that the
   MCLT-based lease time restrictions in Section 5.1 are maintained,
   even in PARTNER-DOWN state.

   If the server wishes to forgo the protection of the Failover
   protocol in the event of loss of stable storage, then it need not
   observe any restrictions on actual client lease times while in
   PARTNER-DOWN state.

   A server in PARTNER-DOWN state attempts to establish communications
   and synchronization with its partner.

9.4.3. Transitions out of PARTNER-DOWN state

   When a server in PARTNER-DOWN state succeeds in establishing a
   connection to its partner, its actions are conditional on the state
   and flags received in the STATE message from the other server as
   part of the process of establishing the connection.

   If the STARTUP bit is set in the server-flags option of a received
   STATE message, a server in PARTNER-DOWN state MUST NOT take any
   state transitions based on reestablishing communications.
   Essentially, if a server is in PARTNER-DOWN state, it ignores all
   STATE messages from its partner that have the STARTUP bit set in the
   server-flags option of the STATE message.
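   The rules of this section, the STARTUP-bit rule above together with
   the partner-state list that follows, can be sketched as a pure
   function. This is a non-normative illustration: the function name is
   hypothetical, and partner states not named in the list are assumed
   to leave the server in PARTNER-DOWN state.

```python
PARTNER_STATES_CAUSING_CONFLICT = {
    "NORMAL", "COMMUNICATIONS-INTERRUPTED", "PARTNER-DOWN", "POTENTIAL-CONFLICT",
}


def next_state_from_partner_down(partner_state: str, partner_startup_bit: bool) -> str:
    """State a PARTNER-DOWN server ends up in after a STATE message from its partner."""
    if partner_startup_bit:
        return "PARTNER-DOWN"  # STATE messages with STARTUP set are ignored
    if partner_state in PARTNER_STATES_CAUSING_CONFLICT:
        return "POTENTIAL-CONFLICT"
    if partner_state == "RECOVER":
        return "PARTNER-DOWN"  # stay while the partner recovers
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    return "PARTNER-DOWN"      # assumption: unlisted partner states cause no change
```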
   If the STARTUP bit is not set in the server-flags option of a STATE
   message received from its partner, then a server in PARTNER-DOWN
   state takes the following actions based on the value of the
   server-state option in the received STATE message:

   o  partner in NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN or
      POTENTIAL-CONFLICT state

         transition to POTENTIAL-CONFLICT state

   o  partner in RECOVER state

         stay in PARTNER-DOWN state

   o  partner in RECOVER-DONE state

         transition into NORMAL state

9.5. RECOVER state

   This state indicates that the server has no information in its
   stable storage or that it is re-integrating with a server in
   PARTNER-DOWN state after it has been down. A server in this state
   will attempt to refresh its stable storage from the other server.

9.5.1. Operation in RECOVER state

   A server in RECOVER state MUST NOT respond to DHCP client requests.

   A server in RECOVER state will attempt to reestablish communications
   with the other server.

9.5.2. Transitions out of RECOVER state

   If the other server is in POTENTIAL-CONFLICT state when
   communications are reestablished, then the server in RECOVER state
   will move to POTENTIAL-CONFLICT state itself.

   If the other server is in RECOVER state, then this server SHOULD
   signal an error and halt processing.

   If the other server is in any other state, then the server in
   RECOVER state will request an update of missing binding information.
   If the server has been instructed (through configuration or other
   external agency) that it has lost its stable storage, it MUST send
   an UPDREQALL message; otherwise, it MUST send an UPDREQ message.
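   The RECOVER-state decisions of section 9.5.2, together with the
   RECOVER-DONE timer that is started once UPDDONE arrives (described
   next), can be sketched as follows. The function names and return
   strings are hypothetical; times are local seconds.

```python
from typing import Optional


def recover_action(partner_state: str, lost_stable_storage: bool) -> str:
    """Action for a RECOVER-state server once communications are reestablished."""
    if partner_state == "POTENTIAL-CONFLICT":
        return "enter POTENTIAL-CONFLICT"
    if partner_state == "RECOVER":
        return "signal error and halt"  # both servers recovering
    # Any other partner state: ask for the missing binding information.
    return "send UPDREQALL" if lost_stable_storage else "send UPDREQ"


def recover_done_deadline(mclt: float, now: float,
                          down_time: Optional[float] = None) -> float:
    """Expiration of the timer started on UPDDONE: (down time, else now) + MCLT."""
    start = down_time if down_time is not None else now
    return start + mclt
```

   In this sketch, if the known down time is more than one MCLT in the
   past, the deadline has already passed and the server can move to
   RECOVER-DONE state as soon as UPDDONE is received.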
It will then wait for an UPDDONE message. Upon receipt of that message, it will start a timer set to expire one maximum-client-lead-time after the time the server went down (if known), or after the current time (if the down-time is unknown). When this timer goes off, the server will transition into RECOVER-DONE state. This wait gives DHCP clients holding IP addresses allocated by this server before it lost the client binding information in its stable storage time to contact the other server, or for their leases to time out.

See Figure 9.5.2-1.

DISCUSSION:

   The actual requirement on this wait period in RECOVER state is that it start when the recovering server went down, not necessarily when it came back up. If the time when the recovering server failed is known, then it could be communicated to the recovering server, and the wait period could be reduced to the maximum-client-lead-time less the difference between the current time and the time the server failed. In this way, the waiting period could be minimized.

If an UPDDONE message isn't received within an implementation-dependent amount of time, and no BNDUPD messages are being received, then the UPDREQ (or UPDREQALL) message will be retransmitted.

          A                                B
        Server                           Server

          |                                |
       RECOVER                       PARTNER-DOWN
          |                                |
          | >--UPDREQ--------------------> |
          |                                |
          | <--------------------BNDUPD--< |
          | >--BNDACK--------------------> |
         ...                              ...
          |                                |
          | <--------------------BNDUPD--< |
          | >--BNDACK--------------------> |
          |                                |
          | <-------------------UPDDONE--< |
          |                                |
  Wait MCLT from last known                |
     time of operation                     |
          |                                |
    RECOVER-DONE                           |
          |                                |
          | >--STATE-(RECOVER-DONE)------> |
          |                             NORMAL
          | <------------(NORMAL)-STATE--< |
       NORMAL                              |
          |                                |
          |                                |

        Figure 9.5.2-1: Transition out of RECOVER state

9.6. NORMAL state

NORMAL state is the state used by a server when it can communicate with the other server.

9.6.1. Upon Entry to NORMAL state

When entering NORMAL state, a server will send all currently unacknowledged binding updates to the other server as BNDUPD messages.

When this process is complete, if the server entering NORMAL state is a secondary server, then it will request IP addresses for allocation using the POOLREQ message.

9.6.2. Processing DHCP client requests and load balancing

When in NORMAL state, each server MUST process all requests from some DHCP clients, and MUST NOT process any request other than a DHCPREQUEST/RENEWAL or a DHCPREQUEST/REBINDING request from the remaining DHCP clients. The load balancing algorithm determines into which set a particular DHCP client falls.

As discussed in section 5.3, each server will take the client-identifier from each DHCP client request (or the htype concatenated to the front of the chaddr if no client-identifier is present in the request), and hash it with the algorithm given in section 12. The result of this hash algorithm is a number between 0 and 255. This number is used to index into the bit array received by a server in the hash-bucket-assignment option (if the server is a secondary), or into the inverse of the bit array sent to the secondary in the hash-bucket-assignment option (if the server is a primary).

If the bit found from this indexing process is a 1 bit, then the server MUST process this DHCP request.

In NORMAL state, a server MUST process every DHCPREQUEST/RENEWAL or DHCPREQUEST/REBINDING request it receives.

9.6.3. Operation in NORMAL state

When in NORMAL state, for every DHCP client request that it processes, as determined by the algorithm described in section 9.6.2 above, a server will operate in the following manner:

   o  Lease time calculations

      As discussed in section 5.2.1, "Control of lease time", the lease interval given to a DHCP client can never be more than the MCLT greater than the most recently received potential-expiration-time from the failover partner or the current time, whichever is later.

      As long as a server adheres to this constraint, the specifics of the lease interval that it gives to a DHCP client or the value of the potential-expiration-time sent to its failover partner are implementation dependent. One possible approach is discussed in section 5.2.1, but that particular approach is in no way required by this protocol.

   o  Lazy update of partner server

      After a DHCPACK of an IP address binding, the server servicing a DHCP client request attempts to update its partner with the new binding information. The lease time used in the update of the partner MUST be the lease time given to the DHCP client in the DHCPACK, and the potential-expiration-time MUST be at least the lease time, and SHOULD be longer.

   o  Reallocation of IP addresses between clients

      Whenever a client binding is released or expires, a BNDUPD message must be sent to the partner, setting the binding state to RELEASED or EXPIRED. However, until a BNDACK is received for this message, the IP address cannot be allocated to another client. It can be allocated to the same client again.

In NORMAL state, each server receives binding updates from its partner server in BNDUPD messages. It records these in its client binding database in stable storage and then sends a corresponding BNDACK message to its partner.
It MUST ensure that the information is recorded in stable storage before sending the BNDACK message back to its partner.

9.6.4. Transitions out of NORMAL state

If a server in NORMAL state receives an external command informing it that its partner is down, then it transitions into PARTNER-DOWN state.

If a server in NORMAL state fails to receive acknowledgements to messages sent to its partner for an implementation-dependent period of time, it MAY move into COMMUNICATIONS-INTERRUPTED state. This situation might occur if the partner server was capable of maintaining the TCP connection between the servers and of sending a CONTACT message every tSend seconds, but was (for some reason) incapable of processing BNDUPD messages.

If communications are determined not to be "ok" (as defined in section 8), then the server transitions into COMMUNICATIONS-INTERRUPTED state.

If a server in NORMAL state receives any message from its partner indicating that the partner has changed state in a way the server does not expect, then the server should transition into COMMUNICATIONS-INTERRUPTED state and take the appropriate state transition from there. For example, the partner would be expected to transition from POTENTIAL-CONFLICT into NORMAL state, but not from NORMAL into POTENTIAL-CONFLICT state.

9.7. COMMUNICATIONS-INTERRUPTED State

A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is unable to communicate with the other server. Primary and secondary servers cycle automatically (without administrative intervention) between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network connection between them fails and recovers, or as the partner server cycles between operational and non-operational.
No duplicate IP address allocation can occur while the servers cycle between these states.

9.7.1. Upon Entry to COMMUNICATIONS-INTERRUPTED state

When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been configured to support an automatic transition out of COMMUNICATIONS-INTERRUPTED state and into PARTNER-DOWN state (i.e., a "safe period" has been configured, see section 10), then a timer MUST be started for the length of the configured safe period.

A server transitioning into COMMUNICATIONS-INTERRUPTED state from NORMAL state SHOULD raise some alarm condition to alert administrative staff to a potential problem in the DHCP subsystem.

9.7.2. Operation in COMMUNICATIONS-INTERRUPTED State

In this state a server MUST respond to all DHCP client requests, and the algorithm for load balancing described in section 5.3 MUST NOT be used. When allocating new IP addresses, each server allocates from its own IP address pool: the primary MUST allocate only FREE IP addresses, and the secondary MUST allocate only BACKUP IP addresses. When responding to renewal requests, each server will allow continued renewal of a DHCP client's current lease on an IP address, irrespective of whether that lease was given out by the receiving server. However, the renewal period MUST NOT exceed the maximum client lead time (MCLT) beyond the potential-expiration-time already acknowledged by the other server, or the lease-expiration-time or potential-expiration-time received from the partner server.

However, since the server cannot communicate with its partner in this state, the acknowledged-potential-expiration time will not be updated in any new bindings.
This is likely to eventually cause the actual-client-lease-times to be the current-time plus the maximum-client-lead-time (unless this is greater than the desired-client-lease-time).

9.7.3. Transition out of COMMUNICATIONS-INTERRUPTED State

If the safe period timer expires while a server is in COMMUNICATIONS-INTERRUPTED state, it will transition immediately into PARTNER-DOWN state.

If a server in COMMUNICATIONS-INTERRUPTED state receives an external command informing it that its partner is down, it will transition immediately into PARTNER-DOWN state.

If communications are restored with the other server, then the server in COMMUNICATIONS-INTERRUPTED state will transition into another state based on the state of the partner:

   o  partner in NORMAL or COMMUNICATIONS-INTERRUPTED

      Transition into NORMAL state.

   o  partner in RECOVER

      Stay in COMMUNICATIONS-INTERRUPTED state.

   o  partner in RECOVER-DONE

      Transition into NORMAL state.

   o  partner in PARTNER-DOWN or POTENTIAL-CONFLICT

      Transition into POTENTIAL-CONFLICT state.

   o  partner in PAUSED

      Stay in COMMUNICATIONS-INTERRUPTED state.

   o  partner in SHUTDOWN

      Transition into PARTNER-DOWN state.

The following figure illustrates the transition from NORMAL to COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.
       Primary                         Secondary
        Server                           Server

       NORMAL                           NORMAL
          | >--CONTACT-------------------> |
          | <-------------------CONTACT--< |
          |    [TCP connection broken]     |
   COMMUNICATIONS         :         COMMUNICATIONS
     INTERRUPTED          :           INTERRUPTED
          |  [attempt new TCP connection]  |
          |     [connection succeeds]      |
          |                                |
          | >--CONNECT-------------------> |
          | <----------------CONNECTACK--< |
          | <------------------STATE-----< |
          |                             NORMAL
          | >--STATE---------------------> |
       NORMAL                              |
          | >--BNDUPD--------------------> |
          | <--------------------BNDACK--< |
          |                                |
          | <--------------------BNDUPD--< |
          | >------BNDACK----------------> |
         ...                              ...
          |                                |
          | <-------------------POOLREQ--< |
          | >--POOLRESP-(2)--------------> |
          |                                |
          | >--BNDUPD-(#1)---------------> |
          | <--------------------BNDACK--< |
          |                                |
          | <-------------------POOLREQ--< |
          | >--POOLRESP-(0)--------------> |
          |                                |
          | >--BNDUPD-(#2)---------------> |
          | <--------------------BNDACK--< |
          |                                |

        Figure 9.7.3-1: Transition from NORMAL to COMMUNICATIONS-
                        INTERRUPTED and back (example with 2
                        addresses allocated to secondary)

9.8. POTENTIAL-CONFLICT state

This state indicates that the two servers are attempting to re-integrate with each other, but at least one of them was running in a state that did not guarantee that automatic reintegration would be possible. In POTENTIAL-CONFLICT state the servers may determine that the same IP address has been offered and accepted by two different DHCP clients.

It is a goal of this protocol to minimize the possibility that POTENTIAL-CONFLICT state is ever entered.

9.8.1. Upon Entry to POTENTIAL-CONFLICT state

When a primary server enters POTENTIAL-CONFLICT state, it should request that the secondary send it all updates of which it is currently unaware by sending an UPDREQ message to the secondary server.
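The role-dependent behavior of POTENTIAL-CONFLICT state can be sketched in C. On entry, the primary sends UPDREQ while the secondary waits. If communications fail (the rule in section 9.8.3 below), the primary moves to PARTNER-DOWN while the secondary stays put. All type and function names here are hypothetical.

```c
#include <assert.h>

typedef enum { ROLE_PRIMARY, ROLE_SECONDARY } failover_role;

/* Only the two states this sketch needs (hypothetical encoding). */
typedef enum { ST_POTENTIAL_CONFLICT, ST_PARTNER_DOWN } pc_next_state;

typedef enum {
    PC_SEND_UPDREQ,             /* primary: ask for updates it has not seen */
    PC_WAIT_FOR_PARTNER_UPDREQ  /* secondary: wait for the primary's UPDREQ */
} pc_entry_action;

/* Action taken on entry to POTENTIAL-CONFLICT state. */
pc_entry_action potential_conflict_on_entry(failover_role role)
{
    return role == ROLE_PRIMARY ? PC_SEND_UPDREQ : PC_WAIT_FOR_PARTNER_UPDREQ;
}

/* Next state when communications fail while in POTENTIAL-CONFLICT. */
pc_next_state potential_conflict_on_comm_failure(failover_role role)
{
    return role == ROLE_PRIMARY ? ST_PARTNER_DOWN : ST_POTENTIAL_CONFLICT;
}
```

Because only the primary starts the exchange, the two servers never send UPDREQ messages at each other simultaneously on entry.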
A secondary server entering POTENTIAL-CONFLICT state will wait for the primary to send it an UPDREQ message.

9.8.2. Operation in POTENTIAL-CONFLICT state

Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming DHCP requests.

9.8.3. Transitions out of POTENTIAL-CONFLICT state

If communications with the partner fail while in POTENTIAL-CONFLICT state, then a primary server will transition to PARTNER-DOWN state and a secondary server will stay in POTENTIAL-CONFLICT state.

Whenever either server receives an UPDDONE message from its partner while in POTENTIAL-CONFLICT state, it MUST transition to NORMAL state. This causes the primary server to leave POTENTIAL-CONFLICT state before the secondary does, since the primary sends an UPDREQ message and receives an UPDDONE before the secondary sends its UPDREQ message and receives its UPDDONE message.

When a secondary server receives an indication that the primary server has transitioned from POTENTIAL-CONFLICT to NORMAL state, it SHOULD send an UPDREQ message to the primary server.

       Primary                         Secondary
        Server                           Server

          |                                |
 POTENTIAL-CONFLICT               POTENTIAL-CONFLICT
          |                                |
          | >--UPDREQ--------------------> |
          |                                |
          | <--------------------BNDUPD--< |
          | >--BNDACK--------------------> |
         ...                              ...
          |                                |
          | <--------------------BNDUPD--< |
          | >--BNDACK--------------------> |
          |                                |
          | <-------------------UPDDONE--< |
       NORMAL                              |
          | >--STATE--(NORMAL)-----------> |
          | <--------------------UPDREQ--< |
          |                                |
          | >--BNDUPD--------------------> |
          | <--------------------BNDACK--< |
         ...                              ...
          | >--BNDUPD--------------------> |
          | <--------------------BNDACK--< |
          |                                |
          | >--UPDDONE-------------------> |
          |                             NORMAL
          |                                |
          | <-------------------POOLREQ--< |
          | >------POOLRESP-(?)----------> |
          |                                |

        Figure 9.8.3-1: Transition out of POTENTIAL-CONFLICT

9.9. RECOVER-DONE state

This state exists to allow an interlocked transition into NORMAL state for one server coming from RECOVER state and another server coming from PARTNER-DOWN or COMMUNICATIONS-INTERRUPTED state.

9.9.1. Operation in RECOVER-DONE state

A server in RECOVER-DONE state MUST respond only to DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING DHCP messages.

9.9.2. Transitions out of RECOVER-DONE state

When a server in RECOVER-DONE state determines that its partner server has entered NORMAL state, it will transition into NORMAL state as well.

9.10. PAUSED state

This state exists to allow one server to inform the other that it will be out of service for what is predicted to be a relatively short time. It also allows the other server to transition immediately to COMMUNICATIONS-INTERRUPTED state and begin servicing all DHCP clients with no interruption in service to new DHCP clients.

A server which is aware that it is shutting down temporarily SHOULD send a STATE message with the server-state option containing PAUSED state.

While a server may or may not transition internally into PAUSED state, the 'previous' state determined when it is restarted MUST be the state the server was in before it received the command to shut down and restart, that is, the state that preceded its entry into PAUSED state. See section 9.3.2 concerning the use of the previous state upon server restart.

9.10.1. Upon entry to PAUSED state

When entering PAUSED state, the server MUST store the previous state in stable storage, and use that state as the previous state when it is restarted.

9.10.2. Transitions out of PAUSED state

A server transitions out of PAUSED state by being restarted. At that time, the previous state MUST be the state the server was in prior to entering PAUSED state.

9.11. SHUTDOWN state

This state exists to allow one server to inform the other that it will be out of service for what is predicted to be a relatively long time. It also allows the other server to transition immediately to PARTNER-DOWN state and take over completely for the server going down.

A server which is aware that it is shutting down SHOULD send a STATE message with the server-state option containing SHUTDOWN.

While a server may or may not transition internally into SHUTDOWN state, the 'previous' state determined when it is restarted MUST be the state active prior to the command to shut down. See section 9.3.2 concerning the use of the previous state upon server restart.

9.11.1. Upon entry to SHUTDOWN state

When entering SHUTDOWN state, the server MUST record the previous state in stable storage for use when the server is restarted. It also MUST record the current time as the last time operational.

A server which is aware that it is shutting down SHOULD send a STATE message with the server-state option containing SHUTDOWN.

9.11.2. Operation in SHUTDOWN state

A server in SHUTDOWN state MUST NOT respond to any DHCP client input.

If a server in SHUTDOWN state receives any message indicating that the partner has moved to PARTNER-DOWN state, then it MUST record RECOVER state as the previous state to be used when it is restarted.
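The "previous state" bookkeeping that PAUSED and SHUTDOWN state require can be sketched in C. The `restart_record` structure and the function names are hypothetical. Writing the record to stable storage is left out: a real implementation would have to flush it there, as the text requires.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical C encoding of the failover server states. */
typedef enum {
    ST_NORMAL, ST_COMMUNICATIONS_INTERRUPTED, ST_PARTNER_DOWN,
    ST_POTENTIAL_CONFLICT, ST_RECOVER, ST_RECOVER_DONE,
    ST_PAUSED, ST_SHUTDOWN
} failover_state;

/* Hypothetical record that must survive a restart (stable storage). */
typedef struct {
    failover_state previous_state;  /* state to resume from on restart */
    time_t last_time_operational;   /* recorded on entry to SHUTDOWN */
} restart_record;

/* Entering PAUSED: remember the state that preceded the pause. */
void enter_paused(restart_record *rec, failover_state current)
{
    assert(current != ST_PAUSED && current != ST_SHUTDOWN);
    rec->previous_state = current;
}

/* Entering SHUTDOWN: remember the prior state and when we stopped. */
void enter_shutdown(restart_record *rec, failover_state current, time_t now)
{
    assert(current != ST_PAUSED && current != ST_SHUTDOWN);
    rec->previous_state = current;
    rec->last_time_operational = now;
}

/* In SHUTDOWN: a partner move to PARTNER-DOWN forces RECOVER on restart. */
void shutdown_on_partner_state(restart_record *rec, failover_state partner)
{
    if (partner == ST_PARTNER_DOWN)
        rec->previous_state = ST_RECOVER;
}
```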
A server SHOULD wait a few seconds after informing the partner of its entry into SHUTDOWN state (if communications are okay) to determine whether the partner will enter PARTNER-DOWN state.

9.11.3. Transitions out of SHUTDOWN state

A server transitions out of SHUTDOWN state by being restarted.

10. Safe Period

Due to the restrictions imposed on each server while in COMMUNICATIONS-INTERRUPTED state, long-term operation in this state is not feasible for either server. One reason this state exists at all is to allow the servers to easily survive transient network communications failures lasting from a few minutes to a few days. The actual time periods will depend a great deal on the DHCP activity of the network, in terms of the arrival and departure of DHCP clients.

Eventually, when the servers are unable to communicate, they will have to move into a state where they can no longer re-integrate without some possibility of a duplicate IP address allocation. There are two ways that they can move into this state (known as PARTNER-DOWN).

First, they can be informed by external command that the partner server is, indeed, down. In this case, there is no difficulty in moving into PARTNER-DOWN state, since it is an accurate reflection of reality. The protocol has been designed to operate correctly (even during reintegration) if the partner is, indeed, down while the server is in PARTNER-DOWN state.

The more difficult scenario is when the servers are running unattended for extended periods. For this case, an option is provided to configure something called a "safe-period" into each server. This OPTIONAL safe-period is the period after which either the primary or the secondary server will automatically transition from COMMUNICATIONS-INTERRUPTED state to PARTNER-DOWN state.
If this transition is completed and the partner is not down, then the possibility of duplicate IP address allocations will exist.

The goal of the "safe-period" is to allow network operations staff some time to react to a server moving into COMMUNICATIONS-INTERRUPTED state. During the safe-period, the only requirement is that the network operations staff determine whether both servers are still running. If they are, the staff must either fix the network communications failure between them, or take one of the servers down, before the safe-period expires.

The length of the safe-period is installation dependent. It depends in large part on the number of unallocated IP addresses within the subnet address pool and on the expected frequency of arrival of previously unknown DHCP clients requiring IP addresses. Many environments should be able to support safe-periods of several days.

During the safe period, either server will allow renewals from any existing client. The only limitations concern the need for IP addresses for the DHCP server to hand out to new DHCP clients, and the need to re-allocate IP addresses to different DHCP clients.

The number of "extra" IP addresses required is equal to the expected total number of new DHCP clients encountered during the safe period. This depends only on the arrival rate of new DHCP clients, not on the total number of outstanding leases on IP addresses.

In the unlikely event that a relatively short safe period of an hour is all that can be used (given a dearth of IP addresses or a very high arrival rate of new DHCP clients), even that can provide substantial benefits in allowing the DHCP subsystem to ride through minor problems that could occur and be fixed within that hour.
In these cases, no possibility of duplicate IP address allocation exists, and re-integration after the failure is resolved will be automatic and require no operator intervention.

11. Security

It is very desirable to assure the integrity of failover partners and thereby ensure proper operation of the servers. For example, denial of service attacks are possible by communicating invalid state information to both servers.

The Failover protocol MAY be secured either by using a simple shared secret message digest which covers each message, or by using TLS [TLS] (Transport Layer Security).

11.1. Simple shared secret

A simple shared secret message digest MAY be used to cover each message. Since a number of configuration parameters must already be the same on each server in a pair, it is not unreasonable to require a shared secret to be configured as well.

Only information within the packet and covered by the message digest is used for operation of the protocol. For this reason, the IP address of the sending server is sent in the sending-server-IP-address option of the CONNECT and CONNECTACK messages.

This message digest is placed in the message-digest option. The digest covers the message prior to the inclusion of the message-digest option.

11.2. TLS

TLS, Transport Layer Security, as specified in [TLS], MAY be used. The use of TLS would be similar to the way it is used with SMTP [SMTPTLS] and IMAP/POP3/ACAP [IMAPTLS].

To request the use of TLS, the server that successfully opened a connection to its peer MUST send the TLS-request option as part of the CONNECT message. The server receiving the TLS-request option MUST respond with a TLS-reply option indicating its acceptance or rejection of the TLS-request in the CONNECT message.
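The initiating server's side of the TLS exchange can be sketched as a C state machine. The server sends a CONNECT with a TLS-request and receives a CONNECTACK with a TLS-reply. It then negotiates TLS and re-sends the CONNECT without a TLS-request, as the following paragraphs describe. Phase names are hypothetical. The text does not say what happens when the TLS-reply rejects TLS, so the sketch simply stops in a `PH_TLS_REJECTED` phase left to local policy.

```c
#include <assert.h>

/* Hypothetical phases for the server that opened the connection. */
typedef enum {
    PH_SEND_CONNECT_WITH_TLS_REQUEST,
    PH_AWAIT_FIRST_CONNECTACK,
    PH_TLS_NEGOTIATION,
    PH_RESEND_CONNECT_WITHOUT_TLS_REQUEST,
    PH_AWAIT_SECOND_CONNECTACK,
    PH_TLS_ESTABLISHED,
    PH_TLS_REJECTED      /* outcome left to local policy (assumption) */
} tls_phase;

/*
 * Advance one phase.  tls_reply is the TLS-reply value carried in a
 * CONNECTACK; it is consulted only in PH_AWAIT_FIRST_CONNECTACK.
 */
tls_phase tls_next_phase(tls_phase p, int tls_reply)
{
    switch (p) {
    case PH_SEND_CONNECT_WITH_TLS_REQUEST:
        return PH_AWAIT_FIRST_CONNECTACK;
    case PH_AWAIT_FIRST_CONNECTACK:
        /* Only a TLS-reply of 1 starts TLS negotiation. */
        return tls_reply == 1 ? PH_TLS_NEGOTIATION : PH_TLS_REJECTED;
    case PH_TLS_NEGOTIATION:
        return PH_RESEND_CONNECT_WITHOUT_TLS_REQUEST;
    case PH_RESEND_CONNECT_WITHOUT_TLS_REQUEST:
        return PH_AWAIT_SECOND_CONNECTACK;
    case PH_AWAIT_SECOND_CONNECTACK:
        return PH_TLS_ESTABLISHED;
    default:
        return p;  /* terminal phases do not advance */
    }
}
```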
If the CONNECTACK message contained a TLS-reply of 1, then both servers begin TLS negotiation.

Upon completion of this negotiation, the server which originally sent the CONNECT message MUST resend its CONNECT message without any TLS-request, and must wait for a corresponding CONNECTACK.

Implementation of the TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA [TLS] cipher suite is REQUIRED in Failover servers supporting TLS. This is important because it assures that any two compliant implementations can be configured to interoperate.

12. Hash algorithm for load balancing

The following hash function is an implementation of the algorithm known as "Pearson's hash". Pearson's hash algorithm was originally published in the Communications of the ACM, Vol. 33, No. 6 (June 1990), pp. 677-680. The author, Peter K. Pearson, has kindly granted his permission to use this algorithm, free of any encumbrances.

To make primary/secondary load balancing possible, both servers MUST use the same hash function.

   /* A "mixing table" of 256 distinct values, in pseudo-random order.
    */
   unsigned char failover_hash_mx_tbl[256] =
   {
       251, 175, 119, 215,  81,  14,  79, 191, 103,  49,
       181, 143, 186, 157,   0, 232,  31,  32,  55,  60,
       152,  58,  17, 237, 174,  70, 160, 144, 220,  90,
        57, 223,  59,   3,  18, 140, 111, 166, 203, 196,
       134, 243, 124,  95, 222, 179, 197,  65, 180,  48,
        36,  15, 107,  46, 233, 130, 165,  30, 123, 161,
       209,  23,  97,  16,  40,  91, 219,  61, 100,  10,
       210, 109, 250, 127,  22, 138,  29, 108, 244,  67,
       207,   9, 178, 204,  74,  98, 126, 249, 167, 116,
        34,  77, 193, 200, 121,   5,  20, 113,  71,  35,
       128,  13, 182,  94,  25, 226, 227, 199,  75,  27,
        41, 245, 230, 224,  43, 225, 177,  26, 155, 150,
       212, 142, 218, 115, 241,  73,  88, 105,  39, 114,
        62, 255, 192, 201, 145, 214, 168, 158, 221, 148,
       154, 122,  12,  84,  82, 163,  44, 139, 228, 236,
       205, 242, 217,  11, 187, 146, 159,  64,  86, 239,
       195,  42, 106, 198, 118, 112, 184, 172,  87,   2,
       173, 117, 176, 229, 247, 253, 137, 185,  99, 164,
       102, 147,  45,  66, 231,  52, 141, 211, 194, 206,
       246, 238,  56, 110,  78, 248,  63, 240, 189,  93,
        92,  51,  53, 183,  19, 171,  72,  50,  33, 104,
       101,  69,   8, 252,  83, 120,  76, 135,  85,  54,
       202, 125, 188, 213,  96, 235, 136, 208, 162, 129,
       190, 132, 156,  38,  47,   1,   7, 254,  24,   4,
       216, 131,  89,  21,  28, 133,  37, 153, 149,  80,
       170,  68,   6, 169, 234, 151
   };

   /*
    * Pearson's hash of a key (for DHCP load balancing, the
    * client-identifier, or htype followed by chaddr).  The hash is
    * seeded with the key length, and the key bytes are consumed from
    * the last byte to the first.  The result is in the range 0..255.
    */
   unsigned char failover_p_hash(
       unsigned char *key,  /* The key to be hashed */
       int len)             /* Length of key in bytes */
   {
       unsigned char hash = (unsigned char) len;
       int i;

       for (i = len; i > 0; )
       {
           hash = failover_hash_mx_tbl[hash ^ key[--i]];
       }
       return hash;
   }

13. Acknowledgments

Ralph Droms started it all by sketching out an initial interserver draft that embodied ideas from several past IETF meetings.
In that draft, he acknowledged contributions by Jeff Mogul, Greg Minshall, Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.

Kim Kinnear and Bob Cole each extended that draft, separately and then together, until they created an interserver draft that supported any number of servers. The complexity of that approach was just too great, and that draft wasn't greeted with enthusiasm by many, including its authors.

It did, however, lead to a much simpler approach embodied in the first Failover draft by Greg Rabil, Mike Dooley, Arun Kapur and Ralph Droms. This draft posited only two servers: a primary and a secondary.

Kim Kinnear then wrote the Safe Failover draft to layer on top of the Failover draft and increase its robustness in the face of certain rare network failures.

At the spring 1998 IETF meeting in LA, the DHC working group said that they wanted a merged Failover and Safe Failover draft. Steve Gonczi and Bernie Volz stepped up and produced the raw material for such a merged draft, along with a new message format designed around DHCP options and other extensions and clarifications. Kim Kinnear edited their work into draft format and made other changes in time for the summer Chicago IETF meeting.

During the summer and fall of 1998, two groups worked on separate implementations of the UDP failover draft. Bernie Volz and Steve Gonczi constituted one group, and Kim Kinnear, Mark Stapp and Paul Fox made up the other. These two groups worked together to produce considerable changes and simplifications of the protocol during that period, and Steve Gonczi and Kim Kinnear edited those changes into the -03 draft in time for submission to the December 1998 Orlando IETF meeting.

In February of 1999 Kim Kinnear and Mark Stapp hosted a meeting of people interested in the failover draft.
During that meeting, a general agreement was reached to recast the failover protocol to use TCP instead of UDP. In addition, the group brainstormed a workable load-balancing technique. Kim Kinnear volunteered to rewrite the entire draft to include the changes made at that meeting, as well as to restructure the draft along guidelines suggested by Thomas Narten. The current draft represents the results of that effort.

The initial idea for a hash-based load balancing approach was offered by Ted Lemon, and the choice of an algorithm and its integration into the draft was done by Steve Gonczi. The security section was spearheaded by Bernie Volz. Both contributed considerably to the ideas and text in the rest of the draft through several reviews.

These most recent changes have been widely circulated among the other authors, but that does not preclude any of them from expressing disagreement with what is contained in this draft at any future time.

Many people have reviewed the various earlier drafts that went into this result. At American Internet, ideas were contributed by Brad Parker. At Cisco Systems, Paul Fox and Ellen Garvey have contributed greatly to the form of the protocol.

Glenn Waters of Bay Networks contributed ideas and enthusiasm to make a Failover protocol that was both "safe" and "lazy".

Many thanks to Peter K. Pearson, the author of Pearson's hash, who has kindly granted his permission to use this algorithm for DHCP load balancing, free of any encumbrances.

14. References

[RFC 2131] Droms, R., "Dynamic Host Configuration Protocol", RFC 2131, March 1997.

[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC 2132] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor Extensions", RFC 2132, March 1997.
[TLS] Dierks, T., "The TLS Protocol, Version 1.0", RFC 2246, January 1999.

[SMTPTLS] Hoffman, P., "SMTP Service Extension for Secure SMTP over TLS", RFC 2487, January 1999.

[IMAPTLS] Newman, C., "Using TLS with IMAP, POP3 and ACAP", RFC 2595, June 1999.

[NAMESPACE] Carney, M., "draft-ietf-dhc-option_review_and_namespace-00.txt", June 1999.

[DDNS] Rekhter, Y., Stapp, M., "draft-ietf-dhc-dhcp-dns-10.txt", June 1999.

15. Authors' Information

   Ralph Droms
   323 Dana Engineering
   Bucknell University
   Lewisburg, PA 17837

   Phone: (717) 524-1145
   EMail: droms@bucknell.edu

   Greg Rabil, Mike Dooley, Arun Kapur
   Lucent Technologies (Quadritek)
   10 Valley Stream Parkway, Suite 240
   Malvern, PA 19355

   Phone: (800) 208-2747
   EMail: grabil@lucent.com
          mdooley@lucent.com
          akapur@lucent.com

   Kim Kinnear
   Mark Stapp
   Cisco Systems
   250 Apollo Drive
   Chelmsford, MA 01824

   Phone: (978) 244-8000
   EMail: kkinnear@cisco.com
          mjs@cisco.com

   Bernie Volz
   Steve Gonczi
   Process Software Corporation
   959 Concord St.
   Framingham, MA 01701

   Phone: (508) 879-6994
   EMail: volz@process.com
          gonczi@process.com

16. Full Copyright Statement

Copyright (C) The Internet Society (1999). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Open Issues

These issues need to be resolved:

   1. We need to deal with the option space, and the procedures for managing it. Probably IANA.

   2. Figure out a better way to identify vendors. How about an SNMP Enterprise MIB value?

   3. Need more clarity in the conflict resolution section, probably backed up by real implementation experience. Learned a lot from the UDP implementation and experience with it in the real world, and need equivalent learning from a TCP implementation with no messages out of order or lost.