Network Working Group                                          R. Droms
INTERNET DRAFT                                      Bucknell University
                                                             K. Kinnear
                                          American Internet Corporation
                                                             April 1997
                                                   Expires October 1997

                   An Inter-server Protocol for DHCP

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

Abstract

   The DHCP protocol is designed to allow for multiple DHCP servers, so
   that reliability of DHCP service can be improved through the use of
   redundant servers.
   To provide redundant service, all of the DHCP servers must be
   configured with the same information about assigned IP addresses
   and parameters; i.e., all of the servers must be configured with the
   same bindings.  Because DHCP servers may dynamically assign new
   addresses or configuration parameters, or extend the lease on an
   existing address assignment, the bindings on some servers may become
   out of date.  The DHCP inter-server protocol provides an automatic
   mechanism for synchronization of the bindings stored on a set of
   cooperating DHCP servers.

   This draft is a direct extension of draft-ietf-dhc-interserver-00.txt,
   but has been renamed draft-ietf-dhc-interserver-alt-00.txt since an
   alternative proposal (also a direct extension of
   draft-ietf-dhc-interserver-00.txt, but in a different direction)
   exists, named draft-ietf-dhc-interserver-01.txt.

1. Introduction

   DHCP servers manage the assignment of IP addresses and configuration
   parameters to IP hosts.  The DHCP protocol specification [1] refers
   to the collection of configuration information assigned to a client
   as a "binding".  The DHCP protocol is designed to allow for multiple
   DHCP servers, so that reliability of DHCP service can be improved
   through the use of redundant servers.  To provide redundant service,
   all of the DHCP servers must be configured with the same information
   about assigned IP addresses and parameters; i.e., all of the servers
   must be configured with the same bindings.  Because DHCP servers may
   dynamically assign new addresses or configuration parameters, or
   extend the lease on an existing address assignment, the bindings on
   some servers may become out of date.

   The DHCP inter-server protocol provides an automatic mechanism for
   synchronization of the bindings stored on a set of cooperating DHCP
   servers.

   The remainder of this document is organized in the following
   sections:

   2. Goals and Requirements

      Defines the requirements and goals for the protocol.  Discusses
      limitations of the protocol.  Also contains a definition of
      several classes of failures as well as a list of specific
      failures (which provide a useful common ground for discussion).

   3. Overview

      Discusses in a general way the content of the information
      communicated between servers implementing this protocol as well
      as the way that information is communicated.

      Defines some key concepts surrounding the allowable "states" of
      an IP address, including extensions critical to the operation of
      this protocol.

      Gives a brief sketch of the actions required by this protocol for
      each DHCP client request received by the server.

   4. Groups

      Examines the concept of a group of servers as used by this
      protocol, and defines the "group specifier" used in all messages
      of this protocol.

   5. Protocol Messages

      Examines the general structure of the messages used by this
      protocol.  For each message, it lists the format of the message
      along with all possible success and error status returns.
      Messages are discussed in two groups: Address Information
      Messages and Configuration Messages.

   6. Protocol Operations

      The messages from Section 5 are grouped into some higher level
      operations, and these are explained: POLL, PUSH, DUMP, TRANSFER,
      GROUP JOIN, GROUP LEAVE.

   7. Protocol Actions

      The actions required by this protocol in response to incoming
      messages are detailed for each message a DHCP server can receive.
      The messages are grouped in three sections: DHCP Client Messages
      and Events, Address Information Messages, and Configuration
      Messages.  The first of these are the normal DHCP messages, and
      the second and third are the new messages generated as part of
      this protocol.

   8. IP Address State Transition

      This protocol expands the possible states for an IP address.
      The new states are described in Section 3.3.  This section
      describes all of the transitions between states in detail.

   9. Server Initialization

      This section describes how a server becomes a member of a group
      to deliver a reliable DHCP service, as well as the actions to
      take on every server restart.

   10. Open Questions

      Poses open questions about the protocol.  The questions from
      draft-ietf-dhc-interserver-00.txt are included verbatim, and
      answers are supplied for some of them.  Questions new to this
      draft are included as well.

1.1. The Language of Requirements

   Throughout this document, the words that are used to define the
   significance of particular requirements are capitalized.  These
   words are:

   o "MUST"

      This word or the adjective "REQUIRED" means that the item is an
      absolute requirement of this specification.

   o "MUST NOT"

      This phrase means that the item is an absolute prohibition of
      this specification.

   o "SHOULD"

      This word or the adjective "RECOMMENDED" means that there may
      exist valid reasons in particular circumstances to ignore this
      item, but the full implications should be understood and the case
      carefully weighed before choosing a different course.

   o "SHOULD NOT"

      This phrase means that there may exist valid reasons in
      particular circumstances when the listed behavior is acceptable
      or even useful, but the full implications should be understood
      and the case carefully weighed before implementing any behavior
      described with this label.

   o "MAY"

      This word or the adjective "OPTIONAL" means that this item is
      truly optional.  One vendor may choose to include the item
      because a particular marketplace requires it or because it
      enhances the product, for example; another vendor may omit the
      same item.

1.2. Terminology

   This document uses the following terms:

   o "DHCP client"

      A DHCP client is an Internet host using DHCP to obtain
      configuration parameters such as a network address.

   o "client"

      Whenever the term client is used in this draft, it refers to a
      DHCP client (and not a server communicating with another server
      using this protocol).

   o "DHCP server"

      A DHCP server is an Internet host that returns configuration
      parameters to DHCP clients.

   o "binding"

      A binding is a collection of configuration parameters, including
      at least an IP address, associated with or "bound to" a DHCP
      client.  Bindings are managed by DHCP servers.

   o "active server"

      An active server is one which is capable of offering IP addresses
      to clients.

   o "stable storage"

      Every DHCP server is assumed to have some form of what is called
      "stable storage".  Stable storage is used to hold information
      concerning IP address bindings (among other things) so that this
      information is not lost in the event of a server failure which
      requires restart of the server.

2. Goals and Requirements

   There are several levels of goals for this protocol.  There is a
   set of requirements with which it must comply, and then there is a
   set of goals for the protocol and the way that it is used, listed in
   priority order.

2.1. Requirements on this Protocol

   The following list of requirements must be (and are) achieved by
   this protocol.

   1. Implementations of this protocol work with existing DHCP client
      implementations based on the DHCP protocol [1].  It must work
      with today's clients!

   2. Implementations work with existing BOOTP relay implementations.

   3. The protocol can be specified with sufficient clarity that
      independent implementations will work well together the first
      time (e.g. DHCP today largely meets this requirement).

   4. The protocol works with a minimum of two and a maximum of 16
      servers.
2.2. Goals of this Protocol

   The following are the goals of this protocol.  These goals are
   listed in priority order.  The protocol meets all of these goals.

   1. Avoid binding an IP address to a client while that binding is
      currently valid for another client.  In other words, don't
      allocate the same IP address to two clients.

   2. Ensure that an existing client can keep its existing IP address
      binding if it can communicate with any DHCP server using this
      protocol -- not just the server that originally offered it the
      binding.

      DISCUSSION:

         There is a subtle but very important point here.  For
         example, assume that there are five servers using this
         protocol.  Everything is running fine, and then the network
         becomes partitioned: three servers can communicate among
         themselves, and the other two can communicate among
         themselves -- but the set of three cannot communicate with
         the set of two.  Each set, however, can communicate with some
         clients.

         In this situation, every client that can communicate with a
         DHCP server in either set should be able to continue to use
         its existing binding, even if the server that originally
         created the binding is not included in the set of servers
         with which it can communicate.

   3. Do not add any requirement for communication with another server
      to the processing between a DHCPDISCOVER and a DHCPOFFER or
      between a DHCPREQUEST and a DHCPACK.

      DISCUSSION:

         This is another subtle point.  The implication of this goal
         is that "lazy" update of IP address binding information is
         required.  In other words, because of this goal, the protocol
         cannot require one server to update another server with
         information concerning a new IP address binding prior to
         sending the DHCPACK to the DHCP client.
         As a result of this goal, a server may fail immediately after
         sending the DHCPACK to the client but prior to successfully
         sending a record of that information to any other server.
         Should this happen, the DHCP client is the only operational
         machine with a record of this binding -- and the protocol
         must be (and has been) designed to deal properly with this
         situation.

   4. Ensure that a new client can get an IP address from some server.

   5. If a server goes down, and an external agent determines that it
      is actually down as opposed to running but simply unable to
      communicate with other servers, then the addresses that it
      currently owns but are not yet bound may be recovered for use by
      other servers.

   6. Ensure that in the face of partition, where servers continue to
      run but cannot communicate with each other, the above goals and
      requirements are met.  In addition, when the partition condition
      is removed, allow graceful re-integration.

2.3. Limitations of this Protocol

   The following are explicit limitations of this protocol.  This is
   not to say that they are not useful capabilities to have (that's why
   they are explicitly listed, so that it will be clear that this
   protocol does not supply them).

   1. Determination of permanent server failure.

      The protocol provides a way to propagate information about the
      permanent failure of a server, but no way to detect a permanent
      failure.  Transient failures are detected, but there is no
      mechanism in this protocol to determine when a transient failure
      is really a permanent failure.  Some external agent must make
      this determination -- and must ensure that the server declared
      permanently failed is not simply partitioned from the other
      servers and unable to communicate with them.  The server which
      has been declared permanently failed by the external agent MUST
      be informed of that declaration prior to restart.
      DISCUSSION:

         The existing configuration messages would allow one server to
         declare another server as permanently failed and remove it
         from the group.  That is not the issue.  What makes fully
         automatic determination of permanent server failure
         impractical is distinguishing between permanent server
         failure (which is easily defined as transient server failure
         that has gone on too long) and partition of the group of
         servers.

         Once communication fails with a server, the other servers
         cannot know if it is still operating or not, and removing an
         operating server from the group is an activity fraught with
         peril.

         This protocol is designed so that a server will re-integrate
         cleanly when it can communicate again with the rest of the
         group.

         Group membership protocols typically handle a partition
         situation (when they bother to handle it at all) by having
         the partitioned server determine that it has been partitioned
         and shut itself down.  It detects a partition condition in
         one of two ways: either it can't communicate with the
         "master", or it can't communicate with the "majority" of the
         group.  In either case, it shuts down.

         We believe that this is not an appropriate response for a
         DHCP server.  If my DHCP client can talk to a DHCP server, I
         want my client to continue to operate -- I'm not interested
         in having the only DHCP server to which I can talk shut
         itself down!

   2. Some addresses are temporarily unavailable during transient
      server failure.

      The full range of existing IP addresses that are potentially
      available for allocation is reduced during the period of a
      transient server failure.  The size of the pool of addresses
      that are available for allocation but not yet allocated SHOULD
      be configurable for each server.  If the server is subsequently
      declared to have undergone a permanent failure, these addresses
      will be made available again.
      Note that it is only the addresses not yet allocated but
      available for allocation that are unusable during the period of
      a transient server failure.  IP addresses that have been
      allocated to clients may continue to be used by those clients
      even during server failure.  Indeed, allowing existing clients
      to renew their existing IP addresses even if the server that
      granted them the lease has failed is a primary reason why this
      protocol exists.

2.4. Failures

   This section makes explicit both classes of failures and a list of
   specific failure scenarios in order to facilitate discussion of the
   capabilities of this protocol.

   o "transient server failure"

      A transient server failure is one where a server is unable to
      respond to requests, but later becomes operational and able to
      respond to requests.  Its local stable storage (i.e. whatever
      mechanism it uses to preserve its binding information) is
      accurate as of the time that the transient server failure began.

   o "permanent server failure"

      A permanent server failure is one where a server is unable to
      respond to requests -- probably for an extended period.  While
      the protocol defined in this document supports declaration of a
      permanent server failure, the decision that a transient server
      failure is in reality a permanent server failure is beyond the
      scope of this protocol.

      This determination will likely be performed by some
      administrative entity, although in the future a group membership
      protocol could be integrated with the protocol defined in this
      document to make such determinations automatically.

   o "partition"

      A network partition is caused by a failure of the underlying
      communications substrate, such that two systems that could
      previously communicate cannot now do so.
      This may mimic transient server failure, but is not the same,
      because in this case the server that appears to have failed may
      still be operational and interacting with clients.

      There is a form of partition known as "partial partition", where
      the transitivity of communication usually expected is not
      achieved.  Imagine a set of servers organized (for the purposes
      of exposition only) as a ring where each server can communicate
      with its neighbors, but nobody else -- and when the number of
      servers is greater than three, a partial partition situation
      exists.

      This term may also be used as a noun, as in "each partition may
      communicate with ...", and in this case it refers to the group of
      servers which can communicate normally (as distinguished from
      those with which that group cannot communicate).

   o "communication failure"

      Communication failure describes the condition where
      communication between two servers becomes impossible.  "Partial
      communication failure" describes the case where the normally
      bidirectional communication channel becomes unidirectional, so
      that one server can send to but not receive from another server.

   Some examples of the above failures are given below:

   1. A single server crashes and reboots.  [transient failure]

   2. A single server crashes and stays down for a period of hours and
      then reboots (either automatically or through some external
      agent).  [transient failure]

   3. A single server fails and never returns.  No permanent failure
      is declared for this server.  [transient failure]

   4. A single server fails.  A permanent failure is declared for this
      server.  [permanent failure]

   5. A group of two servers is partitioned so that they cannot
      communicate, but each can communicate with some clients.
      [partition]

   6. A group of five servers is partitioned so that three can
      communicate together and the remaining two can also communicate,
      but the two partitions cannot communicate.  Each partition can
      communicate with a subset of the clients, and these subsets are
      disjoint.  [partition]

   7. A group of five servers is partitioned so that three can
      communicate together and the remaining two can also communicate,
      but the two partitions cannot communicate.  Each server continues
      to be able to communicate with all of the clients.  [partition]

      DISCUSSION:

         This situation is unlikely to occur, but the protocol should
         be able to handle it.

   8. Server A can send packets to server B, but cannot receive packets
      from server B.  [partial communication failure]

   9. There are four servers, A, B, C, and D.  A cannot communicate
      with C, and B cannot communicate with D.  [partial partition]

   DISCUSSION:

      This section on failures may well not belong in the final
      document.  For the purposes of review of the rest of the
      protocol, however, defining a common language to describe
      failures and giving specific examples of failures as an aid to
      discussion seemed useful.

3. Overview

   At the most basic level, the DHCP protocol specifies the behavior of
   DHCP servers which communicate with DHCP clients in order to
   allocate IP addresses to the clients as well as provide a variety of
   configuration parameters to them.  It is the allocation of IP
   addresses to clients by the server that creates a requirement to
   update what is known as "stable storage" -- typically held on disk.
   This information is used to "remember" the IP address bindings that
   have been made by the DHCP server in order to avoid allocating the
   same IP address to two clients.

   The key motivation for an inter-server protocol is the desire to
   allow a client to continue to use its IP address (i.e.
   be able to renew its lease on an IP address) even if the server that
   initially offered it the lease on its IP address is unavailable for
   some reason.  In addition, no IP address should ever be bound to two
   clients simultaneously.

   Providing multiple DHCP servers with which each client can
   communicate is the first step in creating this reliable DHCP
   capability.

   In addition, these DHCP servers must communicate with each other in
   order to provide this reliable DHCP capability.

3.1. What information must be communicated between servers
     implementing the inter-server protocol?

   Information about IP addresses is what is communicated among DHCP
   servers in order to provide this reliable DHCP service.  There are
   two types of information about IP addresses that are relevant to the
   inter-server protocol:

   o IP Address State Information

      Information on whether an IP address is bindable (i.e. could be
      offered to a DHCP client) or bound (i.e. is currently bound to a
      client).

   o IP Address Binding Information

      If an IP address is bound to a client, then considerable
      information about that client must be stored in the stable
      storage of a DHCP server.  This information is maintained to
      allow a lease on an IP address to expire and that IP address to
      be re-used by another client.  It is also maintained to allow a
      client to check whether it is using the "proper" address -- i.e.
      the one to which it was bound.  As well, the server uses this
      information to check for errors when a client attempts to renew
      the lease on an IP address.

   The inter-server protocol described here involves communicating both
   types of information between servers.

3.2. How is this information communicated between servers
     implementing the inter-server protocol?
   The protocol requires that the servers that implement it can
   communicate with each other in a point-to-point manner (when all are
   operating correctly).  It allows for the possibility that they can
   fail entirely (i.e. crash) or be unable to communicate with each
   other for a variety of reasons.

   These servers will periodically need to communicate with other
   servers in the group.  There are several recurring styles of
   communication that, if defined, will assist in explaining the major
   concepts of this protocol.  These major styles of group
   communication are as follows:

   o POLL

      A POLL operation is used when one server must contact every other
      server in the group in order to request that they respond with
      some information (typically concerning an IP address).  Usually,
      if the server executing the POLL cannot contact all of the other
      servers, it will use whatever information it could glean from
      those it could contact.

      A COMPLETE POLL is like a POLL in that one server attempts to
      contact every other server -- but in a COMPLETE POLL it must
      receive a reply from all of them or the operation fails to
      complete.

   o PUSH

      A PUSH operation is used when one server wants to send
      information to all of the other servers in the group.

   o DUMP

      A DUMP operation is used when one server sends information about
      every IP address binding it holds in its stable storage to
      another server.  This bulk transfer can be initiated by the
      server sending the information, or by the server that wishes to
      receive the information.

   o TRANSFER

      A TRANSFER operation is one in which a server engages in a
      request/reply dialog with a single other server, usually to
      transfer ownership of an IP address.

   Note that both PUSH and POLL involve operations to all of the
   servers in the group, while DUMP and TRANSFER are operations between
   two servers in the group.

3.3. IP Address State

   Section 3.1 discussed the two kinds of IP address information that
   are communicated using this protocol.  The first of them, IP Address
   State, needs to be explained in more detail.

3.3.1. IP Address State: Basic DHCP Protocol

   When an IP address is always controlled by a single DHCP server
   (implicit in the definition of DHCP in the current DHCP draft [1]),
   the IP address is either in the BINDABLE state or the BOUND state.
   The following state diagram represents the states that an IP address
   may occupy based on the current DHCP draft.

           +-----------------+
           |                 |
           |    BINDABLE     |<-+
           |                 |  |
           +-----------------+  |
                    |           |
                    V           |
           +-----------------+  |
           |                 |  |
           |      BOUND      |--+
           |                 |
           +-----------------+

      Figure 1: Basic DHCP IP address state transition diagram

   When an IP address transitions from one of these states to the
   other, that transition must be recorded in the server's stable
   storage prior to the transition being "published" to any observer
   outside of the server.

3.3.2. IP Address State: The Inter-server Protocol Extension

   The situation is more complex when multiple servers are managing the
   same set of IP addresses, as required by this protocol.  Two new
   states are defined for an IP address.  One is called UNBINDABLE, the
   other EXPIRED.

   This is the state diagram for IP address state required by this
   protocol:

            +-----------------+
            |                 |
            |   UNBINDABLE    |<--+
            |                 |   |
            +-----------------+   |
                     |            |
                     V            |
            +-----------------+   |
            |                 |   |
            |    BINDABLE     |   |
            |                 |-->|
            +-----------------+   |
                     |            |
                     V            |
            +-----------------+   |
            |                 |   |
        +-->|      BOUND      |-->|
        |   |                 |   |
        |   +-----------------+   |
        |            |            |
        |            V            |
        |   +-----------------+   |
        |   |                 |   |
        +---|     EXPIRED     |---+
            |                 |
            +-----------------+

      Figure 2: Extended DHCP IP address state transition diagram
                required for the Inter-server protocol.
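   As an illustration only, the edges of Figure 2 can be captured in a
   transition table that rejects any move the diagram does not draw.
   The Python sketch below is a minimal model under stated assumptions,
   not part of the protocol: the names AddrState, TRANSITIONS, and
   AddressRecord are hypothetical, the trigger labels paraphrase the
   state descriptions in this section (the BINDABLE to UNBINDABLE edge
   is assumed to correspond to transferring ownership to another
   server), and a plain list stands in for stable storage.  Following
   Section 3.3.1, each transition is recorded before it takes effect.

```python
from enum import Enum


class AddrState(Enum):
    UNBINDABLE = "UNBINDABLE"
    BINDABLE = "BINDABLE"
    BOUND = "BOUND"
    EXPIRED = "EXPIRED"


# Each edge of Figure 2, mapped to the event assumed to trigger it.
TRANSITIONS = {
    (AddrState.UNBINDABLE, AddrState.BINDABLE): "COMPLETE POLL succeeded",
    (AddrState.BINDABLE, AddrState.BOUND): "client accepted an offer",
    (AddrState.BINDABLE, AddrState.UNBINDABLE): "ownership transferred to another server",
    (AddrState.BOUND, AddrState.UNBINDABLE): "client released the lease",
    (AddrState.BOUND, AddrState.EXPIRED): "lease and grace period ran out",
    (AddrState.EXPIRED, AddrState.BOUND): "REQUEST/INIT-REBOOT from the previous client",
    (AddrState.EXPIRED, AddrState.UNBINDABLE): "POLL of another server",
}


class AddressRecord:
    """One IP address as seen by one server (hypothetical model)."""

    def __init__(self, ip, stable_storage):
        self.ip = ip
        self.state = AddrState.UNBINDABLE  # default state for every address
        self.stable_storage = stable_storage  # a list standing in for disk

    def move_to(self, new_state):
        trigger = TRANSITIONS.get((self.state, new_state))
        if trigger is None:
            raise ValueError(
                f"{self.ip}: no edge {self.state.name} -> {new_state.name}")
        # Record the transition durably first; only then let it be observed.
        self.stable_storage.append((self.ip, self.state.name, new_state.name))
        self.state = new_state
        return trigger
```

   In this model a BOUND address cannot return directly to BINDABLE;
   move_to raises ValueError, because Figure 2 routes a released
   address through UNBINDABLE first.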
   For every server which cooperates using this protocol, an IP address
   is in one of the following four states:

   o UNBINDABLE

      This state represents the default state for every IP address.
      Explicit action must be taken to move an IP address from this
      state into the BINDABLE state: a COMPLETE POLL must be performed.

   o BINDABLE

      In this state, the IP address is available to be offered to a
      DHCP client, and if the client accepts the offer, it may be bound
      to that client.

      An IP address is BINDABLE by only a single server at a time.  A
      server must know precisely which IP addresses are on its list of
      BINDABLE addresses.  A server does not know about any other
      server's list of BINDABLE addresses.  (Although performance
      optimizations are possible where a server may develop hints about
      this information, they are not required.)

      An IP address can move from the BINDABLE state into the BOUND
      state through the normal activity of the DHCP protocol where a
      server interacts with a client.

      A server can also transfer ownership of a BINDABLE IP address to
      another server upon request from that other server (and without
      any interaction beyond that with the other server).

   o BOUND

      An address that is BOUND is associated with a particular DHCP
      client, and usually is in use by that client (although the client
      may have abandoned the lease on that IP address).  It may be
      termed BOUND to that client.

      When a DHCP client releases the lease on an IP address, the
      address moves into the UNBINDABLE state, but no explicit PUSH
      operation is required.

      When the lease time and any grace period implemented by a server
      both expire, the IP address moves into the EXPIRED state.

      DISCUSSION:

         Many DHCP servers implement something called a "grace
         period", which is a period after the lease on a binding
         expires during which an IP address will not be offered to
         another DHCP client.
         A lease that is in this "grace period" is still BOUND as far as the inter-server protocol is concerned.

   o  EXPIRED

      An IP address is EXPIRED when it was BOUND and the term of the lease (and any implemented grace period) has run out. It may be termed EXPIRED to that client.

      An EXPIRED IP address may be made UNBINDABLE through a POLL by another server, or it may be moved back into the BOUND state by a REQUEST/INIT-REBOOT request from the previously bound client.

3.4. Overview of Server Operation

   This section gives a brief sketch of the IP address binding parts of the protocol (from the perspective of an already configured group of servers). Many of the possible cases are not described here, and this section is not to be considered definitive. The definitive description of this information is in Section 7.1, and in the case of conflicts between this section and that one, the description in Section 7.1 governs.

3.4.1. DISCOVER

   Prior to the receipt of a DISCOVER message, each server should have built up a list of BINDABLE IP addresses, for two reasons. First, a COMPLETE POLL is required to get a BINDABLE IP address, and a COMPLETE POLL may not be possible at any given instant due to server failure. Second, even if a COMPLETE POLL were possible, it would generally take too long to perform between a DISCOVER and an OFFER message.

   A server should offer a BINDABLE address to a client upon receipt of a DISCOVER message.

   No inter-server protocol activity is required when a DISCOVER is processed and an OFFER is returned to the client.

3.4.2. REQUEST/SELECTING

   When a client accepts an offer by sending a SELECTING message, the server updates its stable storage with the binding information and ACKs the client.
   It must then perform a PUSH operation to push the binding information to all of the other servers (with which it can communicate at that time).

3.4.3. REQUEST/INIT-REBOOT

   In the usual case, where the server that created the binding for the requesting client managed to PUSH that information to the other servers, the receiving server will have (or be able to discover) the binding information for this client. If this information can be verified, then ACK the client; otherwise, NAK it.

3.4.4. REQUEST/RENEWING

   Upon receipt of a RENEWING message (which is unicast from the client to the server), the server is expected to have accurate information concerning the binding of the client. If it does not, it processes the message like a REBINDING, below. Given that the server has information sufficient to extend the lease, it should update its stable storage with the lease extension and then ACK the client with the extended time. Then it must perform a PUSH operation to the other servers with the updated binding information.

3.4.5. REQUEST/REBINDING

   Upon receipt of a REBINDING message (which is broadcast by the client), the server checks whether it has any information about the binding for this client. There are several possible cases:

   1. Current information shows that this client owns the IP address.

      Extend the lease, update stable storage, ACK the client, and perform a PUSH with the information to the other servers.

   2. Current information shows that some other client is BOUND to this IP address.

      This is a problem. Make the IP address UNAVAILABLE (see Section 10 for details).

   3. Current information says this IP address is UNBINDABLE.

      In this case, a server has probably created a binding and then failed to propagate the information to this server. Perform a POLL operation to see if any communicating server has better information.
      If information is returned, then move to the appropriate case in this list.

      If no information is returned, then extend the lease on the IP address, update stable storage, ACK the client, and PUSH the information to the other servers.

3.4.6. Release

   When a release is received, if the client matches the binding information in the server, then update stable storage with the release, set the IP address UNBINDABLE, and PUSH the information to the other servers.

3.4.7. Expiration

   When a lease on an IP address expires, move the lease to the EXPIRED state and update stable storage with this information. From then on, if some server performs a POLL operation to gather information about this IP address, make the IP address UNBINDABLE, update stable storage, and respond with the state of the IP address as UNBINDABLE.

4. Groups

   Fundamental to this protocol is the "group" of servers that communicate with one another and with which the clients can communicate in order to obtain a reliable DHCP service.

4.1. Group Membership Definition

   Each group to which a server belongs is associated with a particular set of address pools. These address pools are those that exist on a single network segment (sometimes called a single "wire").

   An active server can be (and typically would be) a member of several groups simultaneously. The groups of which a server attempts to become a member are defined externally to this protocol.

   Each group has a unique 32-bit group id, which is used in the protocol messages of every type in this protocol.

   A server attempts to become a member of a particular group by using the configuration messages described below. In addition, a server can remove another server from the group using these messages, but in this case an external agent must ensure that the server being removed is truly inactive and not just partitioned.

4.2. Group Specifier Definition

   Every protocol message (except only those listed later in this section) includes something called a "group specifier". A group specifier consists of two 32-bit quantities:

   o  Group ID

      The group id is a 32-bit unsigned quantity that defines the group to which this message applies. It is defined in the series of configuration messages below. This group id applies to a set of address pools that exist on the same physical network.

      DISCUSSION:

         How does the first server in the group get selected?

         As well, how does it select the group id for the group? Group ids don't have to be globally unique, just unique amongst all of the servers that are connected using this protocol. But this is pretty much the same thing.

         Possibly there is a way to generate a group id from the network numbers of the subnets contained in the group definition.

   o  Group Sequence Number

      This 32-bit unsigned sequence number is incremented every time the group moves into the proposal stage. When it overflows beyond the 32-bit boundary, it never wraps back to zero, but goes to 1 instead.

      DISCUSSION:

         I've been told that there is an excellent and precise specification of a sequence number like this in the DNS RFC. It should replace the paragraph above.

      This is the "generation number" of the group.

   A group specifier containing these two values is a part of every message in the inter-server protocol, except for the messages listed below:

   o  REQUEST-GROUPS

   o  REPLY-GROUPS

   o  REQUEST-GROUP-CONFIG

   o  REPLY-GROUP-CONFIG

   o  REQUEST-GROUP-MEMBERSHIP

4.3. Group Specifier Usage

   For every message that includes a group specifier, if the receiving server doesn't have a matching group sequence number in its current group specifier for that group, it returns an error: NAK-GROUP-SPECIFIER-MISMATCH.

   This error return includes its current group specifier, as well as the information that would be included in the REPLY-GROUP-MEMBERSHIP message (i.e., the list of servers currently in this group from the replying server's standpoint).

   It also takes additional action based on the relationship of the message's group sequence number to its current group sequence number:

   o  message group sequence number > server group sequence number

      In this case, the server sending the message has a "more up to date" version of the group than the receiving server. The receiving server drops the incoming message and returns an error response as specified above, and then it sends a REQUEST-GROUP-MEMBERSHIP message to the server from which the message originated. The REPLY-GROUP-MEMBERSHIP message that is returned is used to update the server's group specifier and group definition.

      In the event that the current server is not a member of the group after its view of the membership is updated by the REPLY-GROUP-MEMBERSHIP message, it immediately ceases to operate on all address pools associated with that group.

   o  message group sequence number < server group sequence number

      In this case, the server sending the message has a "less up to date" version of the group than the receiving server. The error message the receiving server has returned contains the information necessary for the sending server to update its conception of group membership and retry the original packet.

   In this way, the most recent view of the membership of the group eventually propagates throughout the group.

5. Protocol Messages

   The various messages that make up the inter-server protocol are described in this section. First, the overall structure of each message is described, and then the messages are described in two groups: Address Information Messages and Configuration Messages.

   The way the messages are used is explained in Sections 6 and 7.

5.1. Message Structure

   All of the inter-server messages have the following fields:

   o  Group ID

      This is the group id from the group specifier described in Section TBD. A value of zero is not a legal group id and is used when no group id should be specified (i.e., for those few messages that don't have a group id).

   o  Group Sequence Number

      This is the group sequence number from Section TBD. It must be non-zero if the group id is non-zero.

   o  Operation

      The operation is either a request or a reply, and there is a wide variety of each. The possible operations are listed below:

         REQUEST-ADDRESS-INFORMATION | REPLY-ADDRESS-INFORMATION

         REQUEST-ADDRESS-INFORMATION-BINDABLE

         REQUEST-UPDATE-ADDRESS-INFORMATION | REPLY-UPDATE-ADDRESS-INFORMATION

         REQUEST-ADDRESS-INFORMATION-DUMP | REPLY-ADDRESS-INFORMATION-DUMP

         REQUEST-BINDABLE-ADDRESS | REPLY-BINDABLE-ADDRESS

         REQUEST-GROUPS | REPLY-GROUPS

         REQUEST-GROUP-CONFIG | REPLY-GROUP-CONFIG

         REQUEST-GROUP-MEMBERSHIP | REPLY-GROUP-MEMBERSHIP

         REQUEST-PROPOSE-GROUP-JOIN | REPLY-PROPOSE-GROUP-JOIN

         REQUEST-COMMIT-GROUP-JOIN | REPLY-COMMIT-GROUP-JOIN

         REQUEST-PROPOSE-GROUP-LEAVE | REPLY-PROPOSE-GROUP-LEAVE

         REQUEST-COMMIT-GROUP-LEAVE | REPLY-COMMIT-GROUP-LEAVE

   o  Result

      When the operation is a reply, the result is one of the following:

         ACK

         ACK-DATA

         NAK

         NAK-GROUP-SPECIFIER-MISMATCH-DATA

   o  Data

      If there is any data for the operation, then it appears last.
      It is possible to determine from the Result of the operation whether there is any data: for each of the results listed above, if the result ends in -DATA, then data appears in the data section.

5.2. Address Information Messages

   The address information messages are used to exchange information about the state and binding of an IP address among the servers in the group. The general content and usage of the binding data are discussed first, followed by the individual address information messages.

5.2.1. Binding Data and State Information

   When binding data is sent as part of an address information message, it contains the following information:

   o  IP Address [ipaddr]

   o  Expiration [int32] (delta from now)

   o  Client ID [string]

   o  MAC Address [string]

   o  Last Transaction [int32]

   o  Last Transaction Time [int32] (delta from now)

   o  Last Transaction Server [ipaddr]

   Each server must maintain, as part of the binding information, the "last transaction time", the "last transaction", and the "last transaction server" associated with that binding.

   The last transaction time is the time at which the binding changed in response to a request (the last transaction) from the client. The last transaction time is returned in an address information message as a number of seconds from "now".

   The possible last transactions are listed below. This list is ordered by the precedence of the transactions and is used to help determine whether a response to an address information message contains more recent information than that currently held by a server.
   The last transaction is one of the following:

   o  DHCPREQUEST/SELECTING

   o  DHCPREQUEST/REBINDING

   o  DHCPREQUEST/INIT-REBOOT

   o  DHCPREQUEST/RENEWING

   o  DHCPRELEASE

   o  EXPIRATION

   The IP address state information is transmitted as well, and it consists of one of the following states:

   o  UNBINDABLE

   o  BINDABLE

   o  BOUND

   o  EXPIRED

5.2.2. REQUEST-ADDRESS-INFORMATION | REPLY-ADDRESS-INFORMATION

   The REQUEST-ADDRESS-INFORMATION message contains a list of all of the IP addresses for which a REPLY is requested.

   The REPLY-ADDRESS-INFORMATION message contains the binding data (see Section 5.2.1) for each IP address listed in the REQUEST.

   Additional detailed information describing the format and all possible success and error returns of these messages is TBD.

5.2.3. REQUEST-ADDRESS-INFORMATION-BINDABLE

   The REQUEST-ADDRESS-INFORMATION-BINDABLE message contains a list of all of the IP addresses for which a REPLY is requested. It is in the same format as the REQUEST-ADDRESS-INFORMATION message, but carries the additional information that the requester wishes to make the listed IP addresses BINDABLE if possible.

   A REPLY-ADDRESS-INFORMATION message (see above) is used to reply to this message.

   Additional detailed information describing the format and all possible success and error returns of these messages is TBD.

5.2.4. REQUEST-UPDATE-ADDRESS-INFORMATION | REPLY-UPDATE-ADDRESS-INFORMATION

   The REQUEST-UPDATE-ADDRESS-INFORMATION message contains address binding information (see Section 5.2.1) for every IP address for which an update is requested.

   Additional detailed information describing the format and all possible success and error returns of these messages is TBD.

5.2.5. REQUEST-ADDRESS-INFORMATION-DUMP | REPLY-ADDRESS-INFORMATION-DUMP

   Detailed information describing the format and all possible success and error returns of these messages is TBD.

5.2.6. REQUEST-BINDABLE-ADDRESS | REPLY-BINDABLE-ADDRESS

   In the REQUEST-BINDABLE-ADDRESS message, the requesting server must specify:

   o  The address pool in the group for which it wishes to acquire some BINDABLE addresses.

   o  The number of BINDABLE addresses it is requesting.

   o  The number of BINDABLE addresses it currently has for that address pool.

   Additional detailed information describing the format and all possible success and error returns of these messages is TBD.

5.3. Configuration Messages

   Configuration messages are used to add a server to a group as well as to remove a server from a group. A server must add itself to a group; it cannot be added by another server. A server may be removed by any server in the group, including itself.

   DISCUSSION:

      As written, it is a requirement for a server to add itself to the group. Is this a good idea? This prevents an external agent from adding a server to the group with which some existing group members could not communicate.

      Likewise, should an existing member of a group be required to remove a server from a group? Again, as written, the answer is yes. Of course, an external agent could become a member of the group (nothing requires it to be a DHCP server if it handles the protocol messages successfully), remove another server from the group, and then remove itself from the group.

   In addition to changing the group membership, configuration messages are used to keep the various servers up to date with respect to the current membership of the group.

5.3.1. REQUEST-GROUPS | REPLY-GROUPS

   Detailed information describing the format and all possible success and error returns of these messages is TBD.

5.3.2. REQUEST-GROUP-CONFIG | REPLY-GROUP-CONFIG

   Detailed information describing the format and all possible success and error returns of these messages is TBD.

5.3.3. REQUEST-GROUP-MEMBERSHIP | REPLY-GROUP-MEMBERSHIP

   Detailed information describing the format and all possible success and error returns of these messages is TBD.

5.3.4. REQUEST-PROPOSE-GROUP-JOIN | REPLY-PROPOSE-GROUP-JOIN

   Detailed information describing the format and all possible success and error returns of these messages is TBD.

5.3.5. REQUEST-COMMIT-GROUP-JOIN | REPLY-COMMIT-GROUP-JOIN

   Detailed information describing the format and all possible success and error returns of these messages is TBD.

5.3.6. REQUEST-PROPOSE-GROUP-LEAVE | REPLY-PROPOSE-GROUP-LEAVE

   Detailed information describing the format and all possible success and error returns of these messages is TBD.

5.3.7. REQUEST-COMMIT-GROUP-LEAVE | REPLY-COMMIT-GROUP-LEAVE

   Detailed information describing the format and all possible success and error returns of these messages is TBD.

6. Protocol Operations

   The protocol messages from the previous section can be combined to form the following, more complicated, operations:

   o  POLL and COMPLETE POLL

   o  PUSH

   o  DUMP

   o  TRANSFER

   o  Determine the Available Groups

   o  GROUP JOIN

   o  GROUP LEAVE

6.1. POLL and COMPLETE POLL

   In a POLL operation, a server uses the exchange of REQUEST-ADDRESS-INFORMATION and REPLY-ADDRESS-INFORMATION messages to determine whether an IP address is in use by any other server, or to update its internal database with the most recent binding information.
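The COMPLETE POLL variant of this operation has a strict success rule: every server in the group must have been queried and must have answered ACK-DATA with the address UNBINDABLE, so a single silent peer makes it fail. A minimal Python sketch of that rule follows. It assumes a hypothetical transport callback `query` and a hypothetical decoded reply type `AddressReply`; the wire format of these messages is still TBD in this draft, and `peers` is taken to be every other member of the group.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AddressReply:
    """Hypothetical decoded REPLY-ADDRESS-INFORMATION for a single address."""
    result: str  # "ACK-DATA", "NAK", ...
    state: str   # "UNBINDABLE", "BINDABLE", "BOUND" or "EXPIRED"


# query(peer, address) sends REQUEST-ADDRESS-INFORMATION-BINDABLE to one peer and
# returns its reply, or None if the peer did not answer (down, partitioned, timed out).
Query = Callable[[str, str], Optional[AddressReply]]


def complete_poll(address: str, peers: list[str], query: Query) -> bool:
    """True only if every peer answered ACK-DATA and reported the address UNBINDABLE."""
    # Query all peers in parallel; a serial loop over peers would be equally valid.
    with ThreadPoolExecutor(max_workers=max(1, len(peers))) as pool:
        replies = list(pool.map(lambda peer: query(peer, address), peers))
    return all(
        reply is not None
        and reply.result == "ACK-DATA"
        and reply.state == "UNBINDABLE"
        for reply in replies
    )
```

On success the caller would mark the address BINDABLE and record that in stable storage; on failure the address simply stays UNBINDABLE.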
   The server sends a REQUEST-ADDRESS-INFORMATION message to every server in the group and expects a REPLY-ADDRESS-INFORMATION message in response from each. This can be done either serially, stepping through all of the servers in the group, or in parallel, sending REQUEST-ADDRESS-INFORMATION messages to all of them at once.

   When a COMPLETE POLL operation is used to move an address from the UNBINDABLE state into the BINDABLE state, the REQUEST-ADDRESS-INFORMATION-BINDABLE request is used. The REPLY-ADDRESS-INFORMATION message is still used as the reply.

   No address can be offered to a client until all servers in the group have been queried and have responded. All of the responses must have been ACK-DATA, and the state of the IP address must have been UNBINDABLE in each of them. Once this operation is complete, the server can consider the IP address to be BINDABLE and must update its stable storage to that effect.

   Note that this operation would typically *not* be performed immediately prior to making an offer to a client, but would be done in advance to build up a list of BINDABLE IP addresses that could be offered to clients. The reasons for this are:

   1. It could take a fair amount of time to contact each DHCP server in the group to ask about the status of an address, and that would slow down the offer process.

   2. If *any* server in the group is down, this operation cannot complete and can never yield a positive answer.

6.1.1. PUSH

   This exchange of REQUEST-UPDATE-ADDRESS-INFORMATION and REPLY-UPDATE-ADDRESS-INFORMATION messages is used by one server to inform another server of the address binding information it has about a lease.
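On the receiving side of a PUSH, the responding server ACKs (after updating stable storage) when the pushed binding is more recent than its own, and otherwise replies NAK-DATA with its own latest information. The draft defers the recency algorithm to a section that is still TBD, so the Python sketch below is only one plausible reading: compare last transaction times, and break ties with the Section 5.2.1 precedence list, ASSUMING that earlier entries in that list take precedence and that an exact tie is accepted. `Binding`, `freshness`, `handle_update`, and the wire-delta helpers (which ASSUME negative deltas denote the past) are all hypothetical names.

```python
from dataclasses import dataclass

# Last transactions in the order given in Section 5.2.1.  ASSUMPTION: an earlier
# entry takes precedence over a later one when two bindings have the same time.
PRECEDENCE = [
    "DHCPREQUEST/SELECTING",
    "DHCPREQUEST/REBINDING",
    "DHCPREQUEST/INIT-REBOOT",
    "DHCPREQUEST/RENEWING",
    "DHCPRELEASE",
    "EXPIRATION",
]


@dataclass
class Binding:
    """Subset of the Section 5.2.1 binding data, with times held as absolute seconds."""
    ip_address: str
    client_id: str
    state: str
    last_transaction: str
    last_transaction_time: float
    last_transaction_server: str


def to_wire_delta(absolute: float, now: float) -> int:
    # Times travel as "delta from now"; ASSUMPTION: negative values mean the past.
    return round(absolute - now)


def from_wire_delta(delta: int, now: float) -> float:
    return now + delta


def freshness(b: Binding) -> tuple[float, int]:
    # A later transaction time wins; on a tie, the higher-precedence transaction wins.
    return (b.last_transaction_time, -PRECEDENCE.index(b.last_transaction))


def handle_update(store: dict[str, Binding], incoming: Binding):
    """Responder side of PUSH: ACK and store, or NAK-DATA with the newer local copy."""
    local = store.get(incoming.ip_address)
    if local is not None and freshness(local) > freshness(incoming):
        return "NAK-DATA", local
    store[incoming.ip_address] = incoming  # stands in for the stable-storage write
    return "ACK", None
```

Ordering by `(time, precedence)` tuples keeps the comparison in one place, so a different tie-break rule, once the TBD algorithm is settled, would only change `freshness`.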
   The data part of the REQUEST-UPDATE-ADDRESS-INFORMATION message has the same form as the REPLY-ADDRESS-INFORMATION message from the poll mode of this protocol, except that it is used to inform another server of updated information from the requester.

   The responding server returns a REPLY-UPDATE-ADDRESS-INFORMATION message with an ACK if the information sent in the REQUEST-UPDATE-ADDRESS-INFORMATION message is more recent than that available in its cache. Prior to sending the ACK, it updates its stable storage with the new information.

   In the event that the responding server determines that it has more recent information than the requesting server (based on the algorithm in Section TBD above), it replies with a REPLY-UPDATE-ADDRESS-INFORMATION message with a NAK-DATA, which also contains all of its latest information. The requesting server, which is now the recipient of a lot of information it didn't anticipate, should update its stable storage with this latest information. The requesting server is under no obligation to reply to the NAK message.

   DISCUSSION:

      How long should a server doing a PUSH of information try to get the information to the rest of the servers? Since the entire protocol has been designed to allow "lazy update", perhaps it is sufficient to try once, or to retry several times over less than a minute, and then to stop trying.

      Note, however, that since a mismatch of group specifiers can cause packets to be dropped at any time, whenever a NAK-GROUP-SPECIFIER-MISMATCH message is received, the sending server MUST retry the message that was sent after correcting its view of the group specifier (and therefore the group definition).

6.2. DUMP

   Sending a REQUEST-ADDRESS-INFORMATION-DUMP message to a server triggers that server to push to the requester all of the binding information for every IP address for which it is the last transaction server. When a server receives a REQUEST-ADDRESS-INFORMATION-DUMP message, it sends a series of REQUEST-UPDATE-ADDRESS-INFORMATION messages to the requester. When it has completed the DUMP operation, it sends a REPLY-ADDRESS-INFORMATION-DUMP message with an ACK.

6.3. TRANSFER

   The exchange of REQUEST-BINDABLE-ADDRESS and REPLY-BINDABLE-ADDRESS messages is used by a server to ask another single server for one of its BINDABLE addresses. The address returned by the query must be BINDABLE by the responding server and, before the reply is sent, must be set UNBINDABLE and recorded as such in that server's stable storage.

   This protocol exchange would typically be used by a server that has run out of available addresses to offer to new clients and cannot generate any new ones by using the COMPLETE POLL operation because:

   1. Some other server is down, so a COMPLETE POLL cannot complete.

   2. While the COMPLETE POLL can complete, it cannot yield any new addresses for allocation, because all of them are currently either allocated to a client or already on the lists of available addresses of other servers.

6.4. Determine the Available Groups

   The first stage of becoming a server participating in the inter-server protocol is to determine the existing group id for each set of address pools for which participation in the inter-server protocol is desired.
   Assuming that a server has been provided with, or can discover, the IP address of a server that is already in the group it wants to join, the joining server sends a REQUEST-GROUPS message to some server it thinks might belong to a group it wishes to join.

   Any server that receives a REQUEST-GROUPS message replies with a REPLY-GROUPS message containing the set of group specifiers for every group of which it is a member.

   For each of the group specifiers in the REPLY-GROUPS message, the joining server sends a REQUEST-GROUP-CONFIG request to the server it is interrogating. This message asks for the group information for one group specifier.

   The response to the REQUEST-GROUP-CONFIG message is a REPLY-GROUP-CONFIG message, which contains the latest group specifier and the network number and subnet mask of every subnet associated with that group.

   From this information, the requesting server can determine whether it wishes to participate in this group.

6.5. GROUP JOIN

   There are two phases involved in a server adding itself to a group. The first is the proposal stage, and the second is the commit stage.

6.5.1. GROUP JOIN -- Proposal Stage

   In the proposal stage, the joining server synchronizes all of the servers in the group with respect to their current concept of group membership as well as the identity of the joining server.

   When a server decides to join a group, it issues a REQUEST-GROUP-MEMBERSHIP request, and the responding server replies with REPLY-GROUP-MEMBERSHIP. This message contains the latest group specifier, along with the list of IP addresses of the servers that make up the group.

   The joining server must check that it is not already a member of this group before proceeding.
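The draft leaves the participation decision to the joining server. One plausible policy, shown here purely as a hypothetical Python sketch, is to compare the subnets returned in REPLY-GROUP-CONFIG with the server's own configured pools using the standard `ipaddress` module, and then, before the proposal stage starts, to build the revised membership list while refusing to proceed if the server is already listed. `GroupSpecifier`, `GroupConfig`, `wants_group`, and `proposed_membership` are illustrative names, not protocol elements.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GroupSpecifier:
    group_id: int
    sequence: int


@dataclass
class GroupConfig:
    """Hypothetical decoded REPLY-GROUP-CONFIG."""
    specifier: GroupSpecifier
    subnets: list[tuple[str, str]]  # (network number, subnet mask) pairs


def wants_group(config: GroupConfig, local_pools: list[str]) -> bool:
    """Hypothetical policy: participate if any group subnet overlaps a local pool network."""
    local = [ipaddress.ip_network(pool) for pool in local_pools]
    for network, mask in config.subnets:
        remote = ipaddress.ip_network(f"{network}/{mask}")
        if any(remote.overlaps(net) for net in local):
            return True
    return False


def proposed_membership(members: list[str], me: str) -> Optional[list[str]]:
    """Membership to carry in REQUEST-PROPOSE-GROUP-JOIN, or None if already a member."""
    if me in members:
        return None
    return members + [me]
```

`ipaddress.ip_network` accepts the dotted-quad mask form directly, so the network number and subnet mask pairs can be used without conversion.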
   The joining server now has the list of existing servers in the group and has verified that it makes sense to be a member of this group. Now it has to interact with each server currently in the group.

   It sends a REQUEST-PROPOSE-GROUP-JOIN request to every server in the group. This message carries the current group specifier along with a revised group membership (i.e., the membership from the REPLY-GROUP-MEMBERSHIP message with the joining server added).

   Upon receipt of a REQUEST-PROPOSE-GROUP-JOIN request, if no proposal exists that has not timed out, a server creates a single "proposed" group specifier from the current group specifier by incrementing the group sequence number by 1. The creation of this proposed group specifier inhibits the creation of another proposed group specifier for 30 seconds. The responding server replies with REPLY-PROPOSE-GROUP-JOIN and an ACK.

   If a proposal exists that has not timed out, the responding server replies with REPLY-PROPOSE-GROUP-JOIN and a NAK-DATA. This includes the same information as a REPLY-GROUP-MEMBERSHIP. (From this, the joining server can determine who is attempting to join the group.)

   DISCUSSION:

      Clearly, a deadlock can occur when two servers are trying to join a group at the same time and each is working from "opposite ends" of the group. In this case, where the joining server gets a failure from a REQUEST-PROPOSE-GROUP-JOIN message due to the existence of a valid proposal that has not timed out, the joining server should back off for an amount of time based in part on its IP address before trying again. The exact algorithm is TBD.
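The responder's side of the proposal stage can be sketched as follows, in Python with hypothetical names throughout. It folds in the sequence-number rule from Section 4.2 (increment, going to 1 rather than 0 on 32-bit overflow), the 30-second inhibit described here, and, as an interpretation, the rule given later in this section that a repeated request from the same server supersedes its proposal and restarts the timer. The contents of the NAK-DATA (the pending membership) are an ASSUMPTION, and the deadlock back-off is omitted because its algorithm is still TBD.

```python
from dataclasses import dataclass
from typing import Optional

PROPOSAL_INHIBIT_SECONDS = 30.0  # from Section 6.5.1


def next_sequence(seq: int) -> int:
    """Advance a 32-bit group sequence number; on overflow it goes to 1, never 0."""
    nxt = (seq + 1) & 0xFFFFFFFF
    return nxt if nxt != 0 else 1


@dataclass
class Proposal:
    proposer: str
    sequence: int
    members: list[str]
    expires_at: float


class GroupView:
    """Hypothetical responder-side state for one group."""

    def __init__(self, group_id: int, sequence: int, members: list[str]) -> None:
        self.group_id = group_id
        self.sequence = sequence  # current (accepted) group sequence number
        self.members = list(members)
        self.proposal: Optional[Proposal] = None

    def on_propose_join(self, proposer: str, members: list[str], now: float):
        pending = self.proposal
        if pending is not None and now < pending.expires_at and pending.proposer != proposer:
            # A live proposal from another server blocks this one.  ASSUMPTION: the
            # NAK-DATA reports the pending membership so the caller can see who is joining.
            return "NAK-DATA", {"group_id": self.group_id, "sequence": self.sequence,
                                "members": list(pending.members)}
        # No live proposal, or the same server proposing again: (re)create it and
        # restart the inhibit timer.  The current group specifier is left unchanged.
        self.proposal = Proposal(proposer, next_sequence(self.sequence), list(members),
                                 now + PROPOSAL_INHIBIT_SECONDS)
        return "ACK", None
```

Keeping the proposal separate from `sequence` mirrors the rule that a proposed group specifier is not used in any message until the commit stage makes it current.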
   This proposed group specifier is not used in any messages until it moves to the accepted stage and becomes the current group specifier (see below for how it does that).

   If a second REQUEST-PROPOSE-GROUP-JOIN request is received from the server that created the existing proposal, that message supersedes the existing proposal and the timer is reset.

   As the joining server cycles through the existing members of the group, it rationalizes the group specifiers across the group, along with the entire group's picture of the membership of the group. If it encounters a server whose view of the group membership lags behind that of the server from which the joining server received its idea of group membership, it brings that server up to date.

   If, on the other hand, it encounters a server that has a more up to date version of the group membership than the one from which it is operating, it has to update its idea of the group membership and then start the proposal sequence over. All of the servers with which it has created proposals will be forced to update their view of group membership as part of this process.

   At the end of this process of proposal generation, all of the servers in the group share a common picture of both the group membership and the current proposal.

6.5.2. GROUP JOIN -- Commit Stage

   The joining server must have started a timer when it sent out the first REQUEST-PROPOSE-GROUP-JOIN message; if less than half of that timer's period remains, the joining server SHOULD start over.

   Now the joining server sends a REQUEST-COMMIT-GROUP-JOIN message (which contains the same information as the REQUEST-PROPOSE-GROUP-JOIN message) to the first server to which it sent the REQUEST-PROPOSE-GROUP-JOIN message. That server must update its stable storage with the new group membership.
   When that server has returned a REPLY-COMMIT-GROUP-JOIN message with an ACK, the joining server has joined the group. However, the joining server SHOULD also send REQUEST-COMMIT-GROUP-JOIN messages to all remaining servers in the group.

   Upon receipt of a REQUEST-COMMIT-GROUP-JOIN message, the receiving server compares its current proposal with the data in the REQUEST-COMMIT-GROUP-JOIN message, and if they match, the proposed new group becomes the current group and the group specifier is changed. It then returns REPLY-COMMIT-GROUP-JOIN and an ACK.

6.6. GROUP LEAVE

   The process of removing a server from a group is largely identical to that used in a GROUP JOIN, described above. It contains the same two phases, "proposal" and "commit". The messages used are REQUEST-PROPOSE-GROUP-LEAVE -> REPLY-PROPOSE-GROUP-LEAVE and REQUEST-COMMIT-GROUP-LEAVE -> REPLY-COMMIT-GROUP-LEAVE.

   The only other difference from GROUP JOIN is that REQUEST-PROPOSE-GROUP-LEAVE and REQUEST-COMMIT-GROUP-LEAVE messages are sent to all servers in the current group (including the server that is supposed to be leaving the group), but if no reply is received from the leaving server, it is not considered an error.

   The messages are sent to the leaving server in order to help preserve correct operation in the event that server is still operational.

   If a server receives a REQUEST-COMMIT-GROUP-LEAVE message from another server, and the group defined in that message does not include the receiving server, it ceases operations on the address pools associated with that group.

   A server must be removed from a group by another server that is currently a member of that group.

7. Protocol Actions

   This section gives the definitive details on the response a server should make to the receipt of various messages. The messages are
The messages are 1493 grouped into three sections: 1495 1. DHCP Client Messages and Events 1497 These are the messages that normally flow from a DHCP client to 1498 DHCP servers. This section explains the actions required by the 1499 inter-server protocol for each DHCP client message. 1501 2. Address Information Messages 1503 This section explains the required responses to Address Informa- 1504 tion messages. 1506 3. Configuration Messages 1508 This section explains the required responses to Configuration 1509 Messages. 1511 7.1. DHCP Client Messages and Events 1513 This section details the actions to be taken in response to the mes- 1514 sages that may be received by a DHCP server from a DHCP client. 1516 DISCUSSION: 1518 There is considerable commonality in the sections that describe 1519 the various DHCP client messages below. Once the details have 1520 stabilized, it should be possible to compress the explanations. 1522 7.1.1. DISCOVER 1524 Prior to the receipt of a DISCOVER message, each server should have 1525 built of a list of BINDABLE IP addresses -- for two reasons. First, 1526 because a COMPLETE POLL is required to get a BINDABLE IP address, and 1527 a COMPLETE POLL may not be possible due to server failure at any 1528 given instant. Second, because even if a COMPLETE POLL were possi- 1529 ble, it would be unwise to require such an operation between a 1530 receipt of a DISCOVER message and the response of an OFFER to a 1531 client. 1533 There are several cases involved in processing a DISCOVER request, 1534 depending on the state of the requested IP address in the DISCOVER 1535 request: 1537 o No specific IP address requested. 1539 Offer a BINDABLE address to the client. Record that this address 1540 was offered in the cache memory of the server, but there is no 1541 need to update the stable storage of the server with any informa- 1542 tion. The IP address continues to be BINDABLE. 1544 o Requested IP address is UNBINDABLE. 

      If the IP address is UNBINDABLE, then perform a COMPLETE POLL operation in an attempt to make the IP address BINDABLE. If the operation is successful, then respond as though the IP address were BINDABLE, below. If the attempt to make the IP address BINDABLE reveals that the IP address is now BOUND, then respond as for BOUND, below. Otherwise (i.e. the IP address is BINDABLE for some other server, or a COMPLETE POLL was not possible), respond as above for "No specific IP address requested".

   o Requested IP address is BINDABLE.

      Offer the IP address to the client. The IP address remains BINDABLE.

   o Requested IP address is BOUND or EXPIRED.

      If the IP address is BOUND or EXPIRED to the requesting client, then offer it to the client. Otherwise, respond as in "No specific IP address requested", above.

7.1.2. REQUEST/SELECTING

   The client uses a REQUEST/SELECTING to accept the offer of a lease made by a server. When a server receives such a message, and the server-id option contains the IP address of that server, the server should respond as follows, depending on the state of the IP address:

   o UNBINDABLE

      If the IP address is UNBINDABLE, then perform a COMPLETE POLL operation in an attempt to make the IP address BINDABLE. If the operation is successful, then respond as though the IP address were BINDABLE, below. If the attempt to make the IP address BINDABLE reveals that the IP address is now BOUND, then respond as for BOUND, below. Otherwise (i.e. the IP address is BINDABLE for some other server, or a COMPLETE POLL was not possible), NAK the REQUEST.

   o BINDABLE

      If the IP address is BINDABLE and has been offered to the requester, then bind the IP address to the client, set the IP address BOUND, and update stable storage. Then ACK the client, and finally perform a PUSH operation of the binding information to the other servers.

   o BOUND or EXPIRED

      If the IP address is BOUND or EXPIRED to the requesting client, set the IP address to be BOUND, update the expiration time, update stable storage, and ACK the client. Finally, perform a PUSH operation of the updated binding information to the other servers.

      If the IP address is BOUND or EXPIRED to some other client, then NAK the request.

7.1.3. REQUEST/INIT-REBOOT

   The client uses a REQUEST/INIT-REBOOT to query the server (as part of the client boot process) to determine whether a "remembered" binding is still valid. Respond as follows, depending on the state of the requested IP address:

   o UNBINDABLE

      If the IP address is UNBINDABLE, then perform a COMPLETE POLL operation in an attempt to make the IP address BINDABLE. If the operation is successful, then respond as though the IP address were BINDABLE, below. If the attempt to make the IP address BINDABLE reveals that the IP address is now BOUND, then respond as for BOUND, below. Otherwise (i.e. the IP address is BINDABLE for some other server, or a COMPLETE POLL was not possible), NAK the REQUEST.

      DISCUSSION:

         This means that if a server creates a binding for a client and fails to PUSH the information to any other server before undergoing a server failure, and if the client is powered off before the time when it would issue a REBINDING message, the client will not get back the same lease when it is powered back on.

         The reasoning for this (and the difference from the REBINDING case below) is that in this case the server has no way to determine whether the requested address in the INIT-REBOOT request is current or perhaps very old indeed. In the REBINDING case the client is currently using the address, so the client at least believes that it is current and not in use by some other client. In this case, however, no such assumption is possible.

         Suppose a server that creates a binding fails before PUSHing the information about the lease to any other server, and the client that received that binding issues a REBINDING request before either failing or being shut down. That client will get back the existing binding upon restart and INIT-REBOOT, since the REBINDING will have caused a recovery of the binding information, and that information will have been distributed through a PUSH.

   o BINDABLE

      If the IP address is BINDABLE, then bind the IP address to the client, set the IP address BOUND, and update stable storage. Then ACK the client, and finally perform a PUSH operation of the binding information to the other servers.

   o BOUND or EXPIRED

      If the IP address is BOUND or EXPIRED to the requesting client, then set the IP address BOUND, update the expiration time, update stable storage, and ACK the client. Finally, perform a PUSH operation of the updated binding information to the other servers.

      If the IP address is BOUND or EXPIRED to some other client, then NAK the request.

7.1.4. REQUEST/RENEWING

   Upon receipt of a RENEWAL message (which is unicast from the client to the server), it is expected that the server will have accurate information concerning the binding of the client.

   Perform the following actions if the IP address being renewed (i.e.
   the IP address in ciaddr) is in one of these states:

   o UNBINDABLE

      If the IP address is UNBINDABLE, then perform a COMPLETE POLL operation in an attempt to make the IP address BINDABLE. If the operation is successful, then respond as though the IP address were BINDABLE, below. If the attempt to make the IP address BINDABLE reveals that the IP address is now BOUND, then respond as for BOUND, below.

      If the IP address is determined to be BINDABLE for some other server, then NAK the request, and set the IP address to be UNAVAILABLE, since this likely represents a duplicate allocation of an IP address (see Section 10, Open Questions, for details).

      Otherwise, NAK the request.

   o BINDABLE

      If the IP address is BINDABLE, then bind the IP address to the client, set the IP address BOUND, and update stable storage. Then ACK the client, and finally perform a PUSH operation of the binding information to the other servers.

   o BOUND or EXPIRED

      If the IP address is BOUND or EXPIRED to the requesting client, then update the expiration time, update stable storage, and ACK the client. Finally, perform a PUSH operation of the updated binding information to the other servers.

      If the IP address is BOUND or EXPIRED to some other client, then NAK the request and set the IP address to be UNAVAILABLE, since this likely represents a duplicate allocation of an IP address (see Section 10, Open Questions, for details).

7.1.5. REQUEST/REBINDING

   Upon receipt of a REBINDING message (which is broadcast from the client), the server will check the state of the address requested for rebinding (i.e. the ciaddr). There are several possible cases:

   o UNBINDABLE

      If the IP address is UNBINDABLE, then perform a COMPLETE POLL operation in an attempt to make the IP address BINDABLE.
      If the operation is successful, then respond as though the IP address were BINDABLE, below. If the attempt to make the IP address BINDABLE reveals that the IP address is now BOUND, then respond as for BOUND, below.

      If the IP address is determined to be BINDABLE for some other server, then NAK the request. Set the IP address to be UNAVAILABLE, since this likely represents a duplicate allocation of an IP address (see Section 10, Open Questions, for details).

      If no server returns information indicating that this IP address is anything but UNBINDABLE, then consider the address BOUND to this client, and proceed as in BOUND, below.

      DISCUSSION:

         This is one of the key points of the inter-server protocol. In this case, a server has created a binding and then failed before telling any other server about that binding. Eventually, the client to which that binding was made will attempt a REQUEST/REBINDING and contact a different server. That different server will be able to determine nothing about that IP address. As far as can be determined, it is not BOUND to any client, and it is not BINDABLE for any other server. In this restricted case, the server will renew the lease for the client, move the IP address into the BOUND state, and PUSH this information to the rest of the servers.

         How can this be safe? Remember that the client is presently using the IP address to make this request. In this limited case, where a server crashes before PUSHing information about a BOUND IP address to any other server, the client to which the IP address is BOUND is the only running machine with any record of that binding. In this case, the DHCP servers will accept that client's information about the binding as correct.

   o BINDABLE

      If the IP address is BINDABLE, then bind the IP address to the client, set the IP address BOUND, and update stable storage. Then ACK the client, and finally perform a PUSH operation of the binding information to the other servers.

   o BOUND or EXPIRED

      If the IP address is BOUND or EXPIRED to the requesting client, then update the expiration time, update stable storage, and ACK the client. Finally, perform a PUSH operation of the updated binding information to the other servers.

      If the IP address is BOUND or EXPIRED to some other client, then NAK the request and set the IP address to be UNAVAILABLE, since this likely represents a duplicate allocation of an IP address (see Section 10, Open Questions, for details).

7.1.6. RELEASE

   When a RELEASE is received, the IP address will be in one of the following states:

   o UNBINDABLE

      If the IP address is UNBINDABLE, then perform a POLL operation in an attempt to determine whether this IP address is BOUND to any client.

      If the results of the POLL operation indicate that the IP address is now BOUND, then respond as for BOUND, below.

      If the IP address is determined to be BINDABLE for some other server, then NAK the request. Set the IP address to be UNAVAILABLE, since this likely represents a duplicate allocation of an IP address (see Section 10, Open Questions, for details).

      Otherwise, ignore the RELEASE.

   o BINDABLE

      If the IP address is BINDABLE, ignore the RELEASE.

   o BOUND or EXPIRED

      If the IP address is BOUND or EXPIRED to the requesting client, set the IP address to be UNBINDABLE, update stable storage, and PUSH the information to the other servers.

7.1.7. Lease Period Expiration

   When the lease period on a BOUND IP address expires, set the IP address to be EXPIRED and update stable storage.

7.2. Address Information Messages

7.2.1. REQUEST-ADDRESS-INFORMATION

   Build a REPLY-ADDRESS-INFORMATION message with binding information about each requested IP address.

7.2.2. REPLY-ADDRESS-INFORMATION

   Compare the information received in the REPLY-ADDRESS-INFORMATION message with the information held by this server. Determine the "most recent" information in the following way:

   Compare the current most recent binding data (known as the current data) to the binding data just received from the sending server (known as the new data). If the last transaction time in the new data is:

   o Later than the last transaction time in the current data

      Replace the current data with the new data.

   o Earlier than the last transaction time in the current data

      Leave the current data intact.

   o Within epsilon (value TBD) of the last transaction time in the current data

      If the responding server for the new data matches the last transaction server in the new data and the last transaction server in the current data, replace the current data with the new data.

      Otherwise, compare the last transactions. If they are the same, use the data that corresponds to the longest lease time. If they are different, use the data whose corresponding last transaction appears first in the list of possible last transactions in Section 5.2.1.

   DISCUSSION:

      This situation, with multiple address information responses (or requests) carrying essentially identical transaction times, would occur because several servers sent out a response to a broadcast REBINDING request, and the lease period was not configured the same on all of them. There is absolutely no way to determine which of the ACKs the client accepted, so using the information from the server that sent the latest lease expiration time is the only prudent course.

7.2.3. REQUEST-ADDRESS-INFORMATION-BINDABLE

   For each IP address in the message, if that IP address is currently EXPIRED, set it to UNBINDABLE and update stable storage before building the REPLY-ADDRESS-INFORMATION message. Then build a REPLY-ADDRESS-INFORMATION message with binding information about each requested IP address.

7.2.4. REQUEST-UPDATE-ADDRESS-INFORMATION

   Compare the binding data received in this message with the current binding information held by this server, using the algorithm given for REPLY-ADDRESS-INFORMATION, above.

   If the new information is more recent than the current information, replace the current information and return a REPLY-UPDATE-ADDRESS-INFORMATION message with an ACK.

   If the new information is not more recent than the current information, return the current information in a REPLY-UPDATE-ADDRESS-INFORMATION message with a NAK-DATA.

7.2.5. REPLY-UPDATE-ADDRESS-INFORMATION

   If the result is an ACK, do nothing.

   If the result is a NAK-DATA, compare the binding data received in this message with the current binding information held by this server, using the algorithm given for REPLY-ADDRESS-INFORMATION, above. If the new information is more recent than the current information, replace the current information. Otherwise, do nothing.

7.2.6. REQUEST-ADDRESS-INFORMATION-DUMP

   Iterate through all of the IP addresses associated with this group, and send REQUEST-UPDATE-ADDRESS-INFORMATION messages to the requesting server. When this operation is complete, send a REPLY-ADDRESS-INFORMATION-DUMP with an ACK to the requesting server.

7.2.7. REPLY-ADDRESS-INFORMATION-DUMP

   Mark the dump in progress complete.

7.2.8. REQUEST-BINDABLE-ADDRESS

   Build a REPLY-BINDABLE-ADDRESS message with TBD BINDABLE addresses.
   Set all of those addresses to be UNBINDABLE in this server and, before sending the message, update stable storage with the new state of these IP addresses.

7.2.9. REPLY-BINDABLE-ADDRESS

   Add the BINDABLE IP addresses in the message to the list of BINDABLE IP addresses and update stable storage with this list.

7.3. Configuration Messages

7.3.1. REQUEST-GROUPS | REPLY-GROUPS

   Respond with a REPLY-GROUPS message.

7.3.2. REQUEST-GROUP-CONFIG | REPLY-GROUP-CONFIG

   Respond with the group configuration in a REPLY-GROUP-CONFIG message.

7.3.3. REQUEST-GROUP-MEMBERSHIP | REPLY-GROUP-MEMBERSHIP

   Respond with the group membership in a REPLY-GROUP-MEMBERSHIP message.

7.3.4. REQUEST-PROPOSE-GROUP-JOIN | REPLY-PROPOSE-GROUP-JOIN

   If there is an existing active proposal (i.e. one that has not timed out) from some other server, reply with REPLY-PROPOSE-GROUP-JOIN and a NAK. Note that there is only one active proposal per group per server, and that it is used by both the JOIN and LEAVE messages.

   If there is no existing active proposal, or if the existing active proposal is from the server sending the REQUEST-PROPOSE-GROUP-JOIN, then create a new (or updated) proposal and start (or restart) the timer for that proposal. In that new proposal, increment the group sequence number.

7.3.5. REQUEST-COMMIT-GROUP-JOIN | REPLY-COMMIT-GROUP-JOIN

   Make the outstanding proposal the current proposal. Reply with a REPLY-COMMIT-GROUP-JOIN message and an ACK.

7.3.6. REQUEST-PROPOSE-GROUP-LEAVE | REPLY-PROPOSE-GROUP-LEAVE

   If there is an existing active proposal (i.e. one that has not timed out) from some other server, reply with REPLY-PROPOSE-GROUP-LEAVE and a NAK. Note that there is only one active proposal per group per server, and that it is used by both the JOIN and LEAVE messages.
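
   The per-group proposal handling of Sections 7.3.4 through 7.3.7 can be sketched as a small state holder. This is a minimal, non-normative Python illustration under stated assumptions: the class and method names, the PROPOSAL_TIMEOUT value (the draft does not give one), and the choice to NAK a commit whose sender or membership does not match the outstanding proposal (following the comparison described for REQUEST-COMMIT-GROUP-JOIN in Section 6.5) are all hypothetical. A single proposal slot is shared by JOIN and LEAVE, as the text above requires.

```python
import time
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional

# Hypothetical timeout: the draft does not specify a proposal timer value.
PROPOSAL_TIMEOUT = 30.0  # seconds


@dataclass
class Proposal:
    sender: str              # server that sent REQUEST-PROPOSE-GROUP-JOIN/LEAVE
    members: FrozenSet[str]  # proposed new group membership
    sequence: int            # group sequence number carried by the proposal
    started: float           # when the proposal timer was (re)started


@dataclass
class GroupState:
    members: FrozenSet[str]  # current group membership
    sequence: int            # current group sequence number
    proposal: Optional[Proposal] = None  # one slot shared by JOIN and LEAVE
    clock: Callable[[], float] = time.monotonic

    def _active_proposal(self) -> Optional[Proposal]:
        """Return the outstanding proposal if its timer has not expired."""
        p = self.proposal
        if p is not None and self.clock() - p.started < PROPOSAL_TIMEOUT:
            return p
        return None

    def on_propose(self, sender: str, new_members: Iterable[str]) -> str:
        """Handle REQUEST-PROPOSE-GROUP-JOIN or -LEAVE; return the reply status."""
        active = self._active_proposal()
        if active is not None and active.sender != sender:
            return "NAK"  # another server's proposal is still active
        # New (or updated) proposal: restart the timer and number it one past
        # the current group's sequence number, so an updated proposal from
        # the same sender does not increment the sequence number twice.
        self.proposal = Proposal(sender, frozenset(new_members),
                                 self.sequence + 1, self.clock())
        return "ACK"

    def on_commit(self, sender: str, new_members: Iterable[str]) -> str:
        """Handle REQUEST-COMMIT-GROUP-JOIN or -LEAVE; return the reply status."""
        active = self._active_proposal()
        if (active is None or active.sender != sender
                or active.members != frozenset(new_members)):
            return "NAK"  # assumption: the draft does not define this reply
        # The proposed group becomes the current group.
        self.members = active.members
        self.sequence = active.sequence
        self.proposal = None
        return "ACK"
```

   Because JOIN and LEAVE share one slot, a LEAVE proposal cannot interleave with a JOIN that is still in progress, and the timer lets a slot held by a failed proposer eventually be reclaimed by another server.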

   If there is no existing active proposal, or if the existing active proposal is from the server sending the REQUEST-PROPOSE-GROUP-LEAVE, then create a new (or updated) proposal and start (or restart) the timer for that proposal. In that new proposal, increment the group sequence number.

7.3.7. REQUEST-COMMIT-GROUP-LEAVE | REPLY-COMMIT-GROUP-LEAVE

   Make the outstanding proposal the current proposal. Reply with a REPLY-COMMIT-GROUP-LEAVE message and an ACK.

8. IP Address State Transitions

   The possible states of an IP address were defined in Section 3.2.2, and the state transition diagram appears there. The state transitions through which an IP address can move were discussed implicitly in Section 7, in the context of the receipt of DHCP messages from DHCP clients. However, an explicit examination of the processing this protocol requires of a server on each state transition highlights some important aspects of the protocol.

   The IP address state transitions are handled in the following way:

   o UNBINDABLE -> BINDABLE

      A fundamental point and guarantee of this state transition diagram is that, for an IP address to move from the UNBINDABLE state (where it is not owned by any server) to the BINDABLE state (where it is owned by a single server), the server seeking to own the IP address must contact all of the other servers in the group. It requires a COMPLETE POLL.

      The server attempting to move an IP address from the UNBINDABLE to the BINDABLE state must ask every other server in the group whether it believes that the IP address is currently UNBINDABLE. If any server says that the IP address is either BINDABLE (i.e. it currently owns the IP address) or BOUND (i.e.
      a client currently owns the IP address), then the server attempting to move the IP address from the UNBINDABLE to the BINDABLE state MUST abandon the attempt.

      DISCUSSION:

         In addition (and this is important!), if the server attempting to move the IP address from the UNBINDABLE to the BINDABLE state fails to hear from some other server, then the attempt cannot complete. This means that if a server cannot communicate with every other server (due to communications failure, transient server failure, or network partition), then this state transition cannot be made.

         Thus, all addresses in the UNBINDABLE state will stay in that state while any server in the group is out of communication with the group for any reason at all.

         Of course, the detailed description of the protocol suggests that a server build up a supply of BINDABLE IP addresses so that, in the event of server failure, it has BINDABLE addresses available to offer to new DHCP clients.

   o BINDABLE -> BOUND

      Once an IP address is BINDABLE, it may be BOUND to a client through the normal actions of the DHCP protocol. Once a server has received a DHCPREQUEST/SELECTING message from a client, it can move the IP address into the BOUND state, update its stable storage, and reply to the client with a DHCPACK message.

      After the DHCPACK has been sent, the DHCP server MUST also attempt to update all servers in the group with information indicating that the IP address is now BOUND to a particular client. It must perform a PUSH operation with this information.

      DISCUSSION:

         In an ideal world, the server that created the binding would always succeed in updating all other servers in the group with the binding information.
         Then, in the event that the binding server failed at some later time, another server to which the client could broadcast would receive a DHCPREQUEST/REBINDING request and could reply with updated binding information.

         However, there is obviously a window in which a server can crash after sending a DHCPACK and before updating even one additional server. This protocol has been designed so that the process of updating all of the servers in the group with information concerning a new binding is not only "lazy" (i.e. performed after the actual binding is made) but also unnecessary for correct operation. The protocol only requires that a server try to update the other servers, not that it succeed at updating even one server.

         The protocol accomplishes this by allowing a server to respond to a DHCPREQUEST/REBINDING message from a client without any information having been propagated from the server that created the binding. Thus, a server that receives a rebinding request for an IP address about which it has no information must check with all available servers in the group, but in the absence of information to the contrary arriving within a relatively short timeout period, the server should respond to the rebinding request with an extension of the existing lease on the IP address.

   o BINDABLE -> UNBINDABLE

      A server can relinquish an IP address that it owns in the BINDABLE state simply by responding to requests for information about the IP address as if it were UNBINDABLE. No explicit action need be taken other than to respond correctly to POLL operations from other servers.

   o BOUND -> UNBINDABLE

      In order for an IP address to move from the BOUND to the UNBINDABLE state, the client that owns the IP address (i.e. to which it is BOUND) must send a DHCPRELEASE message.
      In this case, the receiving server (which may or may not be the server that created the original binding) will update its stable storage with information that the IP address is not currently BOUND to any client. It should then transmit this information to all other servers with which it can communicate at that time by performing a PUSH operation.

      In the event that the server fails to update any other server with the new information about the IP address before undergoing some failure, the worst that will happen is that the other servers will believe that the IP address is in the BOUND state when it need not be. Ultimately, the lease on the IP address will expire.

   o BOUND -> EXPIRED

      Any server that has information concerning a BOUND IP address may determine that the lease on the IP address has expired and, after an appropriate grace period has elapsed, that the IP address should be EXPIRED.

   o EXPIRED -> UNBINDABLE

      In this case, all the server need do is respond to requests for information on this IP address in such a way that it is clear that (as far as this server knows) no client is using the IP address. If any server asks for information concerning this IP address, then the receiving server should set the IP address to be UNBINDABLE, update its stable storage, and respond to the requesting server.

   o EXPIRED -> BOUND

      If a server receives a message from a client and the IP address is EXPIRED, but was last BOUND to that client, then the IP address can be moved back into the BOUND state. This is possible because no other server can have attempted to make this IP address BINDABLE. If it had, the IP address would not be in the EXPIRED state anymore, but in the UNBINDABLE state (see the EXPIRED -> UNBINDABLE transition above).

9. Server Initialization

   With regard to the inter-server protocol, there are two distinct forms of server initialization. Remember that group membership is persistent, i.e. saved in stable storage. Given this, whenever a server initializes itself, it either has a record in its persistent storage of being a member of a group or it doesn't. Each of these cases is described below.

9.1. No record of any group membership.

   Use the technique in Section 6.4, Determining the Available Groups, to determine the groups to which the server should belong.

   Use the GROUP JOIN technique from Section 6.5 to join the appropriate groups.

   Then use the address information messages to build up a list of BINDABLE IP addresses for each address pool in each group.

   If insufficient IP addresses can be obtained using that technique, use the TRANSFER technique from Section 6.3 to acquire some BINDABLE IP addresses from some other server in the group.

   DISCUSSION:

      Just how many IP addresses should a server acquire? First, it should be configurable for each server. Second, it appears that all of the addresses should be acquired by one server or another. In any of the possible failure modes, it is better that the addresses not be UNBINDABLE, since during transient server failure the UNBINDABLE addresses will stay that way.

   The server should then initiate a DUMP operation (see Section 6.2) from each server in the group.

9.2. The server believes that it is currently a member of a group.

   It is assumed that the list of groups to which this server belongs is held in stable storage. Thus group membership is persistent.

   When a server is restarted for any reason, then for each group of which it believes it is currently a member, it should send REQUEST-GROUP-MEMBERSHIP messages to a server in that group.
   It should use the reply to determine whether it is or is not a member of that group, and take the appropriate action:

   o Still a member of the group

      The server should update its group specifier.

      It should revalidate the list of BINDABLE leases owned by this server, if possible, using a series of COMPLETE POLL operations (see Section 6.1). If responses cannot be obtained from all of the other group members, then assume that the current list of BINDABLE leases is still valid.

      For every IP address where the current server is listed as the "last transaction server" in the state, use a POLL operation to determine the latest information about that IP address.

      Request a DUMP operation from each server in the group. This will cause each server to update the requester with address information messages for all bindings for which that server is listed as the last transaction server.

   o Not currently a member of the group

      The server should drop its current list of BINDABLE IP addresses associated with this group.

      The server should verify that the group is still the same group, i.e. that it is still associated with the same subnets. If it is, the server should rejoin the group.

      It should rebuild its list of BINDABLE IP addresses using a COMPLETE POLL operation.

      Request a DUMP operation from each server in the group. This will cause each server to update the requester with address information messages for all bindings for which that server is listed as the last transaction server.

10. Open questions

   The following open questions, marked with the "*" character, remain from the original draft: draft-ietf-dhc-interserver-00.txt. Comments have been added in square brackets []. Additional open questions new to draft: draft-ietf-dhc-interserver-01.txt are marked with the "o" character.

   * Are these the only cases in which binding information may become out of date?

   * Are these solutions correct?

   * INIT case needs EXISTING/NEW binding option [done]

   * Because of the "lazy synchronization" of DHCP servers, it is possible that some servers may know about an existing binding while others do not. As an optimization, DHCP clients should be able to select between existing bindings and new bindings in DHCPOFFER messages from servers. A new option could be defined to indicate to the client whether a DHCPOFFER message represents a new or an existing binding.

     [A great idea, but requires client changes to be really effective. Still, no reason not to put it in the servers now.]

   * Each server must know all other servers.

     Requiring each server to know about every other server imposes additional administrative overhead in the configuration of DHCP servers. However, this configuration overhead is probably minimal relative to any other configuration required for DHCP servers.

     [The configuration messages provide a step towards an answer here.]

   * Each server must contact all other servers before reassigning an address.

     [This is fundamental if we wish to use the "lazy synchronization" above; you can't get one without the other.]

     There is a potential issue here in which no new DHCP clients can be configured if any of the DHCP servers cannot be contacted. Servers can mitigate this problem by maintaining a list of prechecked addresses that can be allocated without contacting all other servers at the time of address allocation.

     The protocol may need additional definition of specific actions on the part of DHCP servers in response to situations in which a server cannot contact all other servers. [Added a lot of these in this draft.]

   * Servers cooperating to achieve "fair" distribution of available addresses.

     The protocol may need additional mechanisms, or a definition of default behavior, through which servers cooperate among themselves to ensure that each has a sufficient pool of prechecked addresses on each network.

     [Not yet addressed, and needs work.]

   * User intervention in case of database incoherency.

     Fixing the collective database on the DHCP servers in case of a problem could be a *real* nightmare.

   * Potential deadlock in checking address: suppose two servers check the same address for reassignment simultaneously?

     [Needs some work, but easily solved by a bit of work in the address information messages specification.]

   * Potential configuration for new server?

     One ancillary use of the inter-server protocol might be in configuring new DHCP servers. Suppose the inter-server protocol were extended to allow download of a server's configuration file and to allow addition of a new server to the list of DHCP servers. A new server might be configured by simply giving it the address of an existing server. The new server could then download a list of all other known servers, the pool of candidate addresses, any special configuration information (e.g., vendor class information) and the existing bindings. The new server could also announce itself to all of the other existing servers.

     [Pieces of this are in the current draft, principally in the configuration messages. At this stage, a server can figure out which groups correspond with which subnets, and can therefore determine which groups it wishes to join. It must have a priori configuration information about the allocatable IP addresses for each subnet, and all other configuration information.
         Downloading configuration files would not be a great idea for
         servers which don't use configuration files. I do believe that
         we could easily extend the configuration messages to support
         information about ranges of addresses in each subnet, and go a
         long way toward making the protocol not only more flexible but
         also more correct.]

      *  DHCP server maintenance

         There is likely an opportunity for the development of a server
         management tool that would download the database information
         from all servers and check for conflicts or inconsistencies
         such as assignment of an IP address to multiple clients,
         bindings that are not replicated across all servers, bindings
         that have inconsistent lease expiration times, etc.

   o  Group-id selection.

      The group-ids for various groups need to be sufficiently unique
      that no server will ever be a member of two groups with the same
      group-id. This protocol does not yet provide any mechanism to
      generate group-ids which conform to this requirement.

      Possibly a group-id could be synthesized in some manner that
      ensures it conforms to this requirement.

   o  The original draft discussed the requirement for each server to
      have a synchronized clock using available time synchronization
      protocols. That requirement has been removed in this draft, and in
      its place all times are sent in "seconds from now" as a signed
      32-bit number. There is clearly a bit of additional complexity
      required to do this, but I have been so impressed at how well
      DHCP works with "relative" instead of "absolute" time that I felt
      the complexity of using relative time was worth it (since using
      synchronized time is not without its own complexities).

   o  There is clearly a need to batch multiple updates, and little
      mention has yet been made of how to achieve that batch operation.

   o  What should the actual packet format look like?
      Nothing in this draft specifies the details of the packet format.
      One approach would be to format the packets as a small delta from
      the current DHCP packets, and use one or more presently undefined
      dhcp-message-type values for the different protocol messages. The
      data in the packets could easily be formatted as options. All
      current DHCP servers have built-in parsers which can handle the
      current packet formats, so why invent yet another format when
      this one will do as well?

   o  Do we really need TCP?

      Certainly the initial focus of this protocol has been on servers
      using TCP to communicate with each other. Within the confines of
      the actual draft I have not altered that approach, although I
      feel that UDP packets would be as effective. The gains from having
      a connection "always up" seem to me to be outweighed by the
      difficulty of keeping a connection "always up" in the face of
      transient server failures. With proper care, idempotent UDP
      packets can solve the problems this protocol needs to solve with
      no additional complexity beyond retransmission timeouts -- which
      are needed anyway if a server is down and the TCP connection is
      broken.

   o  UNAVAILABLE IP addresses

      There are several cases where a server can determine that some
      sort of serious error has occurred and that an IP address is
      apparently in an inconsistent state. In these cases, the server
      should make the IP address UNAVAILABLE -- i.e., no other server
      should be able to operate on it. Just what is necessary to make
      this happen? Could it be a passive response to address
      information messages, or must it involve a complete push to all of
      the other servers, and a new IP address state?

11. Acknowledgments

   Many of the ideas in this proposal are due to Jeff Mogul, Greg
   Minshall, Rob Stevens, Walt Wimer, Ted Lemon and the DHC working
   group.
   Thanks to all who have contributed their ideas and participated in
   the discussion of the inter-server protocol.

   At American Internet, Brad Parker and Mark Stapp have been key
   contributors to the design discussions that have resulted in our
   contributions to this draft. They have each invested many hours of
   work in this protocol.

12. References

   [1] Droms, R., "draft-ietf-dhc-dhcp-09.txt", Work in progress,
       December 1996.

13. Security Considerations

   Minimal security would be provided by configuring every server in a
   group with the IP addresses of the allowable servers that could ever
   join that group.

   Other, more powerful security approaches are TBD.

14. Authors' Information

   Ralph Droms
   Computer Science Department
   323 Dana Engineering
   Bucknell University
   Lewisburg, PA 17837

   Phone: (717) 524-1145
   EMail: droms@bucknell.edu

   Kim Kinnear
   American Internet Corporation
   4 Preston Ct.
   Bedford, MA 01730-2334

   Phone: (617) 276-4587
   EMail: kinnear@american.com