NFSv4                                                     D. Noveck, Ed.
Internet-Draft                                                       EMC
Intended status: Informational                                 P. Shivam
Expires: October 13, 2012                                       C. Lever
                                                                B. Baker
                                                                  ORACLE
                                                          April 11, 2012


 NFSv4 migration: Implementation experience and spec issues to resolve
                  draft-ietf-nfsv4-migration-issues-00

Abstract

   The migration feature of NFSv4 provides for moving responsibility for
   a single filesystem from one server to another, without disruption to
   clients.  Recent implementation experience has shown problems in the
   existing specification for this feature.
   This document discusses the issues which have arisen and explores
   the options available for curing the issues via clarification and
   correction of the NFSv4.0 and NFSv4.1 specifications.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 13, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions
   3.  NFSv4.0 Implementation Experience
     3.1.  Implementation issues
       3.1.1.  Failure to free migrated state on client reboot
       3.1.2.
               Server reboots resulting in a confused lease situation
       3.1.3.  Client complexity issues
     3.2.  Sources of Protocol difficulties
       3.2.1.  Issues with nfs_client_id4 generation and use
       3.2.2.  Issues with lease proliferation
   4.  Issues to be resolved in NFSv4.0
     4.1.  Possible changes to nfs_client_id4 client-string
     4.2.  Possible changes to handle differing nfs_client_id4
           string values
     4.3.  Other issues within migration-state sections
     4.4.  Issues within other sections
   5.  Proposed resolution of NFSv4.0 protocol difficulties
     5.1.  Proposed changes: nfs_client_id4 client-string
     5.2.  Client-string Models (AS PROPOSED)
       5.2.1.  Non-Uniform Client-string Model
       5.2.2.  Uniform Client-string Model
       5.2.3.  Trunking Determination in the Uniform Client-string
               Model
     5.3.  Proposed changes: merged (vs. synchronized) leases
     5.4.  Other proposed changes to migration-state sections
       5.4.1.  Proposed changes: Client ID migration
       5.4.2.  Proposed changes: Callback re-establishment
       5.4.3.  Proposed changes: NFS4ERR_LEASE_MOVED rework
     5.5.  Proposed changes to other sections
       5.5.1.  Proposed changes: callback update
       5.5.2.  Proposed changes: clientid4 handling
     5.6.  Migration, Replication and State (AS PROPOSED)
       5.6.1.  Migration and State
       5.6.2.  Replication and State
       5.6.3.
               Notification of Migrated Lease
       5.6.4.  Migration and the Lease_time Attribute
   6.  Results of proposed changes for NFSv4.0
     6.1.  Results: Failure to free migrated state on client reboot
     6.2.  Results: Server reboots resulting in confused lease
           situation
     6.3.  Results: Client complexity issues
     6.4.  Result summary
   7.  Issues for NFSv4.1
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document is in the informational category, and while the facts
   it reports may have normative implications, any such normative
   significance reflects the readers' preferences.  For example, we may
   report that the reboot of a client with migrated state results in
   state not being promptly cleared and that this will prevent granting
   of conflicting lock requests at least for the lease time, which is a
   fact.  While it is to be expected that client and server implementers
   will judge this to be a situation that is best avoided, the judgment
   as to how pressing this issue should be considered is one for the
   reader, and eventually the nfsv4 working group, to make.
   We do explore possible ways in which such issues can be avoided, with
   minimal negative effects, in the expectation that the working group
   will choose to address these issues, but the choice of exactly how to
   address these is best given effect in one or more standards-track
   documents and/or errata.

   This document focuses on NFSv4.0, since that is where the majority of
   implementation experience has been.  Nevertheless, there is some
   discussion of the implications of the NFSv4.0 experience for
   migration in NFSv4.1.

2.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   In the context of this informational document, these normative
   keywords will always occur in the context of a quotation, most often
   direct but sometimes indirect.  The context will make it clear
   whether the quotation is from:

   o  The current definitive definition of the NFSv4.0 protocol, whether
      that is the original NFSv4.0 specification [RFC3530], the current
      pending draft of RFC3530bis expected to become the definitive
      definition of NFSv4.0 once certain procedural steps are taken
      [cur-v4.0-bis], or an eventual RFC3530bis RFC, taking over the
      role of definitive definition of NFSv4.0 from RFC3530.

      As the identity of that document may change during the lifetime of
      this document, we will often refer to the current or pending
      definition of NFSv4.0 and quote from portions of the documents
      that are identical among all existing drafts.  Given that RFC3530
      and all RFC3530bis drafts agree as to the issues under discussion,
      this should not cause undue difficulty.
      Note that to simplify document maintenance, section names rather
      than section numbers are used when referring to sections in
      existing documents so that only minimal changes will be necessary
      as the identity of the document defining NFSv4.0 changes.

   o  The current definitive definition of the NFSv4.1 protocol
      [RFC5661].

   o  A proposed or possible text to serve as a replacement for the
      current definitive document text.  Sometimes, a number of possible
      alternative texts may be listed and benefits and detriments of
      each examined in turn.

3.  NFSv4.0 Implementation Experience

3.1.  Implementation issues

   Note that the examples below reflect current experience which arises
   from clients implementing the recommendation to use different
   nfs_client_id4 id strings for different server addresses, i.e. using
   what is later referred to herein as the "non-uniform client-string
   model".

   This is simply because that is the experience implementers have had.
   The reader should not assume that in all cases, this practice is the
   source of the difficulty.  It may be so in some cases but clearly it
   is not in all cases.

3.1.1.  Failure to free migrated state on client reboot

   The following sort of situation has proved troublesome:

   o  A client C establishes a clientid4 C1 with server ABC specifying
      an nfs_client_id4 with "id" value "C-ABC" and verifier 0x111.

   o  The client begins to access files in filesystem F on server ABC,
      resulting in generating stateids S1, S2, etc. under the lease for
      clientid4 C1.  It may also access files on other filesystems on
      the same server.

   o  The filesystem is migrated from ABC to server XYZ.  When
      transparent state migration is in effect, stateids S1 and S2 and
      clientid4 C1 are now available for use by client C at server XYZ.
      So far, so good.

   o  Client C reboots and attempts to access data on server XYZ,
      whether in filesystem F or another.
      It does a SETCLIENTID with an nfs_client_id4 with "id" value
      "C-XYZ" and verifier 0x112.  There is thus no occasion to free
      stateids S1 and S2 since they are associated with a different
      client name and so lease expiration is the only way that they can
      be gotten rid of.

   Note here that while it seems clear to us in this example that C-XYZ
   and C-ABC are from the same client, the server has no way to
   determine the structure of the "opaque" id.  In the protocol, it
   really is opaque.  Only the client knows which nfs_client_id4 values
   designate the same client on a different server.

3.1.2.  Server reboots resulting in a confused lease situation

   Further problems arise from scenarios like the following.

   o  Client C talks to server ABC using an nfs_client_id4 id like
      "C-ABC" and verifier v1.  As a result a lease with clientid4 c.i
      is established: {v1, "C-ABC", c.i}.

   o  fs_a1 migrates from server ABC to server XYZ along with its state.
      Now server XYZ also has a lease: {v1, "C-ABC", c.i}.

   o  Server ABC reboots.

   o  Client C talks to server ABC using an nfs_client_id4 id like
      "C-ABC" and verifier v1.  As a result a lease with clientid4 c.j
      is established: {v1, "C-ABC", c.j}.

   o  fs_a2 migrates from server ABC to server XYZ.  Now server XYZ also
      has a lease: {v1, "C-ABC", c.j}.

   o  Now server XYZ has two leases that match {v1, "C-ABC", *}, when
      the protocol clearly assumes there can be only one.

   Note that if the client used "C" (rather than "C-ABC") as the
   nfs_client_id4 id string, the exact same situation would arise.

   One of the first cases in which this sort of situation has resulted
   in difficulties is in connection with doing a SETCLIENTID for
   callback update.

   The SETCLIENTID for callback update only includes the nfs_client_id4,
   assuming there can only be one such with a given nfs_client_id4
   value.
   If there are multiple, confirmed client records with identical
   nfs_client_id4 values, there is no way to map the callback update
   request to the correct client record.

   One possible accommodation for this particular issue that has been
   used is to add a RENEW operation along with SETCLIENTID (on a
   callback update) to disambiguate the client.

   When the client updates the callback info to the destination, the
   client would, by convention, send a compound like this:

      { RENEW clientid4, SETCLIENTID nfs_client_id4,verf,cb }

   The presence of the clientid4 in the compound would allow the server
   to differentiate among the various leases that it knows of, all with
   the same nfs_client_id4 value.

   While this would be a reasonable patch for an isolated protocol
   weakness, interoperable clients and servers would require that the
   protocol truly be updated to allow such a situation, specifically
   that of multiple clientid4's with the same nfs_client_id4 value.  The
   protocol is currently designed and implemented assuming this can't
   happen.  We need to either prevent the situation from happening, or
   fully adapt to the possibilities which can arise.  See Section 4 for
   a discussion of such issues.

3.1.3.  Client complexity issues

   Consider the following situation:

   o  There is a set of clients C1 through Cn accessing servers S1
      through Sm.  Each server manages some significant number of
      filesystems with the filesystem count L being significantly
      greater than m.

   o  Each client Cx will access a subset of the servers and so will
      have up to m clientid's, which we will call Cxy for server Sy.

   o  Now assume that for load-balancing or other operational reasons,
      numbers of filesystems are migrated among the servers.  As a
      result, each client-server pair will have up to m clientid's and
      each client will have up to m**2 clientids.
      If we add the possibility of server reboot, the only bound on a
      client's clientid count is L.

   Now, instead of a clientid4 identifying a client-server pair, we have
   many more entities for the client to deal with.  In addition, it
   isn't clear how new state is to be incorporated in this structure.

   The limitations of the migrated state (inability to be freed on
   reboot) would argue against adding more such state but trying to
   avoid that would run into its own difficulties.  For example, a
   single lockowner string presented under two different clientids would
   appear as two different entities.

   Thus we have to choose between:

   o  Indefinite prolongation of foreign clientid's even after all
      transferred state is gone.

   o  Having multiple requests for the same lockowner-string-named
      entity carried on in parallel by separate identically named
      lockowners under different clientid4's.

   o  Adding serialization at the lock-owner string level, in addition
      to that at the lockowner level.

   In any case, we have gone (in adding migration as it was described)
   from a situation in which

   o  Each client has a single clientid4/lease for each server it talks
      to.

   o  Each client has a single nfs_client_id4 for each server it talks
      to.

   o  Every state id can be mapped to an associated lease based on the
      server it was obtained from.

   To one in which

   o  Each client may have multiple clientid4's for a single server.

   o  For each stateid, the client must separately record the clientid4
      that it is assigned to, or it must manage separate "state blobs"
      for each fsid and map those to clientid4's.

   o  Before doing an operation that can result in a stateid, the client
      must either find a "state blob" based on fsid or create a new one,
      possibly with a new clientid4.

   o  There may be multiple clientid4's all connected to the same server
      and using the same nfs_client_id4.
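
   The bookkeeping burden listed above can be made concrete with a
   minimal sketch.  It is illustrative only: the class name, its
   methods, and the use of plain strings for servers, fsids, clientid4
   values and stateids are all assumptions made for this example, not
   anything defined by the protocol or taken from an implementation.
   It models the per-fsid "state blob" option and shows a client
   ending up with two clientid4's at one server once a filesystem has
   been migrated there.

```python
from collections import defaultdict


class MigrationAwareClientState:
    """Hypothetical client-side bookkeeping using per-fsid state blobs."""

    def __init__(self):
        # (server, fsid) -> clientid4 under which that blob's state is held
        self.blob_clientid = {}
        # (server, fsid) -> set of stateids belonging to that blob
        self.blob_stateids = defaultdict(set)

    def record_state(self, server, fsid, clientid4, stateid):
        """Find the blob for this fsid, or create one, then add the stateid."""
        key = (server, fsid)
        self.blob_clientid.setdefault(key, clientid4)
        self.blob_stateids[key].add(stateid)

    def migrate(self, fsid, source, destination):
        """Move a blob to the destination server, keeping its clientid4."""
        old_key, new_key = (source, fsid), (destination, fsid)
        self.blob_clientid[new_key] = self.blob_clientid.pop(old_key)
        self.blob_stateids[new_key] = self.blob_stateids.pop(old_key, set())

    def clientids_for(self, server):
        """All clientid4 values this client must now track for one server."""
        return {cid for (srv, _), cid in self.blob_clientid.items()
                if srv == server}


state = MigrationAwareClientState()
state.record_state("ABC", "F", "C1", "S1")
state.record_state("ABC", "F", "C1", "S2")
state.record_state("XYZ", "G", "C2", "S3")
state.migrate("F", "ABC", "XYZ")
print(sorted(state.clientids_for("XYZ")))
```

   After the migration the client holds both C1 and C2 at server XYZ,
   which is the multiple-clientid4-per-server situation described in
   the list above.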
   This sort of additional client complexity is troublesome and needs to
   be eliminated.

3.2.  Sources of Protocol difficulties

3.2.1.  Issues with nfs_client_id4 generation and use

   The current definitive definition of the NFSv4.0 protocol [RFC3530],
   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
   agree.  The section entitled "Client ID" says:

      The second field, id is a variable length string that uniquely
      defines the client.

   There are two possible interpretations of the phrase "uniquely
   defines" in the above:

   o  The relation between strings and clients is a function from such
      strings to clients so that each string designates a single client.

   o  The relation between strings and clients is a bijection between
      such strings and clients so that each string designates a single
      client and each client is named by a single string.

   The first interpretation would make these client-strings like phone
   numbers (a single person can have several) while the second would
   make them like social security numbers.

   Endless debate about the true meaning of "uniquely defines" in this
   context is quite possible but not very helpful.  The following points
   should be noted though:

   o  The second interpretation is more consistent with the way
      "uniquely defines" is used elsewhere in the spec.

   o  The spec as now written intends the first interpretation (or is
      internally inconsistent).  In fact, it recommends, although it
      doesn't "RECOMMEND", that a single client have at least as many
      client-strings as server addresses that it interacts with.  It
      says, in the third bullet point regarding construction of the
      string (which we shall henceforth refer to as client-string-BP3):

         The string should be different for each server network address
         that the client accesses, rather than common to all server
         network addresses.
   o  If internode interactions are limited to those between a client
      and its servers, there is no occasion for servers to be concerned
      with the question of whether two client-strings designate the same
      client, so that there is no occasion for the difference in
      interpretation to matter.

   o  When transparent migration of client state occurs between two
      servers, it becomes important to determine when state on two
      different servers is for the same client or not, and this
      distinction becomes very important.

   Given the need for the server to be aware of client identity with
   regard to migrated state, either client-string construction rules
   will have to change or there will be need to get around current
   issues, or perhaps a combination of these two will be required.
   Later sections will examine the options and propose a solution.

   One consideration that may indicate that this cannot remain exactly
   as it is today has to do with the fact that the current explanation
   for this behavior is not correct.  The current definitive definition
   of the NFSv4.0 protocol [RFC3530], and the current pending draft of
   RFC3530bis [cur-v4.0-bis] both agree.  The section entitled "Client
   ID" says:

      The reason is that it may not be possible for the client to tell
      if the same server is listening on multiple network addresses.  If
      the client issues SETCLIENTID with the same id string to each
      network address of such a server, the server will think it is the
      same client, and each successive SETCLIENTID will cause the server
      to begin the process of removing the client's previous leased
      state.
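
   Whether the quoted concern holds depends on how a server reacts to a
   repeated id string.  The following is a deliberately simplified
   sketch of that decision, under the assumption that a server keys its
   client records on the id and treats a changed verifier as the sign
   of a rebooted client.  The function name, the record dictionary and
   the returned labels are hypothetical, and confirmation via
   SETCLIENTID_CONFIRM and callback handling are omitted entirely.

```python
def setclientid(records: dict, client_id: bytes, verifier: int) -> str:
    """Classify a SETCLIENTID against the records held by one server.

    `records` maps an opaque id to the verifier last seen with it.
    The network address on which the request arrived plays no part.
    """
    known = records.get(client_id)
    if known is None:
        records[client_id] = verifier
        return "new-client"
    if known == verifier:
        # Same id, same verifier: same client instance, state is kept.
        return "same-instance"
    # Same id, different verifier: a newer instance of the same client,
    # so the previous instance's leased state is to be removed.
    records[client_id] = verifier
    return "rebooted"


records = {}
print(setclientid(records, b"C", 0x111))
print(setclientid(records, b"C", 0x111))
print(setclientid(records, b"C", 0x112))
```

   In this sketch only the third call, which carries a new verifier,
   leads to removal of earlier state.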
   In point of fact, a "SETCLIENTID with the same id string" sent to
   multiple network addresses will be treated as all from the same
   client but will not "cause the server to begin the process of
   removing the client's previous leased state" unless the server
   believes it is a newer instance of the same client, i.e. if the id is
   the same and there is a different verifier.  If the client does not
   reboot, the verifier should not change.  If it does reboot, the
   verifier will change, and the server should "begin the process of
   removing the client's previous leased state".

   The situation of multiple SETCLIENTID requests received by a server
   on multiple network addresses is exactly the same, from the protocol
   design point of view, as when multiple (i.e. duplicate) SETCLIENTID
   requests are received by the server on a single network address.  The
   same protocol mechanisms that prevent erroneous state deletion in the
   latter case prevent it in the former case.  There is no reason for
   special handling of the multiple-network-appearance case, in this
   regard.

3.2.2.  Issues with lease proliferation

   It is often felt that lease proliferation is a consequence of the
   client-string construction issues, and it is certainly the case that
   the two are closely connected in that non-uniform client-strings make
   it impossible for the server to appropriately combine leases from the
   same client.  See Section 5.2.1 for a discussion of non-uniform
   client-strings.

   However, even where the server could combine leases from the same
   client, it needs to be clear how and when it will do so, so that the
   client will be prepared.  These issues will have to be addressed at
   various places in the spec.

   This could be enough only if we are prepared to do away with the
   "should" recommending non-uniform client-strings and replace it with
   a "should not" or even a "SHOULD NOT".
   Current client implementation patterns make this an unpalatable
   choice for use as a general solution, but it is reasonable to
   "RECOMMEND" this choice for a well-defined subset of clients.  One
   alternative would be to create a way for the server to infer from
   client behavior which leases are held by the same client and use this
   information to do appropriate lease mergers.  Prototyping and
   detailed specification work has shown that this could be done but the
   resulting complexity is such that a better choice is to "RECOMMEND"
   use of the uniform model for clients supporting the migration
   feature.

4.  Issues to be resolved in NFSv4.0

4.1.  Possible changes to nfs_client_id4 client-string

   The fact that the reason given in client-string-BP3 is not valid
   makes the existing "should" insupportable.  We can do neither of the
   following:

   o  Keep a reason we know is invalid.

   o  Keep saying "should" without giving a reason.

   What are often presented as reasons that motivate use of the
   non-uniform model always turn out to be cases in which, if the
   uniform model were used, the server will treat a client which
   accesses that server via two different IP addresses as part of a
   single client, as it in fact is.  This may be disconcerting to a
   client unaware that the two IP addresses connect to the same server.
   This is thus not a reason to use the non-uniform model but rather an
   illustration of the fact that those using the uniform model must use
   server behavior to determine whether any trunking of IP addresses
   exists, as is described in Section 5.2.2.

   It is always possible that a valid new reason will be found, but so
   far none has been proposed.  Given the history, the burden of proof
   should be on those asserting the validity of a proposed new reason.

   So we will assume for now that the "should" will have to go.  The
   question is what to replace it with.
   o  We can't say "MUST NOT", despite the problems this raises for
      migration since this is pretty late in the day for such a change.
      Many currently operating clients obey the existing "should".
      Similar considerations would apply for "SHOULD NOT" or "should
      not".

   o  Dropping client-string-BP3 entirely is a possibility but, given
      the context and history, it would just be a confusing version of
      "SHOULD NOT".

   o  Using "MAY" would clearly specify that both ways of doing this are
      valid choices for clients and that servers will have to deal with
      clients that make either choice.

   o  This might be modified by a "SHOULD" (or even a "MUST") for
      particular groups of clients.

   o  There will have to be some text explaining why a client might make
      either choice but, except for the particular cases referred to
      above, we will have to make sure that it is truly descriptive, and
      not slanted in either direction.

4.2.  Possible changes to handle differing nfs_client_id4 string values

   Given the difficulties caused by having different nfs_client_id4
   client-string values for the same client, we have two choices:

   o  Deprecate the existing treatment and basically say the client is
      on its own doing migration, if it follows it.

   o  Introduce a way of having the client provide client identity
      information to the server, if it can be done compatibly while
      staying within the bounds of v4.0.

4.3.  Other issues within migration-state sections

   There are a number of issues where the existing text is unclear
   and/or wrong and needs to be fixed in some way.

   o  Lack of clarity in the discussion of moving clientids (as well as
      stateids) as part of moving state for migration.
   o  The discussion of synchronized leases is wrong in that there is no
      way to determine (in the current spec) when leases are for the
      same client and also wrong in suggesting a benefit from leases
      synchronized at the point of transfer.  What is needed is merger
      of leases, which is necessary to keep client complexity
      requirements from getting out of hand.

   o  Lack of clarity in the discussion of LEASE_MOVED handling.

4.4.  Issues within other sections

   There are a number of cases in which certain sections, not
   specifically related to migration, require additional clarification.
   This is generally because text that is clear in a context in which
   leases and clientids are created in one place and live there forever
   may need further refinement in the more dynamic environment that
   arises as part of migration.

   Some examples:

   o  Some people are under the impression that updating callback
      endpoint information for an existing client, which is part of the
      client's handling of migration, may cause the destination server
      to free existing state.  There need to be additions to clarify
      the situation.

   o  The handling of the sets of clientid4's maintained by each server
      needs to be clarified.  In particular, the issue of how the client
      adapts to the presumably independent and uncoordinated clientid4
      sets needs to be clearly addressed.

   o  Statements regarding handling of invalid clientid4's need to be
      clarified and/or refined in light of the possibilities that arise
      due to lease motion and merger.

5.  Proposed resolution of NFSv4.0 protocol difficulties

5.1.  Proposed changes: nfs_client_id4 client-string

   We propose replacing client-string-BP3 with the following text and
   adding the following proposed Section 5.2 to provide implementation
   guidance.
   o  The string MAY be different for each server network address that
      the client accesses, rather than common to all server network
      addresses.  The considerations that might influence a client to
      use different strings for each are explained in Section 5.2.

   o  Despite the use of the word "string" for this identifier, and the
      fact that using strings will often be convenient, it should be
      understood that the protocol defines this as opaque data.  In
      particular, those receiving such an id should not assume that it
      will be in UTF-8 format nor should they reject it if it is not.

5.2.  Client-string Models (AS PROPOSED)

   One particular aspect of the construction of the nfs_client_id4
   string has proved recurrently troublesome.  The client has a choice
   of:

   o  Presenting the same id string to each server address accessed.
      This is referred to as the "uniform client-string model" and is
      discussed in Section 5.2.2.

   o  Presenting a different id string to each server address accessed.
      This is referred to as the "non-uniform client-string model" and
      is discussed in Section 5.2.1.

   Construction of the client-string has been a troublesome issue
   because of the way in which the NFS protocols have evolved.

   o  NFSv3 as a stateless protocol had no need to identify the state
      shared by a particular client-server pair.  Thus there was no
      occasion to consider the question of whether a set of requests
      come from the same client, or whether two server IP addresses are
      connected to the same server.  As the environment was one in which
      the user supplied the target server IP address as part of
      incorporating the remote filesystem in the client's file name
      space, there was no occasion to take note of server trunking.
      Within a stateless protocol, the situation was symmetrical.  The
      client has no server identity information and the server has no
      client identity information.
   o  NFSv4.1 is a stateful protocol with full support for client and
      server identity determination.  This enables the server to be
      aware when two requests come from the same client (they are on
      sessions sharing a clientid4) and the client to be aware when two
      server IP addresses are connected to the same server (they return
      the same server name in responding to an EXCHANGE_ID).

   NFSv4.0 is unfortunately halfway between these two.  The two
   client-string models have arisen in attempts to deal with the
   changing requirements of the protocol as implementation has proceeded
   and features that were not very substantial in [RFC3530] became more
   substantial.

   o  In the absence of any implementation of the fs_locations-related
      features (replication, referral, and migration), the situation is
      very similar to that of NFSv3, with the addition of state but with
      no concern to provide accurate client and server identity
      determination.  This is the situation that gave rise to the
      non-uniform client-string model.

   o  In the presence of replication and referrals, the client may have
      occasion to take advantage of knowledge of server trunking
      information.  Even more important, migration, by transferring
      state among servers, causes difficulties for the non-uniform
      client-string model, in that the two different client-strings sent
      to different IP addresses may wind up on the same IP address,
      adding confusion.

   Both models have to deal with the asymmetry in client and server
   identity information between client and server.  Each seeks to make
   the client's and the server's views match.  In the process, each
   encounters some combination of inelegant protocol features and/or
   implementation difficulties.  The choice of which to use is up to the
   client implementer and the sections below try to give some useful
   guidance.

5.2.1.
Non-Uniform Client-string Model 658 The non-uniform client-string model is an attempt to handle these 659 matters in NFSv4.0 client implementations in as NFSv3-like a way as 660 possible. 662 For a client using the non-uniform model, all internal recording of 663 clientid4 values is to include, whether explicitly or implicitly, the 664 server IP address so that one always has an (IP-address, clientid4) 665 pair. Two such pairs from different servers are always distinct even 666 when the clientid4 values are the same, as they may occasionally be. 667 In this model, such equality is always treated as simple 668 happenstance. 670 Making the client-string different on different servers means that a 671 server has no way of tying together information from the same client 672 and so will treat a single client as multiple clients with multiple 673 leases for each server network address. Since there is no way in the 674 protocol for the client to determine if two network addresses are 675 connected to the same server, the resulting lack of knowledge is 676 symmetrical and can result in simpler client implementations in which 677 there is a single clientid/lease per server network addresses. 679 Support for migration, particularly with transparent state migration, 680 is more complex in the case of non-uniform client-strings. For 681 example, migration of a lease can result in multiple leases for the 682 same client accessing the same server addresses, vitiating many of 683 the advantages of this approach. Therefore, client implementations 684 that support migration with transparent state migration SHOULD NOT 685 use the non-uniform client-string model. 687 5.2.2. Uniform Client-string Model 689 When the client-string is kept uniform, the server has the basis to 690 have a single clientid4/lease for each distinct client. The problem 691 that has to be addressed is the lack of explicit server identity 692 information, which is made available in NFSv4.1. 

   When the same client-string is given to multiple IP addresses, the
   client can determine whether two IP addresses correspond to a single
   server, based on the server's behavior. This is the inverse of the
   strategy adopted for the non-uniform model in which different server
   IP addresses are told about different clients, simply to prevent a
   server from manifesting behavior that is inconsistent with there
   being a single server for each IP address, in line with the
   traditions of NFS. So, to compare:

   o  In the non-uniform model, servers are told about different clients
      because, if the server were to use accurate information as to
      client identity, two IP addresses on the same server would behave
      as if they were talking to the same client, which might prove
      disconcerting to a client not expecting such behavior.

   o  In the uniform model, the servers are told about there being a
      single client, which is, after all, the truth. Then, when the
      server uses this information, two IP addresses on the same server
      will behave as if they are talking to the same client, and this
      difference in behavior allows the client to infer the server IP
      address trunking configuration, even though NFSv4.0 does not
      explicitly provide this information.

      Section 5.2.3 gives one example of how this might be done.

   The uniform client-string model has the following advantages for
   implementations:

   o  Clients can take advantage of server trunking (and clustering with
      single-server-equivalent semantics) to increase bandwidth or
      reliability.

   o  There are advantages in state management so that, for example, we
      never have a delegation under one clientid revoked because of a
      reference to the same file from the same client under a different
      clientid.

   o  The uniform client-string model allows the server to do any
      necessary automatic lease merger in connection with migration,
      without requiring any client involvement. This consideration is
      of sufficient weight to cause us to RECOMMEND use of the uniform
      client-string model for clients supporting transparent state
      migration.

   The following implementation considerations might cause issues for
   client implementations:

   o  This model is considerably different from the non-uniform model,
      which most client implementations have been following. Until
      substantial implementation experience is obtained with this model,
      reluctance to embrace something so new is to be expected.

   o  Mapping between server network addresses and leases is more
      complicated in that it is no longer a one-to-one mapping.

   How to balance these considerations depends on implementation goals.

5.2.3.  Trunking Determination in the Uniform Client-string Model

   This section provides an example of how trunking determination could
   be done by a client following the uniform client-string model.
   Clients need not follow this procedure but implementers should make
   sure that the issues dealt with by this procedure are all properly
   addressed.

   For a client using the uniform model, clientid4 values are treated as
   important information in determining server trunking patterns. For
   two different IP addresses to return the same clientid4 value is a
   necessary, though not a sufficient, condition for them to be
   considered as connected to the same server. As a result, when two
   different IP addresses return the same clientid4, the client needs to
   determine, using the procedure given below or otherwise, whether the
   IP addresses are connected to the same server.
   For such clients, all internal recording of clientid4 values needs
   to include, whether explicitly or implicitly, identification of the
   server from which the clientid4 was received so that one always has
   a (server, clientid4) pair. Two such pairs from different servers
   are always considered distinct even when the clientid4 values are
   the same, as they may occasionally be.

   In order to make this approach work, the client must have accessible,
   for each nfs_client_id4 used (only one in the uniform model), a list
   of all server IP addresses, together with the associated clientid4
   values. As a part of the associated data structures, there should be
   the ability to mark a server IP structure as having the same server
   as another and to mark an IP address as currently unresolved. One
   way to do this is to allow each such entry to point to another with
   the pointer value being one of:

   o  A pointer to another entry for an IP address associated with the
      same server, where that IP address is the first one referenced to
      access that server.

   o  A pointer to the current entry if there is no earlier IP address
      associated with the same server, i.e. where the current IP address
      is the first one referenced to access that server. We'll refer to
      such an IP address as the lead IP address for a given server.

   o  The value NULL if the address's server identity is currently
      unresolved.

   When a SETCLIENTID is done and a clientid4 returned, the data
   structure is searched for a matching clientid4 and processing depends
   on what is found. We will refer to the IP address on which this
   SETCLIENTID is done as X. The SETCLIENTID will use the common
   nfs_client_id4 and specify X as part of the callback parameters. We
   call the clientid4 and verifier returned by this operation XC and XV.

   Note that at this point no SETCLIENTID_CONFIRM has yet been done.
   This is because we have either established a new clientid4 on a
   previously unknown server or changed the callback parameters on a
   clientid4 associated with some already known server. We don't want
   to confirm something that we are not sure we want to happen.

   o  If no matching clientid4 is found, the IP address X and clientid4
      XC are added to the list and considered as having no existing
      known IP addresses trunked with it. The IP address is marked as a
      lead IP address for a new server. A SETCLIENTID_CONFIRM is done
      using XC and XV.

   o  If a matching clientid4 is found which is marked unresolved,
      processing on the new IP address is suspended. In order to
      simplify processing, there can only be one unresolved IP address
      for any given clientid4.

   o  If one or more matching clientid4's are found, none of which is
      marked unresolved, the new IP address is entered and marked
      unresolved. After applying the steps below to each of the lead IP
      addresses with a matching clientid4, the address will have been
      resolved: either it will be recognized as belonging to an existing
      server, and so added to that server's set of IP addresses, or it
      will be recognized as a new server. At the point at which this
      determination is made, the unresolved indication is cleared and
      any suspended SETCLIENTID processing is restarted.

   So for each lead IP address IPn with a clientid4 matching XC, the
   following steps are done.

   o  A SETCLIENTID is done to update the callback parameters to reflect
      the possibility that X will be marked as associated with the
      server whose lead IP address is IPn. So assume that we do that
      SETCLIENTID on IP address IPn and get back a setclientid_confirm
      value (in the form of a verifier4) SCn.

      Note that the v4.0 spec requires the server to make sure that such
      values are very unlikely to be regenerated.
      Given that it is already highly unlikely that the clientid XC is
      duplicated by distinct servers, the probability that SCn is
      duplicated as well has to be considered vanishingly small. Note
      also that the callback update procedure can be repeated multiple
      times to reduce the probability of spurious matches further.

   o  Note that we don't want this to happen if address X is not
      associated with this server. So we do a SETCLIENTID_CONFIRM on
      address X using the setclientid_confirm value SCn.

   o  If the setclientid_confirm value generated on IPn is accepted on
      X, then X and IPn are recognized as connected to the same server
      and the entry for X is marked as associated with IPn. The entry
      is now resolved and processing can be restarted for IP addresses
      whose clientid4 matched XC but whose resolution had been deferred.

   o  If the confirm value generated on IPn is not accepted on X, then X
      and IPn are distinct and the callback update will not be
      confirmed. So we go on to the next IPn, until we run out of them.

   The procedure above has made no explicit mention of the possibility
   that server reboot can occur at any time. To address this
   possibility the client should periodically use the clientid4 XC in
   RENEW operations, directed to both the IP address X and the lead IP
   address that is currently being tested for identity.

   o  When XC becomes invalid on X, the resolution process should be
      terminated, subject to being redone later. Before redoing the
      resolution, XC should be checked on all the lead IP addresses on
      which it was valid. Once a new clientid4 is established on any
      servers on which XC became invalid, a new clientid4 can be
      established on X and the resolution process for X can be
      restarted.

   o  When XC does not become invalid on X, but becomes invalid on the
      current IPn being tested, it should be concluded that X and IPn do
      not match and that it is time to advance to the next IPn, if any.

   o  In the event of a reboot detected on any server lead IP, the set
      of IP addresses associated with the server should not change and
      state should be re-established for the lease as a whole, using all
      available connected server IP addresses. It is prudent to verify
      connectivity by doing a RENEW using the new clientid4 on each such
      server address before using it, however.

   If we have run out of IPn's without finding a matching server, X is
   considered as having no existing known IP addresses trunked with it.
   The IP address is marked as a lead IP address for a new server. A
   SETCLIENTID_CONFIRM is done using XC and XV.

5.3.  Proposed changes: merged (vs. synchronized) leases

   The current definitive definition of the NFSv4.0 protocol [RFC3530],
   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
   agree. The section entitled "Migration and State" says:

      As part of the transfer of information between servers, leases
      would be transferred as well. The leases being transferred to the
      new server will typically have a different expiration time from
      those for the same client, previously on the old server. To
      maintain the property that all leases on a given server for a
      given client expire at the same time, the server should advance
      the expiration time to the later of the leases being transferred
      or the leases already present. This allows the client to maintain
      lease renewal of both classes without special effort.

   There are a number of problems with this and any resolution of our
   difficulties must address them somehow.

   o  The current v4.0 spec recommends that the client make it
      essentially impossible to determine when two leases are from "the
      same client".

   o  It is not appropriate to speak of "maintain[ing] the property that
      all leases on a given server for a given client expire at the same
      time", since this is not a property that holds even in the absence
      of migration. A server listening on multiple network addresses
      may have the same client appear as multiple clients with no way to
      recognize the client as the same.

   o  Even if the client identity issue could be resolved, advancing the
      lease time at the point of migration would not maintain the
      desired synchronization property. The leases would be
      synchronized until one of them was renewed, after which they would
      be unsynchronized again.

   To avoid client complexity, we need to have no more than one lease
   between a single client and a single server. This requires merger of
   leases since there is no real help from synchronizing them at a
   single instant.

   For the uniform model, the destination server would simply merge
   leases as part of state transfer, since two leases with the same
   nfs_client_id4 values must be for the same client.

   We have made the following decisions as far as proposed normative
   statements regarding state merger. They reflect the facts that we
   want to provide full migration support in the simplest way possible
   and that we can't say MUST since we have older clients and servers
   to deal with.

   o  Clients SHOULD use the uniform client-string model in order to get
      good migration support.

   o  Servers SHOULD provide automatic lease merger during state
      migration so that clients using the uniform client-string model
      get the support automatically.

   If the clients and the servers obey the SHOULD's, having more than a
   single lease for a given client-server pair will be a transient
   situation, cleaned up as part of adapting to use of migrated state.
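
   As a rough illustration of the merger behavior proposed above, the
   following Python sketch shows how a destination server might fold a
   migrated lease into the leases it already holds. It is a sketch
   under stated assumptions, not a definitive implementation: the Lease
   record, its field names, and absorb_migrated_lease() are hypothetical
   and do not come from any NFSv4 specification. The handling of a
   partial match (same id string, different verifier) follows the rule
   proposed in Section 5.6.1.

```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    """Hypothetical server-side lease record (all names illustrative)."""
    client_string: bytes   # id field of the nfs_client_id4 (opaque data)
    verifier: bytes        # verifier field of the nfs_client_id4
    clientid: int          # shorthand clientid4 assigned by a server
    last_renewal: float    # time of the most recent renewal
    stateids: set = field(default_factory=set)


def absorb_migrated_lease(existing, migrated):
    """Place a migrated lease on the destination server.

    Returns the list of leases the destination retains afterwards.
    """
    for i, lease in enumerate(existing):
        if lease.client_string != migrated.client_string:
            continue
        if lease.verifier == migrated.verifier:
            # Full match: merge under the existing clientid4, keeping
            # the union of the stateids and the later renewal time.
            lease.stateids |= migrated.stateids
            lease.last_renewal = max(lease.last_renewal,
                                     migrated.last_renewal)
            return existing
        # Partial match (same id, different verifier): only one lease
        # survives, and the later renewal time prevails.
        if migrated.last_renewal > lease.last_renewal:
            existing[i] = migrated
        return existing
    # No lease for this client on the destination: transfer it intact.
    existing.append(migrated)
    return existing
```

   In this sketch a full match leaves the client with a single lease
   under the clientid4 already in use on the destination, which is why
   a client may later find that the clientid4 it used on the source
   server is no longer recognized.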

   Since clients and servers will be a mixture of old and new and
   because nothing is a MUST we have to ensure that no combination will
   show worse behavior than is exhibited by current (i.e. old) clients
   and servers.

5.4.  Other proposed changes to migration-state sections

5.4.1.  Proposed changes: Client ID migration

   The current definitive definition of the NFSv4.0 protocol [RFC3530],
   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
   agree. The section entitled "Migration and State" says:

      In the case of migration, the servers involved in the migration of
      a filesystem SHOULD transfer all server state from the original to
      the new server. This must be done in a way that is transparent to
      the client. This state transfer will ease the client's transition
      when a filesystem migration occurs. If the servers are successful
      in transferring all state, the client will continue to use
      stateids assigned by the original server. Therefore the new
      server must recognize these stateids as valid. This holds true
      for the client ID as well. Since responsibility for an entire
      filesystem is transferred with a migration event, there is no
      possibility that conflicts will arise on the new server as a
      result of the transfer of locks.

   This poses some difficulties, mostly because the part about "client
   ID" is not clear:

   o  It isn't clear what part of the paragraph the "this" in the
      statement "this holds true ..." is meant to signify.

   o  The phrase "the client ID" is ambiguous, possibly indicating the
      clientid4 and possibly indicating the nfs_client_id4.

   o  If the text means to suggest that the same clientid4 must be used,
      the logic is not clear since the issue is not the same as for
      stateids of which there might be many. Adapting to the change of
      a single clientid, as might happen as a part of lease migration,
      is relatively easy for the client.

   We have decided to address this issue as follows, with the relevant
   changes all reflected in Section 5.6.

   o  Make it clear that both clientid4 and nfs_client_id4 are to be
      transferred.

   o  Indicate that the initial transfer will normally result in the
      same clientid4 after transfer but that this is not guaranteed,
      since there may be a conflict with an existing clientid4 on the
      destination server and because lease merger can result in a
      change of the clientid4.

5.4.2.  Proposed changes: Callback re-establishment

   The current definitive definition of the NFSv4.0 protocol [RFC3530],
   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
   agree. The section entitled "Migration and State" says:

      A client SHOULD re-establish new callback information with the new
      server as soon as possible, according to sequences described in
      sections "Operation 35: SETCLIENTID - Negotiate Client ID" and
      "Operation 36: SETCLIENTID_CONFIRM - Confirm Client ID". This
      ensures that server operations are not blocked by the inability to
      recall delegations.

   The above will need to be fixed to reflect the possibility of merging
   of leases and the text to do this appears as part of Section 5.6.

5.4.3.  Proposed changes: NFS4ERR_LEASE_MOVED rework

   The current definitive definition of the NFSv4.0 protocol [RFC3530],
   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
   agree. The section entitled "Notification of Migrated Lease" says:

      Upon receiving the NFS4ERR_LEASE_MOVED error, a client that
      supports filesystem migration MUST probe all filesystems from that
      server on which it holds open state. Once the client has
      successfully probed all those filesystems which are migrated, the
      server MUST resume normal handling of stateful requests from that
      client.

   There is a lack of clarity that is prompted by ambiguity about what
   exactly probing is and what the interlock between client and server
   must be. This has led to some worry about the scalability of the
   probing process, and although the time required does scale linearly
   with the number of fs's that the client may have state for with
   respect to a given server, the actual process can be done
   efficiently.

   To address these issues we propose replacing the above with the text
   addressing NFS4ERR_LEASE_MOVED as given in Section 5.6.3.

5.5.  Proposed changes to other sections

5.5.1.  Proposed changes: callback update

   Some changes are necessary to reduce confusion about the process of
   callback information update and in particular to make it clear that
   no state is freed as a result:

   o  Make it clear that after migration there are confirmed entries for
      transferred clientid4/nfs_client_id4 pairs.

   o  Be explicit in the sections headed "otherwise," in the
      descriptions of SETCLIENTID and SETCLIENTID_CONFIRM, that these
      don't apply in the cases we are concerned about.

5.5.2.  Proposed changes: clientid4 handling

   To address both of the clientid4-related issues mentioned in
   Section 4.4, we propose replacing the last three paragraphs of the
   section entitled "Client ID" with the following:

      Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has
      successfully completed, the client uses the shorthand client
      identifier, of type clientid4, instead of the longer and less
      compact nfs_client_id4 structure. This shorthand client
      identifier (a client ID) is assigned by the server and should be
      chosen so that it will not conflict with a client ID previously
      assigned by the same server. This applies across server restarts
      or reboots.

      Distinct servers MAY assign clientid4's independently, and will
      generally do so.
      Therefore, a client has to be prepared to deal with multiple
      instances of the same clientid4 value received on distinct IP
      addresses, denoting separate entities. When trunking of server IP
      addresses is not a consideration, a client should keep track of
      (IP-address, clientid4) pairs, so that each pair is distinct. For
      a discussion of how to address the issue in the face of possible
      trunking of server IP addresses, see Section 5.2.

      When a clientid4 is presented to a server and that clientid4 is
      not recognized, the server will reject the request with the error
      NFS4ERR_STALE_CLIENTID. This can occur for a number of reasons:

      *  A server reboot causing loss of the server's knowledge of the
         client.

      *  Client error sending an incorrect clientid4 or a valid
         clientid4 to the wrong server.

      *  Loss of lease state due to lease expiration.

      *  Client or server error causing the server to believe that the
         client has rebooted (i.e. receiving a SETCLIENTID with an
         nfs_client_id4 which has a matching id and a non-matching
         verifier).

      *  Migration of all state under the associated lease causes its
         non-existence to be recognized on the source server.

      *  Merger of state under the associated lease with another lease
         under a different clientid causes the clientid4 serving as the
         source of the merge to cease being recognized on its server.

      In the event of a server reboot, or loss of lease state due to
      lease expiration, the client must obtain a new clientid4 by use of
      the SETCLIENTID operation and then proceed to any other necessary
      recovery for the server reboot case (See the section entitled
      "Server Failure and Recovery"). In cases of server or client
      error resulting in this error, use of SETCLIENTID to establish a
      new lease is desirable as well.

      In the last two cases, different recovery procedures are required.
      See Section 5.6 for details.
      Note that in cases in which there is any uncertainty about which
      sort of handling is applicable, the distinguishing characteristic
      is that in reboot-like cases, the clientid4 and all associated
      stateids cease to exist while in migration-related cases, the
      clientid4 ceases to exist while the stateids are still valid.

      The client must also employ the SETCLIENTID operation when it
      receives a NFS4ERR_STALE_STATEID error using a stateid derived
      from its current clientid4, since this indicates a situation, such
      as a server reboot, which has invalidated the existing clientid4
      and associated stateids (see the section entitled "lock-owner" for
      details).

      See the detailed descriptions of SETCLIENTID and
      SETCLIENTID_CONFIRM for a complete specification of the
      operations.

5.6.  Migration, Replication and State (AS PROPOSED)

   When responsibility for handling a given filesystem is transferred to
   a new server (migration) or the client chooses to use an alternate
   server (e.g., in response to server unresponsiveness) in the context
   of filesystem replication, the appropriate handling of state shared
   between the client and server (i.e., locks, leases, stateids, and
   client IDs) is as described below. The handling differs between
   migration and replication.

   If a server replica or a server immigrating a filesystem agrees to,
   or is expected to, accept opaque values from the client that
   originated from another server, then it is a wise implementation
   practice for the servers to encode the "opaque" values in network
   byte order. When doing so, servers acting as replicas or immigrating
   filesystems will be able to parse values like stateids, directory
   cookies, filehandles, etc. even if their native byte order is
   different from that of other servers cooperating in the replication
   and migration of the filesystem.

5.6.1.  Migration and State

   In the case of migration, the servers involved in the migration of a
   filesystem SHOULD transfer all server state from the original to the
   new server. This must be done in a way that is transparent to the
   client. This state transfer will ease the client's transition when a
   filesystem migration occurs. If the servers are successful in
   transferring all state, the client will continue to use stateids
   assigned by the original server. Therefore the new server must
   recognize these stateids as valid.

   If transferring stateids from server to server would result in a
   conflict for an existing stateid for the destination server with the
   existing client, transparent state migration MUST NOT happen for that
   client. Servers participating in transparent state migration should
   co-ordinate their stateid assignment policies to make this situation
   unlikely or impossible. The means by which this might be done, like
   all of the inter-server interactions for migration, are not specified
   by the NFS version 4.0 protocol.

   Handling of clientid values is similar but not identical. The
   clientid4 and nfs_client_id4 information (id and verifier) will be
   transferred with the rest of the state information and the
   destination server should use that information to determine
   appropriate clientid4 handling. Although the destination server may
   make state stored under an existing lease available under the
   clientid4 used on the source server, the client should not assume
   that this is always so. In particular,

   o  If there is an existing lease with an nfs_client_id4 that matches
      a migrated lease (same id and verifier), the server SHOULD merge
      the two, making the union of the sets of stateids available under
      the clientid4 for the existing lease.
      As part of the lease merger, the expiration time of the lease will
      reflect renewal done within either of the ancestor leases (and so
      will reflect the latest of the renewals).

   o  If there is an existing lease with an nfs_client_id4 that
      partially matches a migrated lease (same id and a different
      verifier), the server MUST eliminate one of the two, possibly
      invalidating one of the ancestor clientid4's. Since verifiers are
      not ordered, the later lease renewal time will prevail.

   When leases are not merged, the transfer of state should result in
   creation of a confirmed client record with empty callback information
   but matching the {v, x, c} for the transferred client information.
   This should enable establishment of new callback information using
   SETCLIENTID and SETCLIENTID_CONFIRM.

   A client may determine the disposition of migrated state by using a
   stateid associated with the migrated state in an operation on the
   new server and using the associated clientid4 in a RENEW on the new
   server.

   o  If the stateid is not valid and an error NFS4ERR_BAD_STATEID is
      received, either transparent state migration has not occurred or
      the state was purged due to verifier mismatch.

   o  If the stateid is valid and an error NFS4ERR_STALE_CLIENTID is
      received on the RENEW, transparent state migration has occurred
      and the lease has been merged with an existing lease on the
      destination server.

   o  If the stateid is valid and the clientid4 is valid, the lease has
      been transferred intact.

   Since responsibility for an entire filesystem is transferred with a
   migration event, there is no possibility that conflicts will arise on
   the new server as a result of the transfer of locks.

   The servers may choose not to transfer the state information upon
   migration.
   However, this choice is discouraged, except where specific issues
   such as stateid conflicts make it necessary. In the case of
   migration without state transfer, when the client presents state
   information from the original server (e.g. in a RENEW op or a READ
   op of zero length), the client must be prepared to receive either
   NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server.
   The client should then recover its state information as it normally
   would in response to a server failure. The new server must take
   care to allow for the recovery of state information as it would in
   the event of server restart.

   When a lease is transferred to a new server (as opposed to being
   merged with a lease already on the new server), a client SHOULD
   re-establish new callback information with the new server as soon as
   possible, according to sequences described in sections "Operation 35:
   SETCLIENTID - Negotiate Client ID" and "Operation 36:
   SETCLIENTID_CONFIRM - Confirm Client ID". This ensures that server
   operations are not blocked by the inability to recall delegations.

5.6.2.  Replication and State

   Since client switch-over in the case of replication is not under
   server control, the handling of state is different. In this case,
   leases, stateids and client IDs do not have validity across a
   transition from one server to another. The client must re-establish
   its locks on the new server. This can be compared to the
   re-establishment of locks by means of reclaim-type requests after a
   server reboot. The difference is that the server has no provision to
   distinguish requests reclaiming locks from those obtaining new locks
   or to defer the latter. Thus, a client re-establishing a lock on the
   new server (by means of a LOCK or OPEN request), may have the
   requests denied due to a conflicting lock.
   Since replication is intended for read-only use of filesystems, such
   denial of locks should not pose large difficulties in practice. When
   an attempt to re-establish a lock on a new server is denied, the
   client should treat the situation as if its original lock had been
   revoked.

5.6.3.  Notification of Migrated Lease

   In the case of lease renewal, the client may not be submitting
   requests for a filesystem that has been migrated to another server.
   This can occur because of the implicit lease renewal mechanism. The
   client renews a lease containing state of multiple filesystems when
   submitting a request to any one filesystem at the server.

   In order for the client to schedule renewal of leases that may have
   been relocated to the new server, the client must find out about
   lease relocation before those leases expire. To accomplish this, all
   operations which implicitly renew leases for a client (such as OPEN,
   CLOSE, READ, WRITE, RENEW, LOCK, and others), will return the error
   NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be
   renewed has been transferred to a new server. Note that when the
   transfer of responsibility leaves remaining state for that lease on
   the source server, the lease is renewed just as it would have been in
   the NFS4_OK case, despite returning the error. The transfer of
   responsibility happens when the server receives a
   GETATTR(fs_locations) from the client for each filesystem for which a
   lease has been moved to a new server. Normally the client does this
   after receiving an NFS4ERR_MOVED for an access to the filesystem but
   the server is not required to verify that this happens in order to
   terminate the return of NFS4ERR_LEASE_MOVED. By convention, the
   compounds containing GETATTR(fs_locations) SHOULD include an appended
   RENEW operation to permit the server to identify the client getting
   the information.
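
   The server-side interlock described in the preceding paragraph can
   be sketched in Python as follows. This is a minimal illustration
   under stated assumptions, not a definitive implementation: the class
   SourceServerLease and its methods are hypothetical, status codes are
   represented by their names rather than their numeric values, and the
   sketch assumes that some state for the lease remains on the source
   server, so that the lease is renewed even when the error is returned.

```python
class SourceServerLease:
    """Hypothetical per-lease bookkeeping kept by the source server."""

    def __init__(self, migrated_fsids):
        # Migrated filesystems for which this client has not yet
        # fetched fs_locations.
        self.unacknowledged = set(migrated_fsids)
        self.renewals = 0

    def renewing_op(self):
        """Status for any operation that implicitly renews the lease."""
        # Assumption: state remains on this server, so the lease is
        # renewed whether or not the error is returned.
        self.renewals += 1
        if self.unacknowledged:
            return "NFS4ERR_LEASE_MOVED"
        return "NFS4_OK"

    def getattr_fs_locations(self, fsid):
        """Record that the client asked for fs_locations on fsid."""
        # discard() ignores filesystems that were never migrated.
        self.unacknowledged.discard(fsid)


lease = SourceServerLease({"fs1", "fs2"})
first = lease.renewing_op()          # error: two filesystems pending
lease.getattr_fs_locations("fs1")
second = lease.renewing_op()         # error: one filesystem pending
lease.getattr_fs_locations("fs2")
third = lease.renewing_op()          # normal handling resumes
```

   The point of the sketch is that the error persists until every
   migrated filesystem has been acknowledged, not merely the first one.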

   Note that the NFS4ERR_LEASE_MOVED error is only required when
   responsibility for at least one stateid has been transferred. In the
   case of a null lease, where the only associated state is a clientid,
   no NFS4ERR_LEASE_MOVED error need be generated.

   Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports
   filesystem migration MUST perform the necessary GETATTR operation for
   each of the filesystems containing state that have been migrated and
   so give the server evidence that it is aware of the migration of the
   filesystem. Once the client has done this for all migrated
   filesystems on which the client holds state, the server MUST resume
   normal handling of stateful requests from that client.

   One way in which clients can do this efficiently in the presence of
   large numbers of filesystems is described below. This approach
   divides the process into two phases, one devoted to finding the
   migrated filesystems and the second devoted to doing the necessary
   GETATTRs.

   The client can find the migrated filesystems by building and issuing
   one or more COMPOUND requests, each consisting of a set of
   PUTFH/GETFH pairs, each pair using an fh in one of the filesystems in
   question. All such COMPOUND requests can be done in parallel. The
   successful completion of such a request indicates that none of the
   fs's interrogated have been migrated while termination with
   NFS4ERR_MOVED indicates that the filesystem getting the error has
   migrated while those interrogated before it in the same COMPOUND have
   not. Those whose interrogation follows the error remain in an
   uncertain state and can be interrogated by restarting the requests
   from after the point at which NFS4ERR_MOVED was returned or by
   issuing a new set of COMPOUND requests for the filesystems which
   remain in an uncertain state.
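
   The discovery phase just described can be sketched as a small
   simulation. This is a sketch under stated assumptions, not a
   definitive client implementation: send_probe_compound is a
   hypothetical stand-in for issuing a COMPOUND of PUTFH/GETFH pairs
   (no real NFS library is used), the set of migrated filehandles
   plays the role of the server, and batch_size is an illustrative
   parameter. For simplicity the batches are issued one after another
   rather than in parallel.

```python
# Hypothetical simulation of the discovery phase; no real NFS calls.
NFS4_OK = "NFS4_OK"
NFS4ERR_MOVED = "NFS4ERR_MOVED"


def send_probe_compound(filehandles, migrated):
    """Simulate a COMPOUND of PUTFH/GETFH pairs, one pair per filehandle.

    Processing stops at the first pair that hits a migrated filesystem.
    Returns (status, index of the last pair that was processed).
    """
    for index, fh in enumerate(filehandles):
        if fh in migrated:
            return NFS4ERR_MOVED, index
    return NFS4_OK, len(filehandles) - 1


def find_migrated(filehandles, migrated, batch_size=4):
    """Return the filehandles whose filesystems have been migrated."""
    found = []
    pending = list(filehandles)
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        status, index = send_probe_compound(batch, migrated)
        if status == NFS4ERR_MOVED:
            found.append(batch[index])
            # Pairs after the failing one were never evaluated, so they
            # remain uncertain and are probed again in a later COMPOUND.
            pending = batch[index + 1:] + pending
    return found
```

   A real client would feed the resulting list into the second phase,
   issuing GETATTR(fs_locations) for one fh in each migrated
   filesystem.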

   Once the migrated filesystems have been found, all that is needed is
   for the client to give evidence to the server that it is aware of the
   migrated status of filesystems found by this process, by
   interrogating the fs_locations attribute for an fh in each of the
   migrated filesystems. The client can do this by building and issuing
   one or more COMPOUND requests, each of which consists of a set of
   PUTFH operations, each followed by a GETATTR of the fs_locations
   attribute. A RENEW follows to help tie the operations to the lease
   returning NFS4ERR_LEASE_MOVED. Once the client has done this for all
   migrated filesystems on which the client holds state, the server will
   resume normal handling of stateful requests from that client.

   In order to support legacy clients that do not handle the
   NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after
   a wait of at least two lease periods, at which time it will resume
   normal handling of stateful requests from all clients. If a client
   attempts to access the migrated files, the server MUST reply
   NFS4ERR_MOVED.

   When the client receives an NFS4ERR_MOVED error, the client can
   follow the normal process to obtain the new server information
   (through the fs_locations attribute) and perform renewal of those
   leases on the new server. If the server has not had state
   transferred to it transparently, the client will receive either
   NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server,
   as described above. The client can then recover state information as
   it does in the event of server failure.

   Aside from recovering from a migration, there are other reasons a
   client may wish to retrieve fs_locations information from a server.
   When a server becomes unresponsive, for example, a client may use
   cached fs_locations data to discover an alternate server hosting the
   same fs data.
   A client may periodically request fs_locations data
   from a server in order to keep its cache of fs_locations data fresh.

   Since a GETATTR(fs_locations) operation would be used for refreshing
   cached fs_locations data, a server could mistake such a request as
   indicating recognition of an NFS4ERR_LEASE_MOVED condition.
   Therefore a compound which is not intended to signal that a client
   has recognized a migrated lease SHOULD be prefixed with a guard
   operation which fails with NFS4ERR_MOVED if the file handle being
   queried is no longer present on the server. The guard can be as
   simple as a GETFH operation.

   Though unlikely, it is possible that the target of such a compound
   could be migrated in the time after the guard operation is executed
   on the server but before the GETATTR(fs_locations) operation is
   encountered. When a client issues a GETATTR(fs_locations) operation
   as part of a compound not intended to signal recognition of a
   migrated lease, it SHOULD be prepared to process fs_locations data in
   the reply that shows the current location of the fs is gone.

5.6.4. Migration and the Lease_time Attribute

   In order that the client may appropriately manage its leases in the
   case of migration, the destination server must establish proper
   values for the lease_time attribute.

   When state is transferred transparently, that state should include
   the correct value of the lease_time attribute. The lease_time
   attribute on the destination server must never be less than that on
   the source since this would result in premature expiration of leases
   granted by the source server. Upon migration in which state is
   transferred transparently, the client is under no obligation to
   re-fetch the lease_time attribute and may continue to use the value
   previously fetched (on the source server).
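
   As a minimal illustration of the constraint on transparently
   transferred state, a destination server could choose the lease_time
   it applies as shown below. The function name and the notion of a
   locally configured default are assumptions made for this sketch;
   the text above only requires that the destination value never be
   less than the source value.

```python
def destination_lease_time(source_lease_time, destination_default):
    """Pick a lease_time (seconds) for transparently transferred state.

    Hypothetical helper: the result is never below the source server's
    lease_time, so leases granted by the source cannot expire early.
    """
    if source_lease_time <= 0 or destination_default <= 0:
        raise ValueError("lease times must be positive")
    return max(source_lease_time, destination_default)
```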

   In the case in which lease merger occurs as part of state transfer,
   the lease_time attribute of the destination lease remains in effect.
   The client can simply renew that lease with its existing lease_time
   attribute. State in the source lease is renewed at the time of
   transfer so that it cannot expire, as long as the destination lease
   is appropriately renewed.

   If state has not been transferred transparently (i.e., the client
   sees a real or simulated server reboot), the client should fetch the
   value of lease_time on the new (i.e., destination) server, and use it
   for subsequent locking requests. However, the server must respect a
   grace period at least as long as the lease_time on the source server,
   in order to ensure that clients have ample time to reclaim their
   locks before potentially conflicting non-reclaimed locks are granted.
   The means by which the new server obtains the value of lease_time on
   the old server is left to the server implementations. It is not
   specified by the NFS version 4.0 protocol.

6. Results of proposed changes for NFSv4.0

   The purpose of this section is to examine the troubling results
   reported in Section 3.1. We will look at the scenarios as they would
   be handled within the proposal.

   Because the choice of uniform vs. non-uniform nfs_client_id4 id
   strings is a "SHOULD" in these cases, we will designate clients that
   follow this recommendation by SHOULD-UF-CID.

   We will also have to take account of the various merger-related
   "SHOULD" clauses to better understand how they have addressed the
   issues seen. We abbreviate these (collectively known as
   "SHOULD-merges") as follows:

   o  SHOULD-SVR-AM refers to the server obeying the SHOULD which
      RECOMMENDS that it merge leases with identical nfs_client_id4 id
      strings and verifiers.

6.1. Results: Failure to free migrated state on client reboot

   Let's look at the troublesome situation cited in Section 3.1.1. We
   have already seen what happens when SHOULD-UF-CID does not hold. Now
   let's look at the situation in which SHOULD-UF-CID holds, whether
   SHOULD-SVR-AM is in effect or not.

   o  A client C establishes a clientid4 C1 with server ABC specifying
      an nfs_client_id4 with "id" value "C" and verifier 0x111.

   o  The client begins to access files in filesystem F on server ABC,
      resulting in generating stateids S1, S2, etc. under the lease for
      clientid C1. It may also access files on other filesystems on the
      same server.

   o  The filesystem is migrated from ABC to server XYZ. When
      transparent state migration is in effect, stateids S1 and S2 and
      lease {0x111, "C", C1} are now available for use by client C at
      server XYZ. So far, so good.

   o  Client C reboots and attempts to access data on server XYZ,
      whether in filesystem F or another. It does a SETCLIENTID with an
      nfs_client_id4 with "id" value "C" and verifier 0x112. The state
      associated with lease {0x111, "C", C1} is deleted as part of
      creating {0x112, "C", C2}. No problem.

   The correctness signature for this issue is

      SHOULD-UF-CID

   so if you have clients and servers that obey the SHOULD clauses, the
   problem is gone regardless of the choice on the MAY.

6.2. Results: Server reboots resulting in confused lease situation

   Now let's consider the scenario given in Section 3.1.2. We have
   already seen what happens when SHOULD-UF-CID does not hold. Now
   let's look at the situation in which SHOULD-UF-CID holds and
   SHOULD-SVR-AM holds as well.

   o  Client C talks to server ABC using an nfs_client_id4 id like
      "C-ABC" and verifier v1. As a result a lease with clientid4 c.i
      is established: {v1, "C-ABC", c.i}.

   o  fs_a1 migrates from server ABC to server XYZ along with its state.
      Now server XYZ also has a lease: {v1, "C-ABC", c.i}

   o  Server ABC reboots.

   o  Client C talks to server ABC using an nfs_client_id4 id like
      "C-ABC" and verifier v1. As a result a lease with clientid4 c.j
      is established: {v1, "C-ABC", c.j}.

   o  fs_a2 migrates from server ABC to server XYZ. As part of
      migration the incoming lease is seen to denote the same
      nfs_client_id4 and so is merged with {v1, "C-ABC", c.i}.

   o  Now server XYZ has only one lease that matches {v1, "C-ABC", *},
      so the problem is solved.

   Now let's consider the same scenario in the situation in which
   SHOULD-UF-CID holds and SHOULD-SVR-AM holds as well.

   o  Client C talks to server ABC using an nfs_client_id4 id like "C"
      and verifier v1. As a result a lease with clientid4 c.i is
      established: {v1, "C", c.i}.

   o  fs_a1 migrates from server ABC to server XYZ along with its state.
      Now XYZ also has a lease: {v1, "C", c.i}

   o  Server ABC reboots.

   o  Client C talks to server ABC using an nfs_client_id4 id like "C"
      and verifier v1. As a result a lease with clientid4 c.j is
      established: {v1, "C", c.j}.

   o  fs_a2 migrates from server ABC to server XYZ. As part of
      migration the incoming lease is seen to denote the same
      nfs_client_id4 and so is merged with {v1, "C", c.i}.

   o  Now server XYZ has only one lease that matches {v1, "C", *}, so
      the problem is solved.

   The correctness signature for this issue is

      SHOULD-SVR-AM

   so if you have clients and servers that obey the SHOULD clauses, the
   problem is gone regardless of the choice on the MAY.

6.3. Results: Client complexity issues

   Consider the following situation:

   o  There is a set of clients C1 through Cn accessing servers S1
      through Sm. Each server manages some significant number of
      filesystems with the filesystem count L being significantly
      greater than m.

   o  Each client Cx will access a subset of the servers and so will
      have up to m clientids, which we will call Cxy for server Sy.

   o  Now assume that for load-balancing or other operational reasons,
      numbers of filesystems are migrated among the servers. As a
      result, depending on how this is handled, the number of clientids
      may explode. See below.

   Now look at what will happen under various scenarios:

   o  We have previously (in Section 3.1.3) looked at this in the case
      of a client following the non-uniform client-string model. In
      that case, each client-server pair could have up to m clientids
      and each client will have up to m**2 clientids. If we add the
      possibility of server reboot, the only bound on a client's
      clientid count is L.

   o  If we look at this in the SHOULD-UF-CID case in which the
      SHOULD-SVR-AM condition does not hold, the situation is no
      different. Although the server has the client identity
      information that could enable same-client-same-server leases to
      be combined, it does not do so. We still have up to L clientids
      per client.

   o  On the other hand, if we look at the SHOULD-UF-CID case in which
      SHOULD-SVR-AM holds, the problem is gone. There can be no more
      than m clientids per client, and n clientids per server.

   The correctness signature for this issue is

      (SHOULD-UF-CID & SHOULD-SVR-AM)

   so if you have clients and servers that obey the SHOULD clauses, the
   problem is gone regardless of the choice on the MAY.

6.4. Result summary

   We have seen that (SHOULD-SVR-AM & SHOULD-UF-CID) are sufficient to
   solve the problems people have experienced.

7. Issues for NFSv4.1

   Because NFSv4.1 includes the uniform client-string model, addressing
   migration issues is simpler. In the terms of Section 6, we already
   have SHOULD-UF-CID for NFSv4.1, as advised by Section 2.4 of
   [RFC5661], simplifying the work to be done.

   Nevertheless, there are some issues that will have to be addressed.

   For example, the other necessary part of addressing migration issues,
   which we called SHOULD-SVR-AM above, is not currently addressed by
   NFSv4.1 and it needs to be.

8. Security Considerations

   The current definitive definition of the NFSv4.0 protocol [RFC3530]
   and the current pending draft of RFC3530bis [cur-v4.0-bis] agree on
   this point. In both, the section entitled "Security Considerations"
   encourages clients to protect the integrity of the SECINFO
   operation, any GETATTR operation for the fs_locations attribute, and
   the operations SETCLIENTID/SETCLIENTID_CONFIRM. A migration recovery
   event can use any or all of these operations. We do not recommend
   any change here.

9. IANA Considerations

   This document does not require actions by IANA.

10. Acknowledgements

   The editor and authors of this document gratefully acknowledge the
   contributions of Trond Myklebust of NetApp and Robert Thurlow of
   Oracle. We also thank Tom Haynes of NetApp and Spencer Shepler of
   Microsoft for their guidance and suggestions.

   Special thanks go to members of the Oracle Solaris NFS team,
   especially Rick Mesta and James Wahlig, for their work implementing
   an NFSv4.0 migration prototype and identifying many of the issues
   documented here.

11. References

11.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3530]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
              Beame, C., Eisler, M., and D. Noveck, "Network File System
              (NFS) version 4 Protocol", RFC 3530, April 2003.

   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
              System (NFS) Version 4 Minor Version 1 Protocol",
              RFC 5661, January 2010.

11.2. Informative References

   [cur-v4.0-bis]
              Haynes, T., Ed. and D.
              Noveck, Ed., "Network File System
              (NFS) Version 4 Protocol", 2011, .

              Work in progress.

Authors' Addresses

   David Noveck (editor)
   EMC Corporation
   228 South Street
   Hopkinton, MA 01748
   US

   Phone: +1 508 249 5748
   Email: david.noveck@emc.com

   Piyush Shivam
   Oracle Corporation
   5300 Riata Park Ct.
   Austin, TX 78727
   US

   Phone: +1 512 401 1019
   Email: piyush.shivam@oracle.com

   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI 48104
   US

   Phone: +1 248 614 5091
   Email: chuck.lever@oracle.com

   Bill Baker
   Oracle Corporation
   5300 Riata Park Ct.
   Austin, TX 78727
   US

   Phone: +1 512 401 1081
   Email: bill.baker@oracle.com