NFSv4                                                     D. Noveck, Ed.
Internet-Draft                                                       EMC
Intended status: Informational                                 P. Shivam
Expires: January 8, 2013                                        C. Lever
                                                                B. Baker
                                                                  ORACLE
                                                            July 7, 2012


 NFSv4 migration: Implementation experience and spec issues to resolve
                  draft-ietf-nfsv4-migration-issues-01

Abstract

   The migration feature of NFSv4 provides for moving responsibility for
   a single filesystem from one server to another, without disruption to
   clients. Recent implementation experience has shown problems in the
   existing specification for this feature.
   This document discusses the issues which have arisen and explores
   the options available for curing the issues via clarification and
   correction of the NFSv4.0 and NFSv4.1 specifications.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 8, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions
   3.  NFSv4.0 Implementation Experience
     3.1.  Implementation issues
       3.1.1.  Failure to free migrated state on client reboot
       3.1.2.
               Server reboots resulting in a confused lease situation
       3.1.3.  Client complexity issues
     3.2.  Sources of Protocol difficulties
       3.2.1.  Issues with nfs_client_id4 generation and use
       3.2.2.  Issues with lease proliferation
   4.  Issues to be resolved in NFSv4.0
     4.1.  Possible changes to nfs_client_id4 client-string
     4.2.  Possible changes to handle differing nfs_client_id4
           string values
     4.3.  Other issues within migration-state sections
     4.4.  Issues within other sections
   5.  Proposed resolution of NFSv4.0 protocol difficulties
     5.1.  Proposed changes: nfs_client_id4 client-string
     5.2.  Client-string Approaches (AS PROPOSED)
       5.2.1.  Non-Uniform Client-string Approach
       5.2.2.  Uniform Client-string Approach
       5.2.3.  Mixing Client-string Approaches
       5.2.4.  Trunking Determination using Uniform Client-strings
     5.3.  Proposed changes: merged (vs. synchronized) leases
     5.4.  Other proposed changes to migration-state sections
       5.4.1.  Proposed changes: Client ID migration
       5.4.2.  Proposed changes: Callback re-establishment
       5.4.3.  Proposed changes: NFS4ERR_LEASE_MOVED rework
     5.5.  Proposed changes to other sections
       5.5.1.  Proposed changes: callback update
       5.5.2.  Proposed changes: clientid4 handling
       5.5.3.  Proposed changes: NFS4ERR_CLID_INUSE
     5.6.  Migration, Replication and State (AS PROPOSED)
       5.6.1.  Migration and State
       5.6.2.  Replication and State
       5.6.3.  Notification of Migrated Lease
       5.6.4.  Migration and the Lease_time Attribute
   6.  Results of proposed changes for NFSv4.0
     6.1.  Results: Failure to free migrated state on client reboot
     6.2.  Results: Server reboots resulting in confused lease
           situation
     6.3.  Results: Client complexity issues
     6.4.  Result summary
   7.  Issues for NFSv4.1
     7.1.  Addressing state merger in NFSv4.1
     7.2.  Addressing pNFS relationship with migration
     7.3.  Addressing server owner changes in NFSv4.1
   8.  Lock State and File System Transitions (AS PROPOSED)
     8.1.  File System Transitions with Matching Server Scopes
     8.2.  File System Transitions with Non-Matching Server Scopes
     8.3.  FS Transitions Involving Reobtaining Locking State
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgements
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1. Introduction

   This document is in the informational category, and while the facts
   it reports may have normative implications, any such normative
   significance reflects the readers' preferences.
   For example, we may report that the reboot of a client with migrated
   state results in state not being promptly cleared and that this will
   prevent granting of conflicting lock requests at least for the lease
   time, which is a fact. While it is to be expected that client and
   server implementers will judge this to be a situation that is best
   avoided, the judgment as to how pressing this issue should be
   considered is a judgment for the reader, and eventually the nfsv4
   working group to make.

   We do explore possible ways in which such issues can be avoided, with
   minimal negative effects, in the expectation that the working group
   will choose to address these issues, but the choice of exactly how to
   address these is best given effect in one or more standards-track
   documents and/or errata.

   This document focuses on NFSv4.0, since that is where the majority of
   implementation experience has been. Nevertheless, there is some
   discussion of the implications of the NFSv4.0 experience for
   migration in NFSv4.1.

2. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   In the context of this informational document, these normative
   keywords will always occur in the context of a quotation, most often
   direct but sometimes indirect. The context will make it clear
   whether the quotation is from:

   o  The current definitive definition of the NFSv4.0 protocol, whether
      that is the original NFSv4.0 specification [RFC3530], the current
      pending draft of RFC3530bis expected to become the definitive
      definition of NFSv4.0 once certain procedural steps are taken
      [cur-v4.0-bis], or an eventual RFC3530bis RFC, taking over the
      role of definitive definition of NFSv4.0 from RFC3530.

      As the identity of that document may change during the lifetime of
      this document, we will often refer to the current or pending
      definition of NFSv4.0 and quote from portions of the documents
      that are identical among all existing drafts. Given that RFC3530
      and all RFC3530bis drafts agree as to the issues under discussion,
      this should not cause undue difficulty. Note that to simplify
      document maintenance, section names rather than section numbers
      are used when referring to sections in existing documents so that
      only minimal changes will be necessary as the identity of the
      document defining NFSv4.0 changes.

   o  The current definitive definition of the NFSv4.1 protocol
      [RFC5661].

   o  A proposed or possible text to serve as a replacement for the
      current definitive document text. Sometimes, a number of possible
      alternative texts may be listed and benefits and detriments of
      each examined in turn.

3. NFSv4.0 Implementation Experience

3.1. Implementation issues

   Note that the examples below reflect current experience which arises
   from clients implementing the recommendation to use different
   nfs_client_id4 id strings for different server addresses, i.e. using
   what is later referred to herein as the "non-uniform client-string
   approach".

   This is simply because that is the experience implementers have had.
   The reader should not assume that in all cases, this practice is the
   source of the difficulty. It may be so in some cases but clearly it
   is not in all cases.

3.1.1. Failure to free migrated state on client reboot

   The following sort of situation has proved troublesome:

   o  A client C establishes a clientid4 C1 with server ABC specifying
      an nfs_client_id4 with id string value "C-ABC" and boot verifier
      0x111.

   o  The client begins to access files in filesystem F on server ABC,
      resulting in generating stateids S1, S2, etc.
      under the lease for clientid C1. It may also access files on
      other filesystems on the same server.

   o  The filesystem is migrated from ABC to server XYZ. When
      transparent state migration is in effect, stateids S1 and S2 and
      clientid4 C1 are now available for use by client C at server XYZ.
      So far, so good.

   o  Client C reboots and attempts to access data on server XYZ,
      whether in filesystem F or another. It does a SETCLIENTID with an
      nfs_client_id4 with id string value "C-XYZ" and boot verifier
      0x112. There is thus no occasion to free stateids S1 and S2 since
      they are associated with a different client name and so lease
      expiration is the only way that they can be gotten rid of.

   Note here that while it seems clear to us in this example that C-XYZ
   and C-ABC are from the same client, the server has no way to
   determine the structure of the "opaque" id string. In the protocol,
   it really is treated as opaque. Only the client knows which
   nfs_client_id4 values designate the same client on a different
   server.

3.1.2. Server reboots resulting in a confused lease situation

   Further problems arise from scenarios like the following.

   o  Client C talks to server ABC using an nfs_client_id4 id string
      such as "C-ABC" and a boot verifier v1. As a result, a lease with
      clientid4 c.i is established: {v1, "C-ABC", c.i}.

   o  fs_a1 migrates from server ABC to server XYZ along with its state.
      Now server XYZ also has a lease: {v1, "C-ABC", c.i}.

   o  Server ABC reboots.

   o  Client C talks to server ABC using an nfs_client_id4 id string
      such as "C-ABC" and a boot verifier v1. As a result, a lease with
      clientid4 c.j is established: {v1, "C-ABC", c.j}.

   o  fs_a2 migrates from server ABC to server XYZ. Now server XYZ also
      has a lease: {v1, "C-ABC", c.j}.

   o  Now server XYZ has two leases that match {v1, "C-ABC", *}, when
      the protocol clearly assumes there can be only one.
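
   The sequence above can be traced with a small toy model. This is
   only a sketch under stated assumptions: the Lease and Server classes,
   their method names, and the textual clientid4 format are hypothetical
   and invented for illustration. Real clientid4 values are opaque, and
   the SETCLIENTID_CONFIRM step is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    verifier: str   # client boot verifier
    id_string: str  # opaque nfs_client_id4 id string
    clientid: str   # stand-in for the server-assigned clientid4


class Server:
    """Toy server that only tracks its list of leases (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.boot_count = 0
        self.leases = []

    def setclientid(self, verifier, id_string):
        # Invented clientid4 format: it changes whenever the server reboots.
        clientid = f"{self.name}/{self.boot_count}/{len(self.leases)}"
        lease = Lease(verifier, id_string, clientid)
        self.leases.append(lease)
        return lease

    def reboot(self):
        # State still held by this server is lost on reboot.
        self.boot_count += 1
        self.leases = []

    def receive_migrated(self, lease):
        # Transparent state migration: the lease arrives unchanged.
        self.leases.append(lease)

    def find(self, verifier, id_string):
        return [l for l in self.leases
                if (l.verifier, l.id_string) == (verifier, id_string)]


def run_scenario():
    abc, xyz = Server("ABC"), Server("XYZ")
    c_i = abc.setclientid("v1", "C-ABC")  # lease {v1, "C-ABC", c.i}
    xyz.receive_migrated(c_i)             # fs_a1 migrates to XYZ
    abc.reboot()
    c_j = abc.setclientid("v1", "C-ABC")  # lease {v1, "C-ABC", c.j}
    xyz.receive_migrated(c_j)             # fs_a2 migrates to XYZ
    return xyz.find("v1", "C-ABC")


if __name__ == "__main__":
    for lease in run_scenario():
        print(lease)
```

   In this model, a lookup on XYZ by verifier and id string returns two
   leases with different clientid4 stand-ins, which is the ambiguity
   described in the last bullet.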

   Note that if the client used "C" (rather than "C-ABC") as the
   nfs_client_id4 id string, the exact same situation would arise.

   One of the first cases in which this sort of situation has resulted
   in difficulties is in connection with doing a SETCLIENTID for
   callback update.

   The SETCLIENTID for callback update only includes the nfs_client_id4,
   assuming there can only be one such with a given nfs_client_id4
   value. If there were multiple, confirmed client records with
   identical nfs_client_id4 id string values, there would be no way to
   map the callback update request to the correct client record. Apart
   from the migration handling specified in [RFC3530], such a situation
   cannot arise.

   One possible accommodation for this particular issue that has been
   used is to add a RENEW operation along with SETCLIENTID (on a
   callback update) to disambiguate the client.

   When the client updates the callback info to the destination, the
   client would, by convention, send a compound like this:

      { RENEW clientid4, SETCLIENTID nfs_client_id4,verf,cb }

   The presence of the clientid4 in the compound would allow the server
   to differentiate among the various leases that it knows of, all with
   the same nfs_client_id4 value.

   While this would be a reasonable patch for an isolated protocol
   weakness, interoperable clients and servers would require that the
   protocol truly be updated to allow such a situation, specifically
   that of multiple clientid4's with the same nfs_client_id4 value. The
   protocol is currently designed and implemented assuming this can't
   happen. We need to either prevent the situation from happening, or
   fully adapt to the possibilities which can arise. See Section 4 for
   a discussion of such issues.

3.1.3. Client complexity issues

   Consider the following situation:

   o  There is a set of clients C1 through Cn accessing servers S1
      through Sm.
      Each server manages some significant number of filesystems with
      the filesystem count L being significantly greater than m.

   o  Each client Cx will access a subset of the servers and so will
      have up to m clientid's, which we will call Cxy for server Sy.

   o  Now assume that for load-balancing or other operational reasons,
      numbers of filesystems are migrated among the servers. As a
      result, each client-server pair will have up to m clientid's and
      each client will have up to m**2 clientids. If we add the
      possibility of server reboot, the only bound on a client's
      clientid count is L.

   Now, instead of a clientid4 identifying a client-server pair, we have
   many more entities for the client to deal with. In addition, it
   isn't clear how new state is to be incorporated in this structure.

   The limitations of the migrated state (inability to be freed on
   reboot) would argue against adding more such state but trying to
   avoid that would run into its own difficulties. For example, a
   single lockowner string presented under two different clientids would
   appear as two different entities.

   Thus we have to choose between:

   o  Indefinite prolongation of foreign clientid's even after all
      transferred state is gone.

   o  Having multiple requests for the same lockowner-string-named
      entity carried on in parallel by separate identically named
      lockowners under different clientid4's.

   o  Adding serialization at the lock-owner string level, in addition
      to that at the lockowner level.

   In any case, we have gone (in adding migration as it was described)
   from a situation in which

   o  Each client has a single clientid4/lease for each server it talks
      to.

   o  Each client has a single nfs_client_id4 for each server it talks
      to.

   o  Every stateid can be mapped to an associated lease based on the
      server it was obtained from.

   To one in which

   o  Each client may have multiple clientid4's for a single server.

   o  For each stateid, the client must separately record the clientid4
      that it is assigned to, or it must manage separate "state blobs"
      for each fsid and map those to clientid4's.

   o  Before doing an operation that can result in a stateid, the client
      must either find a "state blob" based on fsid or create a new one,
      possibly with a new clientid4.

   o  There may be multiple clientid4's all connected to the same server
      and using the same nfs_client_id4.

   This sort of additional client complexity is troublesome and needs to
   be eliminated.

3.2. Sources of Protocol difficulties

3.2.1. Issues with nfs_client_id4 generation and use

   The current definitive definition of the NFSv4.0 protocol [RFC3530],
   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
   agree. The section entitled "Client ID" says:

      The second field, id is a variable length string that uniquely
      defines the client.

   There are two possible interpretations of the phrase "uniquely
   defines" in the above:

   o  The relation between strings and clients is a function from such
      strings to clients so that each string designates a single client.

   o  The relation between strings and clients is a bijection between
      such strings and clients so that each string designates a single
      client and each client is named by a single string.

   The first interpretation would make these client-strings like phone
   numbers (a single person can have several) while the second would
   make them like social security numbers.

   Endless debate about the true meaning of "uniquely defines" in this
   context is quite possible but not very helpful. The following points
   should be noted though:

   o  The second interpretation is more consistent with the way
      "uniquely defines" is used elsewhere in the spec.

   o  The spec as now written intends the first interpretation (or is
      internally inconsistent). In fact, it recommends, although it
      doesn't "RECOMMEND", that a single client have at least as many
      client-strings as server addresses that it interacts with. It
      says, in the third bullet point regarding construction of the
      string (which we shall henceforth refer to as client-string-BP3):

         The string should be different for each server network address
         that the client accesses, rather than common to all server
         network addresses.

   o  If internode interactions are limited to those between a client
      and its servers, there is no occasion for servers to be concerned
      with the question of whether two client-strings designate the same
      client, so that there is no occasion for the difference in
      interpretation to matter.

   o  When transparent migration of client state occurs between two
      servers, it becomes necessary to determine whether state on two
      different servers is for the same client, and this distinction
      becomes very important.

   Given the need for the server to be aware of client identity with
   regard to migrated state, either client-string construction rules
   will have to change or there will be a need to get around current
   issues, or perhaps a combination of these two will be required.
   Later sections will examine the options and propose a solution.

   One consideration that may indicate that this cannot remain exactly
   as it is today has to do with the fact that the current explanation
   for this behavior is not correct. The current definitive definition
   of the NFSv4.0 protocol [RFC3530], and the current pending draft of
   RFC3530bis [cur-v4.0-bis] both agree. The section entitled "Client
   ID" says:

      The reason is that it may not be possible for the client to tell
      if the same server is listening on multiple network addresses.
      If the client issues SETCLIENTID with the same id string to each
      network address of such a server, the server will think it is the
      same client, and each successive SETCLIENTID will cause the server
      to begin the process of removing the client's previous leased
      state.

   In point of fact, a "SETCLIENTID with the same id string" sent to
   multiple network addresses will be treated as all from the same
   client but will not "cause the server to begin the process of
   removing the client's previous leased state" unless the server
   believes it is a different instance of the same client, i.e. if the
   id string is the same and there is a different boot verifier. If the
   client does not reboot, the verifier should not change. If it does
   reboot, the verifier will change, and the server should "begin the
   process of removing the client's previous leased state".

   The situation of multiple SETCLIENTID requests received by a server
   on multiple network addresses is exactly the same, from the protocol
   design point of view, as when multiple (i.e. duplicate) SETCLIENTID
   requests are received by the server on a single network address. The
   same protocol mechanisms that prevent erroneous state deletion in the
   latter case prevent it in the former case. There is no reason for
   special handling of the multiple-network-appearance case, in this
   regard.

3.2.2. Issues with lease proliferation

   It is often felt that lease proliferation is a consequence of the
   client-string construction issues, and it is certainly the case that
   the two are closely connected in that non-uniform client-strings make
   it impossible for the server to appropriately combine leases from the
   same client. See Section 5.2.1 for a discussion of non-uniform
   client-strings.

   However, even where the server could combine leases from the same
   client, it needs to be clear how and when it will do so, so that the
   client will be prepared.
   These issues will have to be addressed at various places in the spec.

   This could be enough only if we are prepared to do away with the
   "should" recommending non-uniform client-strings and replace it with
   a "should not" or even a "SHOULD NOT". Current client implementation
   patterns make this an unpalatable choice for use as a general
   solution, but it is reasonable to "RECOMMEND" this choice for a
   well-defined subset of clients. One alternative would be to create a
   way for the server to infer from client behavior which leases are
   held by the same client and use this information to do appropriate
   lease mergers. Prototyping and detailed specification work has shown
   that this could be done but the resulting complexity is such that a
   better choice is to "RECOMMEND" use of the uniform approach for
   clients supporting the migration feature.

   Because of the discussion of client-string construction in [RFC3530],
   most existing clients implement the non-uniform client-string
   approach. As a result, existing servers may not have been tested
   with clients implementing uniform client-strings. As a consequence,
   care must be taken to preserve interoperability between clients
   capable of using uniform client-strings (UCS-capable clients) and
   servers that don't tolerate uniform client-strings for one reason or
   another. See Section 5.2.3 for details.

4. Issues to be resolved in NFSv4.0

4.1. Possible changes to nfs_client_id4 client-string

   The fact that the reason given in client-string-BP3 is not valid
   makes the existing "should" insupportable. We cannot do either of
   the following:

   o  Keep a reason we know is invalid.

   o  Keep saying "should" without giving a reason.

   What are often presented as reasons that motivate use of the
   non-uniform approach always turn out to be cases in which, if the
   uniform approach were used, the server will treat a client which
   accesses that server via two different IP addresses as part of a
   single client, as it in fact is. This may be disconcerting to a
   client unaware that the two IP addresses connect to the same server.
   This is thus not a reason to use the non-uniform approach but rather
   an illustration of the fact that those using the uniform approach
   must use server behavior to determine whether any trunking of IP
   addresses exists, as is described in Section 5.2.2.

   It is always possible that a valid new reason will be found, but so
   far none has been proposed. Given the history, the burden of proof
   should be on those asserting the validity of a proposed new reason.

   So we will assume for now that the "should" will have to go. The
   question is what to replace it with.

   o  We can't say "MUST NOT", despite the problems this raises for
      migration since this is pretty late in the day for such a change.
      Many currently operating clients obey the existing "should".
      Similar considerations would apply for "SHOULD NOT" or "should
      not".

   o  Dropping client-string-BP3 entirely is a possibility but, given
      the context and history, it would just be a confusing version of
      "SHOULD NOT".

   o  Using "MAY" would clearly specify that both ways of doing this are
      valid choices for clients and that servers will have to deal with
      clients that make either choice.

   o  This might be modified by a "SHOULD" (or even a "MUST") for
      particular groups of clients.

   o  There will have to be some text explaining why a client might make
      either choice but, except for the particular cases referred to
      above, we will have to make sure that it is truly descriptive, and
      not slanted in either direction.

4.2.
Possible changes to handle differing nfs_client_id4 string values

   Given the difficulties caused by having different nfs_client_id4
   client-string values for the same client, we have two choices:

   o  Deprecate the existing treatment and basically say the client is
      on its own doing migration, if it follows it.

   o  Introduce a way of having the client provide client identity
      information to the server, if it can be done compatibly while
      staying within the bounds of v4.0.

4.3. Other issues within migration-state sections

   There are a number of issues where the existing text is unclear
   and/or wrong and needs to be fixed in some way.

   o  Lack of clarity in the discussion of moving clientids (as well as
      stateids) as part of moving state for migration.

   o  The discussion of synchronized leases is wrong in that there is no
      way to determine (in the current spec) when leases are for the
      same client and also wrong in suggesting a benefit from leases
      synchronized at the point of transfer. What is needed is merger
      of leases, which is necessary to keep client complexity
      requirements from getting out of hand.

   o  Lack of clarity in the discussion of LEASE_MOVED handling,
      including failure to fully address situations in which transparent
      state migration did not occur.

4.4. Issues within other sections

   There are a number of cases in which certain sections, not
   specifically related to migration, require additional clarification.
   This is generally because text that is clear in a context in which
   leases and clientids are created in one place and live there forever
   may need further refinement in the more dynamic environment that
   arises as part of migration.

   Some examples:

   o  Some people are under the impression that updating callback
      endpoint information for an existing client, as used during
      migration, may cause the destination server to free existing
      state.
      Additions are needed to clarify the situation.

   o  The handling of the sets of clientid4's maintained by each server
      needs to be clarified. In particular, the issue of how the client
      adapts to the presumably independent and uncoordinated clientid4
      sets needs to be clearly addressed.

   o  Statements regarding handling of invalid clientid4's need to be
      clarified and/or refined in light of the possibilities that arise
      due to lease motion and merger.

   o  Confusion and lack of clarity about NFS4ERR_CLID_INUSE.

5. Proposed resolution of NFSv4.0 protocol difficulties

5.1. Proposed changes: nfs_client_id4 client-string

   We propose replacing client-string-BP3 with the following text and
   adding the following proposed Section 5.2 to provide implementation
   guidance.

   o  The string MAY be different for each server network address that
      the client accesses, rather than common to all server network
      addresses.

   o  The considerations that might influence a client to use different
      strings for different network server addresses are explained in
      Section 5.2.

   o  Despite the use of the word "string" for this identifier, and the
      fact that using strings will often be convenient, it should be
      understood that the protocol defines this as opaque data. In
      particular, those receiving such an id should not assume that it
      will be in UTF-8 format. Servers MUST NOT reject an
      nfs_client_id4 simply because the id string is not in UTF-8
      format.

5.2. Client-string Approaches (AS PROPOSED)

   One particular aspect of the construction of the nfs_client_id4
   string has proved recurrently troublesome. The client has a choice
   of:

   o  Presenting the same id string to multiple server addresses. This
      is referred to as the "uniform client-string approach" and is
      discussed in Section 5.2.2.

   o  Presenting different id strings to multiple server addresses.
      This is referred to as the "non-uniform client-string approach"
      and is discussed in Section 5.2.1.

   Note that implementation considerations, including compatibility with
   existing servers, may make it desirable for a client to use both
   approaches, based on configuration information, such as mount
   options. This issue will be discussed in Section 5.2.3.

   Construction of the client-string has been a troublesome issue
   because of the way in which the NFS protocols have evolved.

   o  NFSv3 as a stateless protocol had no need to identify the state
      shared by a particular client-server pair. Thus there was no
      occasion to consider the question of whether a set of requests
      come from the same client, or whether two server IP addresses are
      connected to the same server. As the environment was one in which
      the user supplied the target server IP address as part of
      incorporating the remote filesystem in the client's file name
      space, there was no occasion to take note of server trunking.
      Within a stateless protocol, the situation was symmetrical. The
      client has no server identity information and the server has no
      client identity information.

   o  NFSv4.1 is a stateful protocol with full support for client and
      server identity determination. This enables the server to be
      aware when two requests come from the same client (they are on
      sessions sharing a clientid4) and the client to be aware when two
      server IP addresses are connected to the same server (they return
      the same server name in responding to an EXCHANGE_ID).

   NFSv4.0 is unfortunately halfway between these two. The two
   client-string approaches have arisen in attempts to deal with the
   changing requirements of the protocol as implementation has proceeded
   and features that were not very substantial in [RFC3530] became more
   substantial.
666 o In the absence of any implementation of the fs_locations-related 667 features (replication, referral, and migration), the situation is 668 very similar to that of NFSv3, with the addition of state but with 669 no concern to provide accurate client and server identity 670 determination. This is the situation that gave rise to the non- 671 uniform client-string approach. 673 o In the presence of replication and referrals, the client may have 674 occasion to take advantage of knowledge of server trunking 675 information. Even more important, migration, by transferring 676 state among servers, causes difficulties for the non-uniform 677 client-string approach, in that the two different client-strings 678 sent to different IP addresses may wind up on the same IP address, 679 adding confusion. 681 o A further consideration is that client implementations typically 682 provide NFSv4.1 by augmenting their existing NFSv4.0 683 implementation, not by providing two separate implementations. 684 Thus the more NFSv4.0 and NFSv4.1 can work alike, the less complex 685 are clients. This is a key reason why those implementing NFSv4.0 686 clients might prefer using the uniform client string model, even 687 if they have chosen not to provide fs_locations-related features 688 in their NFSv4.0 client. 690 Both approaches have to deal with the asymmetry in client and server 691 identity information between client and server. Each seeks to make 692 the client's and the server's views match. In the process, each 693 encounters some combination of inelegant protocol features and/or 694 implementation difficulties. The choice of which to use is up to the 695 client implementer and the sections below try to give some useful 696 guidance. 698 5.2.1. Non-Uniform Client-string Approach 700 The non-uniform client-string approach is an attempt to handle these 701 matters in NFSv4.0 client implementations in as NFSv3-like a way as 702 possible. 
704 For a client using the non-uniform approach, all internal recording 705 of clientid4 values is to include, whether explicitly or implicitly, 706 the server IP address so that one always has an (IP-address, 707 clientid4) pair. Two such pairs from different servers are always 708 distinct even when the clientid4 values are the same, as they may 709 occasionally be. In this approach, such equality is always treated 710 as simple happenstance. 712 Making the client-string different on different servers means that a 713 server has no way of tying together information from the same client 714 and so will treat a single client as multiple clients, with a separate 715 lease for each server network address. Since there is no way in the 716 protocol for the client to determine if two network addresses are 717 connected to the same server, the resulting lack of knowledge is 718 symmetrical and can result in simpler client implementations in which 719 there is a single clientid/lease per server network address. 721 Support for migration, particularly with transparent state migration, 722 is more complex in the case of non-uniform client-strings. For 723 example, migration of a lease can result in multiple leases for the 724 same client accessing the same server addresses, vitiating many of 725 the advantages of this approach. Therefore, client implementations 726 that support migration with transparent state migration SHOULD NOT 727 use the non-uniform client-string approach, except where it is 728 necessary for compatibility with existing server implementations (for 729 details of arranging use of multiple client-string approaches, see 730 Section 5.2.3). 732 5.2.2. Uniform Client-string Approach 734 When the client-string is kept uniform, the server has the basis to 735 have a single clientid4/lease for each distinct client. The problem 736 that has to be addressed is the lack of explicit server identity 737 information in NFSv4.0; such information is made available in NFSv4.1.
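To make the contrast between the two approaches concrete, the following minimal sketch models a server that keys leases purely on the id string it receives. The ToyServer class and its setclientid method are hypothetical illustrations, not part of any NFSv4.0 implementation; a real server would also take the boot verifier, the principal, and confirmation state into account.

```python
import itertools


class ToyServer:
    """Hypothetical server model: one lease (clientid4) per distinct id string."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.leases = {}  # id string (opaque bytes) -> clientid4

    def setclientid(self, id_string: bytes) -> int:
        # The same id string always maps to the same lease, whichever
        # of the server's network addresses the request arrived on.
        if id_string not in self.leases:
            self.leases[id_string] = next(self._next_id)
        return self.leases[id_string]


# One server reachable at two addresses, accessed by a single client.
uniform = ToyServer()
u1 = uniform.setclientid(b"client-A")              # sent to the first address
u2 = uniform.setclientid(b"client-A")              # sent to the second address

non_uniform = ToyServer()
n1 = non_uniform.setclientid(b"client-A/192.0.2.1")
n2 = non_uniform.setclientid(b"client-A/192.0.2.2")
```

In this model the uniform string leaves the server with a single lease and the same clientid4 on both addresses, while the non-uniform strings leave it with two leases that it cannot tie to one client.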
739 When the same client-string is given to multiple IP addresses, the 740 client can determine whether two IP addresses correspond to a single 741 server, based on the server's behavior. This is the inverse of the 742 strategy adopted for the non-uniform approach in which different 743 server IP addresses are told about different clients, simply to 744 prevent a server from manifesting behavior that is inconsistent with 745 there being a single server for each IP address, in line with the 746 traditions of NFS. So, to compare: 748 o In the non-uniform approach, servers are told about different 749 clients because, if the server were to use accurate information as 750 to client identity, two IP addresses on the same server would 751 behave as if they were talking to the same client, which might 752 prove disconcerting to a client not expecting such behavior. 754 o In the uniform approach, the servers are told that there is a 755 single client, which is, after all, the truth. Then, when the 756 server uses this information, two IP addresses on the same server 757 will behave as if they are talking to the same client, and this 758 difference in behavior allows the client to infer the server IP 759 address trunking configuration, even though NFSv4.0 does not 760 explicitly provide this information. 762 The procedure given in Section 5.2.4 shows one example of how 763 this might be done. 765 The uniform client-string approach makes it necessary to exercise 766 more care in the definition of the nfs_client_id4 boot verifier: 768 o In [RFC3530], the client is told to change the boot verifier when 769 reboot occurs, but there is no explicit statement as to the 770 converse, so that any requirement to keep the verifier constant 771 unless rebooting is only present by implication. 773 o Many existing clients change the boot verifier every time they 774 destroy and recreate the data structure that tracks an (IP-address, clientid4) pair.
This might happen if the last mount of 776 a particular server is removed, and then a fresh mount is created. 777 Note also that this might result in each (IP-address, clientid4) 778 pair having its own boot verifier that is independent of the 779 others. 781 o Within the uniform client-string approach, an nfs_client_id4 782 designates a globally known client instance, so that the boot 783 verifier should change if and only if a new client instance is 784 created, typically as a result of a reboot. 786 The following are advantages for the implementation of using the 787 uniform client-string approach: 789 o Clients can take advantage of server trunking (and clustering with 790 single-server-equivalent semantics) to increase bandwidth or 791 reliability. 793 o There are advantages in state management so that, for example, we 794 never have a delegation under one clientid revoked because of a 795 reference to the same file from the same client under a different 796 clientid. 798 o The uniform client-string approach allows the server to do any 799 necessary automatic lease merger in connection with migration, 800 without requiring any client involvement. This consideration is 801 of sufficient weight to cause us to RECOMMEND use of the uniform 802 client-string approach for clients supporting transparent state 803 migration. 805 The following implementation considerations might cause issues for 806 client implementations: 808 o This approach is considerably different from the non-uniform 809 approach, which most client implementations have been following. 810 Until substantial implementation experience is obtained with this 811 approach, reluctance to embrace something so new is to be 812 expected. 814 o Mapping between server network addresses and leases is more 815 complicated in that it is no longer a one-to-one mapping. 817 How to balance these considerations depends on implementation goals. 819 5.2.3.
Mixing Client-string Approaches 821 As noted above, a client which needs to use the uniform client-string 822 approach (e.g., to support migration) may also need to support 823 existing servers with implementations that do not work properly in 824 this case. 826 Some examples of such server issues include: 828 o Some existing NFSv4 server implementations of IP-address failover 829 depend on clients' use of a non-uniform client-string approach. 830 In particular, when a server supports both its own IP address and 831 one failed over from a partner server, it may have separate sets 832 of state applicable to the two IP addresses, owned by different 833 servers but residing on a single one. 835 In this situation, some servers have relied on clients' use of the 836 non-uniform client-string approach, as suggested but not mandated 837 by [RFC3530], to keep these sets of state separate, and will have 838 problems in handling clients using the uniform client-string 839 approach, in that such clients will see changes in trunking 840 relationships whenever server failover and giveback occur. 842 o Some existing servers incorrectly return NFS4ERR_CLID_INUSE in a 843 way which interferes with clients using the uniform client-string 844 approach. See Section 5.5.3 for details. 846 In order to support such servers, the client can use different 847 approaches for different mounts, as long as: 849 o The uniform client-string approach is used when accessing servers 850 that may return NFS4ERR_MOVED. 852 o The non-uniform client-string approach is used when accessing 853 servers whose implementations make them incompatible with the 854 uniform client-string approach. 856 One effective way for clients to handle this is to support the 857 uniform client-string approach as the default, but allow a mount 858 option to specify use of the non-uniform client-string approach for 859 particular mount points, as long as such mount points are not used 860 when migration is to be supported.
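The default-plus-override arrangement described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names client_string and id_for_mount, the "@" separator, and the mount-option name "nonuniform_id" are all hypothetical and do not come from the NFSv4.0 specification or from any existing client.

```python
def client_string(client_name: bytes, server_addr: str, uniform: bool = True) -> bytes:
    """Build the opaque id string for nfs_client_id4 (hypothetical helper).

    uniform=True  -> the same bytes are presented to every server address
    uniform=False -> the server address is folded in, so each address
                     is told about a different client
    """
    if uniform:
        return client_name
    return client_name + b"@" + server_addr.encode("ascii")


def id_for_mount(client_name: bytes, server_addr: str, mount_opts: dict) -> bytes:
    # "nonuniform_id" is an assumed mount-option name, used only for illustration.
    # Uniform is the default; the option opts a single mount out of it.
    uniform = not mount_opts.get("nonuniform_id", False)
    return client_string(client_name, server_addr, uniform)


default_a = id_for_mount(b"c1", "192.0.2.1", {})
default_b = id_for_mount(b"c1", "192.0.2.2", {})
legacy_a = id_for_mount(b"c1", "192.0.2.1", {"nonuniform_id": True})
legacy_b = id_for_mount(b"c1", "192.0.2.2", {"nonuniform_id": True})
```

A client built this way would have to refuse, or at least warn about, the override on any mount for which migration support is wanted.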
862 In the case in which the same server has multiple mounts, and both 863 approaches are specified for the same server, the client could have 864 multiple clientids corresponding to the same server, one for each 865 approach, and would then have to keep these separate. 867 5.2.4. Trunking Determination using Uniform Client-strings 869 This section provides an example of how trunking determination could 870 be done by a client following the uniform client-string approach 871 (whether this is used for all mounts or not). Clients need not 872 follow this procedure but implementers should make sure that the 873 issues dealt with by this procedure are all properly addressed. 875 We need to clarify the various possible purposes of trunking 876 determination and the corresponding requirements as to server 877 behavior. The following points should be noted: 879 o The primary purpose of the trunking determination algorithm is to 880 make sure that, if the server treats client requests on two IP 881 addresses as part of the same client, the client will not be 882 blind-sided and encounter disconcerting server behavior, as 883 mentioned in Section 5.2.2. Such behavior could occur if the 884 client were unaware that all of its client requests for the two IP 885 addresses were being handled as part of a single client talking to 886 a single server. 888 o A second purpose is to be able to use knowledge of trunking 889 relationships for better performance, etc. 891 o If a server were to give out distinct clientid's in response to 892 receiving the same nfs_client_id4 on different network addresses, 893 and acted as if these were separate clients, the primary purpose 894 of trunking determination would be met, as long as the server did 895 not treat them as part of the same client. In this case, the 896 server would be acting, with regard to that client, as if it were 897 two distinct servers.
This would interfere with the secondary 898 purpose of trunking determination but there is nothing the client 899 can do about that. 901 o Suppose a server were to give such a client two different 902 clientid's but act as if they were one. That is the only way 903 that the server could behave in a way that would defeat the 904 primary purpose of the trunking determination algorithm. 906 Servers MUST NOT do that. 908 For a client using the uniform approach, clientid4 values are treated 909 as important information in determining server trunking patterns. 910 For two different IP addresses to return the same clientid4 value is 911 a necessary, though not a sufficient, condition for them to be 912 considered as connected to the same server. As a result, when two 913 different IP addresses return the same clientid4, the client needs to 914 determine, using the procedure given below or otherwise, whether the 915 IP addresses are connected to the same server. For such clients, all 916 internal recording of clientid4 values needs to include, whether 917 explicitly or implicitly, identification of the server from which the 918 clientid4 was received so that one always has a (server, clientid4) 919 pair. Two such pairs from different servers are always considered 920 distinct even when the clientid4 values are the same, as they may 921 occasionally be. 923 In order to make this approach work, the client must have accessible, 924 for each nfs_client_id4 used by the uniform approach (only one in 925 general), a list of all server IP addresses, together with the 926 associated clientid4 values and authentication flavors. As a part of 927 the associated data structures, there should be the ability to mark a 928 server IP structure as having the same server as another and to mark 929 an IP-address as currently unresolved.
One way to do this is to 930 allow each such entry to point to another with the pointer value 931 being one of: 933 o A pointer to another entry for an IP address associated with the 934 same server, where that IP address is the first one referenced to 935 access that server. 937 o A pointer to the current entry if there is no earlier IP address 938 associated with the same server, i.e. where the current IP address 939 is the first one referenced to access that server. We'll refer to 940 such an IP address as the lead IP address for a given server. 942 o The value NULL if the address's server identity is currently 943 unresolved. 945 In order to keep the above information current, in the interests of 946 the most effective trunking determination, RENEWs should be 947 periodically done on each server. However, even if this is not done, 948 the primary purpose of the trunking determination algorithm, to 949 prevent confusion due to trunking hidden from the client, will be 950 achieved. 952 Given this apparatus, when a SETCLIENTID is done and a clientid4 953 returned, the data structure can be searched for a matching clientid4 954 and if such is found, further processing can be done to determine 955 whether the clientid4 match is accidental, or the result of trunking. 957 In this algorithm, when SETCLIENTID is done it will use the common 958 nfs_client_id4 and specify the current target IP address as part of 959 the callback parameters. We call the clientid4 and SETCLIENTID 960 verifier returned by this operation XC and XV. 962 Note that when the client has done previous SETCLIENTID's, to any IP 963 addresses, with more than one authentication flavor, we have the 964 possibility of receiving NFS4ERR_CLID_INUSE, since we do not yet know 965 which of our connections with existing IP addresses might be trunked 966 with our current one.
In the event that the SETCLIENTID fails with 967 NFS4ERR_CLID_INUSE, the client must try each of the other authentication flavors 968 currently in use; eventually one will be correct and not return 969 NFS4ERR_CLID_INUSE. 971 Note that at this point, no SETCLIENTID_CONFIRM has yet been done. 972 This is because our SETCLIENTID has either established a new 973 clientid4 on a previously unknown server or changed the callback 974 parameters on a clientid4 associated with some already known server. 975 Given that we don't want to confirm something that we are not sure we 976 want to happen, what is to be done next depends on information about 977 existing clientid4's. 979 o If no matching clientid4 is found, the IP address X and clientid4 980 XC are added to the list and considered as having no existing 981 known IP addresses trunked with it. The IP address is marked as a 982 lead IP address for a new server. A SETCLIENTID_CONFIRM is done 983 using XC and XV. 985 o If a matching clientid4 is found which is marked unresolved, 986 processing on the new IP address is suspended. In order to 987 simplify processing, there can only be one unresolved IP address 988 for any given clientid4. 990 o If one or more matching clientid4's are found, none of which is 991 marked unresolved, the new IP address is entered and marked 992 unresolved. After applying the steps below to each of the lead IP 993 addresses with a matching clientid4, the address will have been 994 resolved: either it will be part of the same server as a new IP 995 address to be added to an existing set of IP addresses for a 996 server, or it will be recognized as a new server. At the point at 997 which this determination is made, the unresolved indication is 998 cleared and any suspended SETCLIENTID processing is restarted. 1000 So for each lead IP address IPn with a clientid4 matching XC, the 1001 following steps are done.
1003 o If the authentication flavor for IPn does not match that for X, 1004 the IP address is skipped, since it is impossible for IPn and X to 1005 be trunked in these circumstances. This avoids any possibility 1006 that NFS4ERR_CLID_INUSE will be returned for the SETCLIENTID and 1007 SETCLIENTID_CONFIRM to be done below, as long as the server(s) at 1008 IP addresses IPn and X are correctly implemented. 1010 o A SETCLIENTID is done to update the callback parameters to reflect 1011 the possibility that X will be marked as associated with the 1012 server whose lead IP address is IPn. The specific callback 1013 parameters chosen, in terms of cb_client4 and callback_ident, are 1014 up to the client and should reflect its preferences as to callback 1015 handling for the common clientid, in the event that X and IPn are 1016 trunked together. So assume that we do that SETCLIENTID on IP 1017 address IPn and get back a setclientid_confirm value (in the form 1018 of a verifier4) SCn. 1020 Note that the v4.0 spec requires the server to make sure that such 1021 values are very unlikely to be regenerated. Given that it is 1022 already highly unlikely that the clientid XC is duplicated by 1023 distinct servers, the probability that SCn is duplicated as well 1024 has to be considered vanishingly small. Note also that the 1025 callback update procedure can be repeated multiple times to reduce 1026 the probability of spurious matches further. 1028 o Note that we don't want this callback update to take effect if address X is not 1029 associated with this server. So we do a SETCLIENTID_CONFIRM on 1030 address X using the setclientid_confirm value SCn. 1032 o If the setclientid_confirm value generated on IPn is accepted on 1033 X, then X and IPn are recognized as connected to the same server 1034 and the entry for X is marked as associated with IPn. The entry 1035 is now resolved and processing can be restarted for IP addresses 1036 whose clientid4 matched XC but whose resolution had been deferred.
1038 o If the confirm value generated on IPn is not accepted on X, then X 1039 and IPn are distinct and the callback update will not be 1040 confirmed. So we go on to the next IPn, until we run out of them. 1041 If it happens that we run out of potential matches, then we can 1042 treat X as connected to a distinct server and then update and 1043 confirm its callback parameters on that basis. 1045 Note here that we may set a number of possible values for the 1046 callback parameters to be used for XC, one for the possibility that X 1047 is untrunked, and others for each potential match with an existing 1048 IPn. Although there are multiple such updates, at most one will be 1049 confirmed and, if X is untrunked, its original callback parameters 1050 will be put in effect by its SETCLIENTID_CONFIRM. 1052 The procedure above has made no explicit mention of the possibility 1053 that server reboot can occur at any time. To address this 1054 possibility the client should periodically use the clientid4 XC in 1055 RENEW operations, directed to both the IP address X and the 1056 lead IP address that is currently being tested for identity. 1058 o When XC becomes invalid on X, the resolution process should be 1059 terminated, subject to being redone later. Before redoing the 1060 resolution, XC should be checked on all the lead IP addresses on 1061 which it was valid. Once a new clientid4 is established on any 1062 servers on which XC became invalid, a new clientid4 can be 1063 established on X and the resolution process for X can be 1064 restarted. 1066 o When XC does not become invalid on X, but becomes invalid on the 1067 current IPn being tested, it should be concluded that X and IPn do 1068 not match and that it is time to advance to the next IPn, if any.
1070 o In the event of a reboot detected on any server lead IP, the set 1071 of IP addresses associated with the server should not change and 1072 state should be re-established for the lease as a whole, using all 1073 available connected server IP addresses. It is prudent to verify 1074 connectivity by doing a RENEW using the new clientid4 on each such 1075 server address before using it, however. 1077 If we have run out of IPn's without finding a matching server, X is 1078 considered as having no existing known IP addresses trunked with it. 1079 The IP address is marked as a lead IP address for a new server. A 1080 SETCLIENTID_CONFIRM is done using XC and XV. 1082 5.3. Proposed changes: merged (vs. synchronized) leases 1084 The current definitive definition of the NFSv4.0 protocol [RFC3530], 1085 and the current pending draft of RFC3530bis [cur-v4.0-bis] both 1086 agree. The section entitled "Migration and State" says: 1088 As part of the transfer of information between servers, leases 1089 would be transferred as well. The leases being transferred to the 1090 new server will typically have a different expiration time from 1091 those for the same client, previously on the old server. To 1092 maintain the property that all leases on a given server for a 1093 given client expire at the same time, the server should advance 1094 the expiration time to the later of the leases being transferred 1095 or the leases already present. This allows the client to maintain 1096 lease renewal of both classes without special effort: 1098 There are a number of problems with this and any resolution of our 1099 difficulties must address them somehow. 1101 o The current v4.0 spec recommends that the client make it 1102 essentially impossible to determine when two leases are from "the 1103 same client". 
1105 o It is not appropriate to speak of "maintain[ing] the property that 1106 all leases on a given server for a given client expire at the same 1107 time", since this is not a property that holds even in the absence 1108 of migration. A server listening on multiple network addresses 1109 may have the same client appear as multiple clients with no way to 1110 recognize the client as the same. 1112 o Even if the client identity issue could be resolved, advancing the 1113 lease time at the point of migration would not maintain the 1114 desired synchronization property. The leases would be 1115 synchronized until one of them was renewed, after which they would 1116 be unsynchronized again. 1118 To avoid client complexity, we need to have no more than one lease 1119 between a single client and a single server. This requires merger of 1120 leases since there is no real help from synchronizing them at a 1121 single instant. 1123 For the uniform approach, the destination server would simply merge 1124 leases as part of state transfer, since two leases with the same 1125 nfs_client_id4 values must be for the same client. 1127 We have made the following decisions as far as proposed normative 1128 statements regarding state merger. They reflect the facts that 1129 we want to provide full migration support in the simplest way 1130 possible and that we can't say MUST since we have older clients and 1131 servers to deal with. 1133 o Clients SHOULD use the uniform client-string approach in order to 1134 get good migration support. 1136 o Servers SHOULD provide automatic lease merger during state 1137 migration so that clients using the uniform id approach get the 1138 support automatically. 1140 If the clients and the servers obey the SHOULD's, having more than a 1141 single lease for a given client-server pair will be a transient 1142 situation, cleaned up as part of adapting to use of migrated state.
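The automatic lease merger described above for the uniform approach can be sketched as follows. This is a hypothetical model, not text or structure taken from any server: the Lease record and receive_migrated_lease are illustrative names, and the sketch assumes that a lease is identified by the (id string, boot verifier) pair of its nfs_client_id4. It does not handle a clientid4 conflict for a lease that is not merged.

```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    id_string: bytes        # nfs_client_id4 id string
    boot_verifier: bytes    # nfs_client_id4 boot verifier
    clientid4: int
    stateids: set = field(default_factory=set)


def receive_migrated_lease(existing: dict, incoming: Lease) -> Lease:
    """Merge a lease arriving with migrated state into the destination's table.

    `existing` maps (id_string, boot_verifier) -> Lease (hypothetical layout).
    """
    key = (incoming.id_string, incoming.boot_verifier)
    local = existing.get(key)
    if local is None:
        # Client not yet known here: the transferred lease is simply adopted.
        existing[key] = incoming
        return incoming
    # Same nfs_client_id4, so same client: fold the state into the one lease.
    # The incoming clientid4 ceases to be recognized after the merge.
    local.stateids |= incoming.stateids
    return local


table = {}
receive_migrated_lease(table, Lease(b"client-A", b"boot1", clientid4=7, stateids={1}))
merged = receive_migrated_lease(
    table, Lease(b"client-A", b"boot1", clientid4=42, stateids={2})
)
```

After both calls the table holds one lease for the client, under the destination's existing clientid4, carrying the state from both sources.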
1144 Since clients and servers will be a mixture of old and new and 1145 because nothing is a MUST we have to ensure that no combination will 1146 show worse behavior than is exhibited by current (i.e. old) clients 1147 and servers. 1149 5.4. Other proposed changes to migration-state sections 1151 5.4.1. Proposed changes: Client ID migration 1153 The current definitive definition of the NFSv4.0 protocol [RFC3530], 1154 and the current pending draft of RFC3530bis [cur-v4.0-bis] both 1155 agree. The section entitled "Migration and State" says: 1157 In the case of migration, the servers involved in the migration of 1158 a filesystem SHOULD transfer all server state from the original to 1159 the new server. This must be done in a way that is transparent to 1160 the client. This state transfer will ease the client's transition 1161 when a filesystem migration occurs. If the servers are successful 1162 in transferring all state, the client will continue to use 1163 stateids assigned by the original server. Therefore the new 1164 server must recognize these stateids as valid. This holds true 1165 for the client ID as well. Since responsibility for an entire 1166 filesystem is transferred with a migration event, there is no 1167 possibility that conflicts will arise on the new server as a 1168 result of the transfer of locks. 1170 This poses some difficulties, mostly because the part about "client 1171 ID" is not clear: 1173 o It isn't clear what part of the paragraph the "this" in the 1174 statement "this holds true ..." is meant to signify. 1176 o The phrase "the client ID" is ambiguous, possibly indicating the 1177 clientid4 and possibly indicating the nfs_client_id4. 1179 o If the text means to suggest that the same clientid4 must be used, 1180 the logic is not clear since the issue is not the same as for 1181 stateids of which there might be many. 
Adapting to the change of 1182 a single clientid, as might happen as a part of lease migration, 1183 is relatively easy for the client. 1185 We have decided to address this issue as follows, with the relevant 1186 changes all reflected in Section 5.6. 1188 o Make it clear that both clientid4 and nfs_client_id4 (including 1189 both id string and boot verifier) are to be transferred. 1191 o Indicate that the transfer will normally result in the same 1192 clientid4 after transfer, but that this is not guaranteed, since there 1193 may be a conflict with an existing clientid4 on the destination server 1194 and because lease merger can result in a change of the clientid4. 1196 5.4.2. Proposed changes: Callback re-establishment 1198 The current definitive definition of the NFSv4.0 protocol [RFC3530], 1199 and the current pending draft of RFC3530bis [cur-v4.0-bis] both 1200 agree. The section entitled "Migration and State" says: 1202 A client SHOULD re-establish new callback information with the new 1203 server as soon as possible, according to sequences described in 1204 sections "Operation 35: SETCLIENTID - Negotiate Client ID" and 1205 "Operation 36: SETCLIENTID_CONFIRM - Confirm Client ID". This 1206 ensures that server operations are not blocked by the inability to 1207 recall delegations. 1209 The above will need to be fixed to reflect the possibility of merging 1210 of leases and the text to do this appears as part of Section 5.6. 1212 5.4.3. Proposed changes: NFS4ERR_LEASE_MOVED rework 1214 The current definitive definition of the NFSv4.0 protocol [RFC3530], 1215 and the current pending draft of RFC3530bis [cur-v4.0-bis] both 1216 agree. The section entitled "Notification of Migrated Lease" says: 1218 Upon receiving the NFS4ERR_LEASE_MOVED error, a client that 1219 supports filesystem migration MUST probe all filesystems from that 1220 server on which it holds open state.
Once the client has 1221 successfully probed all those filesystems which are migrated, the 1222 server MUST resume normal handling of stateful requests from that 1223 client. 1225 There is a lack of clarity that is prompted by ambiguity about what 1226 exactly probing is and what the interlock between client and server 1227 must be. This has led to some worry about the scalability of the 1228 probing process, and although the time required does scale linearly 1229 with the number of filesystems that the client may have state for with 1230 respect to a given server, the actual process can be done 1231 efficiently. 1233 To address these issues we propose replacing the above with the text 1234 addressing NFS4ERR_LEASE_MOVED as given in Section 5.6.3. 1236 5.5. Proposed changes to other sections 1238 5.5.1. Proposed changes: callback update 1240 Some changes are necessary to reduce confusion about the process of 1241 callback information update and in particular to make it clear that 1242 no state is freed as a result: 1244 o Make it clear that after migration there are confirmed entries for 1245 transferred clientid4/nfs_client_id4 pairs. 1247 o Be explicit in the sections headed "otherwise," in the 1248 descriptions of SETCLIENTID and SETCLIENTID_CONFIRM, that these 1249 don't apply in the cases we are concerned about. 1251 5.5.2. Proposed changes: clientid4 handling 1253 To address both of the clientid4-related issues mentioned in 1254 Section 4.4, we propose replacing the last three paragraphs of the 1255 section entitled "Client ID" with the following: 1257 Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has 1258 successfully completed, the client uses the shorthand client 1259 identifier, of type clientid4, instead of the longer and less 1260 compact nfs_client_id4 structure. This shorthand client 1261 identifier (a client ID) is assigned by the server and should be 1262 chosen so that it will not conflict with a client ID previously 1263 assigned by the same server.
This applies across server restarts or 1264 reboots. 1266 Distinct servers MAY assign clientid4's independently, and will 1267 generally do so. Therefore, a client has to be prepared to deal 1268 with multiple instances of the same clientid4 value received on 1269 distinct IP addresses, denoting separate entities. When trunking 1270 of server IP addresses is not a consideration, a client should 1271 keep track of (IP-address, clientid4) pairs, so that each pair is 1272 distinct. For a discussion of how to address the issue in the 1273 face of possible trunking of server IP addresses, see Section 5.2. 1275 When a clientid4 is presented to a server and that clientid4 is 1276 not recognized, the server will reject the request with the error 1277 NFS4ERR_STALE_CLIENTID. This can occur for a number of reasons: 1279 * A server reboot causing loss of the server's knowledge of the 1280 client 1282 * Client error sending an incorrect clientid4 or valid clientid4 1283 to the wrong server. 1285 * Loss of lease state due to lease expiration. 1287 * Client or server error causing the server to believe that the 1288 client has rebooted (i.e. receiving a SETCLIENTID with an 1289 nfs_client_id4 which has a matching id string and a non- 1290 matching boot verifier). 1292 * Migration of all state under the associated lease causes its 1293 non-existence to be recognized on the source server. 1295 * Merger of state under the associated lease with another lease 1296 under a different clientid causes the clientid4 serving as the 1297 source of the merge to cease being recognized on its server. 1299 In the event of a server reboot, or loss of lease state due to 1300 lease expiration, the client must obtain a new clientid4 by use of 1301 the SETCLIENTID operation and then proceed to any other necessary 1302 recovery for the server reboot case (See the section entitled 1303 "Server Failure and Recovery"). 
In cases of server or client 1304 error resulting in this error, use of SETCLIENTID to establish a 1305 new lease is desirable as well. 1307 In the last two cases, different recovery procedures are required. 1308 See Section 5.6 for details. Note that in cases in which there is 1309 any uncertainty about which sort of handling is applicable, the 1310 distinguishing characteristic is that in reboot-like cases, the 1311 clientid4 and all associated stateids cease to exist while in 1312 migration-related cases, the clientid4 ceases to exist while the 1313 stateids are still valid. 1315 The client must also employ the SETCLIENTID operation when it 1316 receives a NFS4ERR_STALE_STATEID error using a stateid derived 1317 from its current clientid4, since this indicates a situation, such 1318 as server reboot which has invalidated the existing clientid4 and 1319 associated stateids (see the section entitled "lock-owner" for 1320 details). 1322 See the detailed descriptions of SETCLIENTID and 1323 SETCLIENTID_CONFIRM for a complete specification of the 1324 operations. 1326 5.5.3. Proposed changes: NFS4ERR_CLID_INUSE 1328 It appears to be the intention that only a single authentication 1329 flavor be used for client establishment between any client-server 1330 pair. However: 1332 o There is no explicit statement to this effect. 1334 o The error that indicates an authentication flavor conflict has a 1335 name which does not clarify this issue: NFS4ERR_CLID_INUSE. 1337 o The definition of the error is also not very helpful: "The 1338 SETCLIENTID operation has found that a client id is already in use 1339 by another client". 1341 As a result, servers exist which reject a SETCLIENTID simply because 1342 there already exists a clientid for the same client, established 1343 using a different IP address. Although this is generally understood 1344 to be erroneous, such servers still exist and the spec should make 1345 the correct behavior clear. 
1347 Although the error name cannot be changed, the following changes 1348 should be made to avoid confusion: 1350 o The definition of the error should be changed to read, "The 1351 SETCLIENTID operation has found that the specified nfs_client_id4 1352 was previously presented with a different authentication flavor 1353 and that client instance currently holds an active lease." 1355 o In the description of SETCLIENTID, the phrase "then the server 1356 returns a NFS4ERR_CLID_INUSE error" should be expanded to read 1357 "then the server returns a NFS4ERR_CLID_INUSE error, since use of 1358 a single client with multiple principals is not allowed." 1360 5.6. Migration, Replication and State (AS PROPOSED) 1362 When responsibility for handling a given filesystem is transferred to 1363 a new server (migration) or the client chooses to use an alternate 1364 server (e.g., in response to server unresponsiveness) in the context 1365 of filesystem replication, the appropriate handling of state shared 1366 between the client and server (i.e., locks, leases, stateids, and 1367 client IDs) is as described below. The handling differs between 1368 migration and replication. 1370 If a server replica or a server immigrating a filesystem agrees to, 1371 or is expected to, accept opaque values from the client that 1372 originated from another server, then it is a wise implementation 1373 practice for the servers to encode the "opaque" values in network 1374 byte order. When doing so, servers acting as replicas or immigrating 1375 filesystems will be able to parse values like stateids, directory 1376 cookies, filehandles, etc. even if their native byte order is 1377 different from that of other servers cooperating in the replication 1378 and migration of the filesystem. 1380 5.6.1. Migration and State 1382 In the case of migration, the servers involved in the migration of a 1383 filesystem SHOULD transfer all server state from the original to the 1384 new server. 
This must be done in a way that is transparent to the 1385 client. This state transfer will ease the client's transition when a 1386 filesystem migration occurs. If the servers are successful in 1387 transferring all state, the client will continue to use stateids 1388 assigned by the original server. Therefore the new server must 1389 recognize these stateids as valid. 1391 If transferring stateids from server to server would result in a 1392 conflict for an existing stateid for the destination server with the 1393 existing client, transparent state migration MUST NOT happen for that 1394 client. Servers participating in using transparent state migration 1395 should co-ordinate their stateid assignment policies to make this 1396 situation unlikely or impossible. The means by which this might be 1397 done, like all of the inter-server interactions for migration, are 1398 not specified by the NFS version 4.0 protocol. 1400 Handling of clientid values is similar but not identical. The 1401 clientid4 and nfs_client_id4 information (id string and boot 1402 verifier) will be transferred with the rest of the state information 1403 and the destination server should use that information to determine 1404 appropriate clientid4 handling. Although the destination server may 1405 make state stored under an existing lease available under the 1406 clientid4 used on the source server, the client should not assume 1407 that this is always so. In particular, 1409 o If there is an existing lease with an nfs_client_id4 that matches 1410 a migrated lease (same id string and boot verifier), the server 1411 SHOULD merge the two, making the union of the sets of stateids 1412 available under the clientid4 for the existing lease. As part of 1413 the lease merger, the expiration time of the lease will reflect 1414 renewal done within either of the ancestor leases (and so will 1415 reflect the latest of the renewals). 
1417 o If there is an existing lease with an nfs_client_id4 that 1418 partially matches a migrated lease (same id string and a different 1419 boot verifier), the server MUST eliminate one of the two, possibly 1420 invalidating one of the ancestor clientid4's. Since boot 1421 verifiers are not ordered, the lease with the later renewal time will 1422 prevail. 1424 When leases are not merged, the transfer of state should result in 1425 creation of a confirmed client record with empty callback information 1426 but matching the {v, x, c} for the transferred client information. 1427 This should enable establishment of new callback information using 1428 SETCLIENTID and SETCLIENTID_CONFIRM. 1430 A client may determine the disposition of migrated state by using a 1431 stateid associated with the migrated state in an operation on the 1432 new server and by using the associated clientid4 in a RENEW on the new 1433 server. 1435 o If the stateid is not valid and an error NFS4ERR_BAD_STATEID is 1436 received, either transparent state migration has not occurred or 1437 the state was purged due to boot verifier mismatch. 1439 o If the stateid is valid and an error NFS4ERR_STALE_CLIENTID is 1440 received on the RENEW, transparent state migration has occurred 1441 and the lease has been merged with an existing lease on the 1442 destination server. 1444 o If the stateid is valid and the clientid4 is valid, the lease has 1445 been transferred intact. 1447 Since responsibility for an entire filesystem is transferred with a 1448 migration event, there is no possibility that conflicts will arise on 1449 the new server as a result of the transfer of locks. 1451 The servers may choose not to transfer the state information upon 1452 migration. However, this choice is discouraged, except where 1453 specific issues such as stateid conflicts make it necessary. In the 1454 case of migration without state transfer, when the client presents 1455 state information from the original server (e.g.
in a RENEW op or a 1456 READ op of zero length), the client must be prepared to receive 1457 either NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new 1458 server. The client should then recover its state information as it 1459 normally would in response to a server failure. The new server must 1460 take care to allow for the recovery of state information as it would 1461 in the event of server restart. 1463 When a lease is transferred to a new server (as opposed to being 1464 merged with a lease already on the new server), a client SHOULD 1465 re-establish new callback information with the new server as soon as 1466 possible, according to the sequences described in sections "Operation 35: 1467 SETCLIENTID - Negotiate Client ID" and "Operation 36: 1468 SETCLIENTID_CONFIRM - Confirm Client ID". This ensures that server 1469 operations are not blocked by the inability to recall delegations. 1471 In those situations in which state has not been transferred, as shown 1472 by a return of NFS4ERR_BAD_STATEID, the client may attempt to reclaim 1473 the locks in order to take advantage of cases in which the destination 1474 server has set up a file-system-specific grace period in support of 1475 the migration. 1477 5.6.2. Replication and State 1479 Since client switch-over in the case of replication is not under 1480 server control, the handling of state is different. In this case, 1481 leases, stateids and client IDs do not have validity across a 1482 transition from one server to another. The client must re-establish 1483 its locks on the new server. This can be compared to the 1484 re-establishment of locks by means of reclaim-type requests after a 1485 server reboot. The difference is that the server has no provision to 1486 distinguish requests reclaiming locks from those obtaining new locks 1487 or to defer the latter.
Thus, a client re-establishing a lock on the 1488 new server (by means of a LOCK or OPEN request), may have the 1489 requests denied due to a conflicting lock. Since replication is 1490 intended for read-only use of filesystems, such denial of locks 1491 should not pose large difficulties in practice. When an attempt to 1492 re-establish a lock on a new server is denied, the client should 1493 treat the situation as if its original lock had been revoked. 1495 5.6.3. Notification of Migrated Lease 1497 In the case of lease renewal, the client may not be submitting 1498 requests for a filesystem that has been migrated to another server. 1499 This can occur because of the implicit lease renewal mechanism. The 1500 client renews a lease containing state of multiple filesystems when 1501 submitting a request to any one filesystem at the server. 1503 In order for the client to schedule renewal of leases that may have 1504 been relocated to the new server, the client must find out about 1505 lease relocation before those leases expire. Similarly, when 1506 migration occurs but there has not been transparent state migration, 1507 the client needs to find out about the change soon enough to be able 1508 to reclaim the lock within the destination server's grace period. To 1509 accomplish this, all operations which implicitly renew leases for a 1510 client (such as OPEN, CLOSE, READ, WRITE, RENEW, LOCK, and others), 1511 will return the error NFS4ERR_LEASE_MOVED if responsibility for any 1512 of the leases to be renewed has been transferred to a new server. 1513 Note that when the transfer of responsibility leaves remaining state 1514 for that lease on the source server, the lease is renewed just as it 1515 would have been in the NFS4ERR_OK case, despite returning the error. 1516 The transfer of responsibility happens when the server receives a 1517 GETATTR(fs_locations) from the client for each filesystem for which a 1518 lease has been moved to a new server. 
Normally it does this after 1519 receiving an NFS4ERR_MOVED for an access to the filesystem but the 1520 server is not required to verify that this happens in order to 1521 terminate the return of NFS4ERR_LEASE_MOVED. By convention, the 1522 compounds containing GETATTR(fs_locations) SHOULD include an appended 1523 RENEW operation to permit the server to identify the client getting 1524 the information. 1526 Note that the NFS4ERR_LEASE_MOVED error is only required when 1527 responsibility for at least one stateid has been affected. In the 1528 case of a null lease, where the only associated state is a clientid, 1529 no NFS4ERR_LEASE_MOVED error need be generated. 1531 Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports 1532 filesystem migration MUST perform the necessary GETATTR operation for 1533 each of the filesystems containing state that have been migrated and 1534 so give the server evidence that it is aware of the migration of the 1535 filesystem. Once the client has done this for all migrated 1536 filesystems on which the client holds state, the server MUST resume 1537 normal handling of stateful requests from that client. 1539 One way in which clients can do this efficiently in the presence of 1540 large numbers of filesystems is described below. This approach 1541 divides the process into two phases, one devoted to finding the 1542 migrated filesystems and the second devoted to doing the necessary 1543 GETATTRs. 1545 The client can find the migrated filesystems by building and issuing 1546 one or more COMPOUND requests, each consisting of a set of PUTFH/ 1547 GETFH pairs, each pair using an fh in one of the filesystems in 1548 question. All such COMPOUND requests can be done in parallel. 
The 1549 successful completion of such a request indicates that none of the 1550 fs's interrogated have been migrated while termination with 1551 NFS4ERR_MOVED indicates that the filesystem getting the error has 1552 migrated while those interrogated before it in the same COMPOUND have 1553 not. Those whose interrogation follows the error remain in an 1554 uncertain state and can be interrogated by restarting the requests 1555 from after the point at which NFS4ERR_MOVED was returned or by 1556 issuing a new set of COMPOUND requests for the filesystems which 1557 remain in an uncertain state. 1559 Once the migrated filesystems have been found, all that is needed is 1560 for the client to give evidence to the server that it is aware of the 1561 migrated status of filesystems found by this process, by 1562 interrogating the fs_locations attribute for an fh within each of the 1563 migrated filesystems. The client can do this by building and issuing 1564 one or more COMPOUND requests, each of which consists of a set of 1565 PUTFH operations, each followed by a GETATTR of the fs_locations 1566 attribute. A RENEW follows to help tie the operations to the lease 1567 returning NFS4ERR_LEASE_MOVED. Once the client has done this for all 1568 migrated filesystems on which the client holds state, the server will 1569 resume normal handling of stateful requests from that client. 1571 In order to support legacy clients that do not handle the 1572 NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after 1573 a wait of at least two lease periods, at which time it will resume 1574 normal handling of stateful requests from all clients. If a client 1575 attempts to access the migrated files, the server MUST reply 1576 NFS4ERR_MOVED. 1578 When the client receives an NFS4ERR_MOVED error, the client can 1579 follow the normal process to obtain the new server information 1580 (through the fs_locations attribute) and perform renewal of those 1581 leases on the new server. 
If the server has not had state 1582 transferred to it transparently, the client will receive either 1583 NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server, 1584 as described above. The client can then recover state information as 1585 it does in the event of server failure. 1587 Aside from recovering from a migration, there are other reasons a 1588 client may wish to retrieve fs_locations information from a server. 1589 When a server becomes unresponsive, for example, a client may use 1590 cached fs_locations data to discover an alternate server hosting the 1591 same fs data. A client may periodically request fs_locations data 1592 from a server in order to keep its cache of fs_locations data fresh. 1594 Since a GETATTR(fs_locations) operation would be used for refreshing 1595 cached fs_locations data, a server could mistake such a request as 1596 indicating recognition of an NFS4ERR_LEASE_MOVED condition. 1597 Therefore a compound which is not intended to signal that a client 1598 has recognized a migrated lease SHOULD be prefixed with a guard 1599 operation which fails with NFS4ERR_MOVED if the file handle being 1600 queried is no longer present on the server. The guard can be as 1601 simple as a GETFH operation. 1603 Though unlikely, it is possible that the target of such a compound 1604 could be migrated in the time after the guard operation is executed 1605 on the server but before the GETATTR(fs_locations) operation is 1606 encountered. When a client issues a GETATTR(fs_locations) operation 1607 as part of a compound not intended to signal recognition of a 1608 migrated lease, it SHOULD be prepared to process fs_locations data in 1609 the reply that shows the current location of the fs is gone. 1611 5.6.4. Migration and the Lease_time Attribute 1613 In order that the client may appropriately manage its leases in the 1614 case of migration, the destination server must establish proper 1615 values for the lease_time attribute. 
1617 When state is transferred transparently, that state should include 1618 the correct value of the lease_time attribute. The lease_time 1619 attribute on the destination server must never be less than that on 1620 the source since this would result in premature expiration of leases 1621 granted by the source server. Upon migration in which state is 1622 transferred transparently, the client is under no obligation to 1623 re-fetch the lease_time attribute and may continue to use the value 1624 previously fetched (on the source server). 1626 In the case in which lease merger occurs as part of state transfer, 1627 the lease_time attribute of the destination lease remains in effect. 1628 The client can simply renew that lease with its existing lease_time 1629 attribute. State in the source lease is renewed at the time of 1630 transfer so that it cannot expire, as long as the destination lease 1631 is appropriately renewed. 1633 If state has not been transferred transparently (i.e., the client 1634 needs to reclaim or re-obtain its locks), the client should fetch the 1635 value of lease_time on the new (i.e., destination) server, and use it 1636 for subsequent locking requests. However, the server must respect a 1637 grace period at least as long as the lease_time on the source server, 1638 in order to ensure that clients have ample time to reclaim their 1639 locks before potentially conflicting non-reclaimed locks are granted. 1640 The means by which the new server obtains the value of lease_time on 1641 the old server is left to the server implementations. It is not 1642 specified by the NFS version 4.0 protocol. 1644 6. Results of proposed changes for NFSv4.0 1646 The purpose of this section is to examine the troubling results 1647 reported in Section 3.1. We will look at the scenarios as they would 1648 be handled within the proposal. 1650 Because the choice of uniform vs.
non-uniform nfs_client_id4 id 1651 strings is a "SHOULD" in these cases, we will designate clients that 1652 follow this recommendation by SHOULD-UF-CID. 1654 We will also have to take account of any merger-related "SHOULD" 1655 clauses to better understand how they have addressed the issues seen. 1656 We abbreviate as follows: 1658 o SHOULD-SVR-AM refers to servers obeying the SHOULD which 1659 RECOMMENDS that they merge leases with identical nfs_client_id4 id 1660 strings and boot verifiers. 1662 6.1. Results: Failure to free migrated state on client reboot 1664 Let's look at the troublesome situation cited in Section 3.1.1. We 1665 have already seen what happens when SHOULD-UF-CID does not hold. Now 1666 let's look at the situation in which SHOULD-UF-CID holds, whether 1667 SHOULD-SVR-AM is in effect or not. 1669 o A client C establishes a clientid4 C1 with server ABC specifying 1670 an nfs_client_id4 with id string value "C" and boot verifier 1671 0x111. 1673 o The client begins to access files in filesystem F on server ABC, 1674 resulting in the generation of stateids S1, S2, etc. under the lease for 1675 clientid C1. It may also access files on other filesystems on the 1676 same server. 1678 o The filesystem is migrated from ABC to server XYZ. When 1679 transparent state migration is in effect, stateids S1 and S2 and 1680 lease {0x111, "C", C1} are now available for use by client C at 1681 server XYZ. So far, so good. 1683 o Client C reboots and attempts to access data on server XYZ, 1684 whether in filesystem F or another. It does a SETCLIENTID with an 1685 nfs_client_id4 with id string value "C" and boot verifier 0x112. 1686 The state associated with lease {0x111, "C", C1} is deleted as 1687 part of creating {0x112, "C", C2}. No problem. 1689 The correctness signature for this issue is 1691 SHOULD-UF-CID 1693 so if you have clients and servers that obey the SHOULD clauses, the 1694 problem is gone regardless of the choice on the MAY. 1696 6.2.
Results: Server reboots resulting in confused lease situation 1698 Now let's consider the scenario given in Section 3.1.2. We have 1699 already seen what happens when SHOULD-UF-CID does not hold. Now 1700 let's look at the situation in which SHOULD-UF-CID holds and 1701 SHOULD-SVR-AM holds as well. 1703 o Client C talks to server ABC using an nfs_client_id4 id string 1704 such as "C-ABC" and boot verifier v1. As a result a lease with 1705 clientid4 c.i is established: {v1, "C-ABC", c.i}. 1707 o fs_a1 migrates from server ABC to server XYZ along with its state. 1708 Now server XYZ also has a lease: {v1, "C-ABC", c.i} 1710 o Server ABC reboots. 1712 o Client C talks to server ABC using an nfs_client_id4 id string 1713 such as "C-ABC" and boot verifier v1. As a result a lease with 1714 clientid4 c.j is established: {v1, "C-ABC", c.j}. 1716 o fs_a2 migrates from server ABC to server XYZ. As part of 1717 migration the incoming lease is seen to denote the same nfs_client_id4 1718 and so is merged with {v1, "C-ABC", c.i}. 1720 o Now server XYZ has only one lease that matches {v1, "C-ABC", *}, 1721 so the problem is solved. 1723 Now let's consider the same scenario in the situation in which 1724 SHOULD-UF-CID holds and SHOULD-SVR-AM holds as well. 1726 o Client C talks to server ABC using an nfs_client_id4 id string "C" 1727 and boot verifier v1. As a result a lease with clientid4 c.i is 1728 established: {v1, "C", c.i}. 1730 o fs_a1 migrates from server ABC to server XYZ along with its state. 1731 Now XYZ also has a lease: {v1, "C", c.i} 1733 o Server ABC reboots. 1735 o Client C talks to server ABC using an nfs_client_id4 id string "C" 1736 and boot verifier v1. As a result a lease with clientid4 c.j is 1737 established: {v1, "C", c.j}. 1739 o fs_a2 migrates from server ABC to server XYZ. As part of 1740 migration the incoming lease is seen to denote the same 1741 nfs_client_id4 and so is merged with {v1, "C", c.i}.
1743 o Now server XYZ has only one lease that matches {v1, "C", *}, so 1744 the problem is solved. 1746 The correctness signature for this issue is 1747 SHOULD-SVR-AM 1749 so if you have clients and servers that obey the SHOULD clauses, the 1750 problem is gone regardless of the choice on the MAY. 1752 6.3. Results: Client complexity issues 1754 Consider the following situation: 1756 o There is a set of clients C1 through Cn accessing servers S1 1757 through Sm. Each server manages some significant number of 1758 filesystems with the filesystem count L being significantly 1759 greater than m. 1761 o Each client Cx will access a subset of the servers and so will 1762 have up to m clientid's, which we will call Cxy for server Sy. 1764 o Now assume that for load-balancing or other operational reasons, 1765 numbers of filesystems are migrated among the servers. As a 1766 result, depending on how this is handled, the number of clientids may 1767 explode. See below. 1769 Now look at what will happen under various scenarios: 1771 o We have previously (in Section 3.1.3) looked at this in the case of 1772 clients following the non-uniform client-string approach. In that 1773 case, each client-server pair could have up to m clientid's and 1774 each client will have up to m**2 clientids. If we add the 1775 possibility of server reboot, the only bound on a client's 1776 clientid count is L. 1778 o If we look at this in the SHOULD-UF-CID case in which the 1779 SHOULD-SVR-AM condition does not hold, the situation is no different. Although 1780 the server has the client identity information that could enable 1781 same-client-same-server leases to be combined, it does not do so. 1782 We still have up to L clientid's per client. 1784 o On the other hand, if we look at the SHOULD-UF-CID case in which 1785 SHOULD-SVR-AM holds, the problem is gone. There can be no more 1786 than m clientids per client, and n clientid's per server.
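The effect of lease merger on the clientid count can be illustrated with a toy model. The sketch below is not an NFSv4 implementation: the Server class, its method names, and the "ABC.1"-style clientid4 values are all hypothetical and exist only for illustration. Under those assumptions, it replays a migrate, reboot, migrate sequence for a client using a uniform id string and reports which leases the destination server ends up holding, with and without merging of leases that have the same id string and boot verifier.

```python
class Server:
    """Toy lease table for one server (illustrative only, not NFSv4)."""

    def __init__(self, name, merge_leases):
        self.name = name
        self.merge_leases = merge_leases  # True models SHOULD-SVR-AM
        self.counter = 0                  # survives reboot, so ids never repeat
        self.leases = {}                  # clientid4 -> lease record

    def setclientid(self, id_string, verifier):
        """Create a lease and return a fresh (hypothetical) clientid4."""
        self.counter += 1
        cid = f"{self.name}.{self.counter}"
        self.leases[cid] = {"id": id_string, "verifier": verifier, "fs": set()}
        return cid

    def open_state(self, cid, fs):
        """Record that the lease holds state on filesystem fs."""
        self.leases[cid]["fs"].add(fs)

    def reboot(self):
        """A reboot loses all lease state."""
        self.leases.clear()

    def migrate_out(self, fs):
        """Remove state for fs and return the lease records that held it."""
        moved = []
        for cid in list(self.leases):
            lease = self.leases[cid]
            if fs in lease["fs"]:
                lease["fs"].discard(fs)
                moved.append((cid, lease["id"], lease["verifier"]))
                if not lease["fs"]:
                    # All state migrated: the clientid4 ceases to be recognized.
                    del self.leases[cid]
        return moved

    def migrate_in(self, fs, moved):
        """Accept migrated leases, merging on (id string, verifier) if enabled."""
        for cid, id_string, verifier in moved:
            target = None
            if self.merge_leases:
                for existing, lease in self.leases.items():
                    if (lease["id"], lease["verifier"]) == (id_string, verifier):
                        target = existing
                        break
            if target is None:
                # No merge: keep the lease under the transferred clientid4.
                target = cid
                self.leases.setdefault(
                    cid, {"id": id_string, "verifier": verifier, "fs": set()})
            self.leases[target]["fs"].add(fs)


def leases_after_scenario(merge_leases):
    """Migrate fs_a1, reboot the source, then migrate fs_a2."""
    abc = Server("ABC", merge_leases)
    xyz = Server("XYZ", merge_leases)
    ci = abc.setclientid("C", 1)
    abc.open_state(ci, "fs_a1")
    xyz.migrate_in("fs_a1", abc.migrate_out("fs_a1"))
    abc.reboot()
    cj = abc.setclientid("C", 1)
    abc.open_state(cj, "fs_a2")
    xyz.migrate_in("fs_a2", abc.migrate_out("fs_a2"))
    return sorted(xyz.leases)
```

In this model, calling leases_after_scenario(False) leaves two leases for the same client on the destination, while leases_after_scenario(True) leaves one lease covering both filesystems. Repeating the reboot and migrate steps without merging adds one more lease each time, which is the growth the list above describes.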
1788 The correctness signature for this issue is 1790 (SHOULD-UF-CID & SHOULD-SVR-AM) 1792 so if you have clients and servers that obey the SHOULD clauses, the 1793 problem is gone regardless of the choice on the MAY. 1795 6.4. Result summary 1797 We have seen that (SHOULD-SVR-AM & SHOULD-UF-CID) is sufficient to 1798 solve the problems people have experienced. 1800 7. Issues for NFSv4.1 1802 Because NFSv4.1 embraces the uniform client-string approach, 1803 addressing migration issues is simpler. In the terms of Section 6, 1804 we already have SHOULD-UF-CID for NFSv4.1, as advised by section 2.4 1805 of [RFC5661], simplifying the work to be done. 1807 Nevertheless, there are some issues that will have to be addressed. 1808 Some examples: 1810 o The other necessary part of addressing migration issues, which we 1811 called SHOULD-SVR-AM above, is not currently addressed by NFSv4.1 1812 and changes need to be made to make it clear that state needs to 1813 be appropriately merged as part of migration, to avoid multiple 1814 clientids between a client-server pair. 1816 o There needs to be some clarification of how migration, and 1817 particularly transparent state migration, should interact with 1818 pNFS layouts. 1820 o The current discussion (in [RFC5661]) of the possibility of 1821 server_owner changes is incomplete and confusing. 1823 Discussion of how to resolve these issues will appear in the sections 1824 below. 1826 7.1. Addressing state merger in NFSv4.1 1828 The existing treatment of state transfer in [RFC5661] has similar 1829 problems to that in [RFC3530] in that it assumes that the state for 1830 multiple fs's on different servers will not be merged so that it 1831 appears under a single common clientid. We've already seen the 1832 reasons that this is a problem, with regard to NFSv4.0.
1834 Although we don't have the problems stemming from the non-uniform 1835 client-string approach, there are a number of complexities in the 1836 existing treatment of state management in the section entitled "Lock 1837 State and File System Transitions" in [RFC5661] that make this 1838 non-trivial to address: 1840 o Migration is currently treated together with other sorts of file 1841 system transitions including transitioning between replicas 1842 without any NFS4ERR_MOVED errors. 1844 o There is separate handling and discussion of the cases of matching 1845 and non-matching server scopes. 1847 o In the case of matching server scopes, the text calls for an 1848 impossible degree of transparency. 1850 o In the case of non-matching server scopes, the text does not 1851 mention transparent state migration at all, resulting in a 1852 functional regression from NFSv4.0. 1854 7.2. Addressing pNFS relationship with migration 1856 This is made difficult because, within the pNFS framework, migration 1857 might mean any of several things: 1859 o Transfer of the MDS, leaving DS's alone. 1861 This would be minimally disruptive to those using layouts but 1862 would require the pNFS control protocol to support the DS being 1863 directed to a new MDS. 1865 o Transfer of a DS, leaving everything else in place. 1867 Such a transfer can be handled without using migration at all. 1868 The server can recall/revoke layouts, as appropriate. 1870 o Transfer of the file system to a new file system with both MDS and 1871 DS's moving. 1873 In such a transfer, an entirely different set of DS's will be at 1874 the target location. There may even be no pNFS support on the 1875 destination FS at all. 1877 Migration needs to support both the first and last of these models. 1879 7.3. Addressing server owner changes in NFSv4.1 1881 Section 2.10.5 of [RFC5661] states the following.
1883 The client should be prepared for the possibility that 1884 eir_server_owner values may be different on subsequent EXCHANGE_ID 1885 requests made to the same network address, as a result of various 1886 sorts of reconfiguration events. When this happens and the 1887 changes result in the invalidation of previously valid forms of 1888 trunking, the client should cease to use those forms, either by 1889 dropping connections or by adding sessions. For a discussion of 1890 lock reclaim as it relates to such reconfiguration events, see 1891 Section 8.4.2.1. 1893 While this paragraph is literally true in that such reconfiguration 1894 events can happen and clients have to deal with them, it is confusing 1895 in that it can be read as suggesting that clients have to deal with 1896 them without disruption, which in general is impossible. 1898 A clearer alternative would be: 1900 It is always possible that, as a result of various sorts of 1901 reconfiguration events, eir_server_scope and eir_server_owner 1902 values may be different on subsequent EXCHANGE_ID requests made to 1903 the same network address. 1905 In most cases such reconfiguration events will be disruptive and 1906 indicate that an IP address formerly connected to one server is 1907 now connected to an entirely different one. 1909 Some guidelines on client handling of such situations follow: 1911 * When eir_server_scope changes, the client has no assurance that 1912 any id's it obtained previously (e.g. file handles) can be 1913 validly used on the new server, and, even if the new server 1914 accepts them, there is no assurance that this is not due to 1915 accident. Thus it is best to treat all such state as lost/ 1916 stale although a client may assume that the probability of 1917 inadvertent acceptance is low and treat this situation as 1918 within the next case. 
1920 * When eir_server_scope remains the same and 1921 eir_server_owner.so_major_id changes, the client can use 1922 filehandles it has and attempt reclaims. It may find that 1923 these are now stale but if NFS4ERR_STALE is not received, it 1924 can proceed to reclaim its opens. 1926 * When eir_server_scope and eir_server_owner.so_major_id remain 1927 the same, the client has to use the now-current values of 1928 eir_server_owner.so_minor_id in deciding on appropriate forms 1929 of trunking. 1931 8. Lock State and File System Transitions (AS PROPOSED) 1933 In dealing with file system transitions, the client needs to handle 1934 cases in which the two servers have cooperated in state management 1935 and cases in which they have not. 1937 The primary means by which a client finds out about state management 1938 co-operation is by comparing eir_server_scope values returned by each 1939 server. If the scope values do not match, then any co-operation of 1940 the servers in state management is limited to transferring state in 1941 the event of migration and making arrangements for the safe reclamation 1942 of locking state. If the scope values match, then this indicates the 1943 servers have cooperated in assigning client IDs and stateids to the 1944 point that the same id will not refer to different things on 1945 different servers. Servers may reject client IDs that refer to state 1946 they do not know about. See the section entitled "Server Scope" for 1947 more information about the use of server scope. 1949 How the client needs to deal with locking state with regard to these 1950 situations will depend upon: 1952 o The type of file system transition occurring. 1954 o The type of state involved (e.g. layout state may sometimes be 1955 handled differently). 1957 o The specific level of state handling co-ordination between the two 1958 servers for the specific transition.
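The comparison of eir_server_scope and eir_server_owner values that a client makes across EXCHANGE_ID results, as discussed in Section 7.3 and in the paragraph on server scope above, can be sketched as follows. This is a minimal illustration under stated assumptions, not a client implementation: the ExchangeIdResult type, the Handling names, and the classify_change function are hypothetical, and only the three fields shown are modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Handling(Enum):
    """Hypothetical labels for the three guideline cases."""
    TREAT_STATE_AS_LOST = "scope changed: treat previously obtained ids as lost/stale"
    ATTEMPT_RECLAIM = "major id changed: keep filehandles and attempt reclaims"
    REEVALUATE_TRUNKING = "scope and major id unchanged: re-evaluate trunking"


@dataclass(frozen=True)
class ExchangeIdResult:
    """Subset of an EXCHANGE_ID reply (hypothetical container type)."""
    server_scope: bytes   # eir_server_scope
    so_major_id: bytes    # eir_server_owner.so_major_id
    so_minor_id: int      # eir_server_owner.so_minor_id


def classify_change(old, new):
    """Pick the guideline case for two replies from the same network address."""
    if old.server_scope != new.server_scope:
        # No assurance that old ids mean the same thing on the new server.
        return Handling.TREAT_STATE_AS_LOST
    if old.so_major_id != new.so_major_id:
        # Same scope, different server: filehandles may work, locks need reclaim.
        return Handling.ATTEMPT_RECLAIM
    # Same scope and major id: only the minor id can affect trunking choices.
    return Handling.REEVALUATE_TRUNKING
```

A client might run such a check each time it repeats EXCHANGE_ID against an address it has used before. The conservative choice in the first case follows the guideline that ids accepted by a server in a different scope may be accepted only by accident.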
We will divide the basic description of these possibilities into
three sections:

o  In Section 8.1, we will discuss handling specific to the case of
   matching server scopes.

o  In Section 8.2, we will discuss handling specific to the case of
   non-matching server scopes.

o  In Section 8.3, we will discuss issues relating to handling common
   to both cases.

8.1.  File System Transitions with Matching Server Scopes

In the case of migration, the servers involved in the migration of a
file system SHOULD transfer all server state relevant to the
migrating file system from the original to the new server.  When this
is done, it needs to be done in a way that is maximally transparent
to the client in that all stateids used by the client to access state
on the file system in question can be used on the new server, albeit
possibly under different client IDs.

When layouts are active for a migrated file system, layout state
SHOULD be included as part of the state transferred.  Even if it is
the case that there are circumstances preventing the layout from
being supported on the new server, this should be dealt with by
recalling layouts either before or after the transition.  Where this
cannot be done, layout revocation is possible, but any such
revocation should appear to the client just as any other layout
revocation would.

With replication, such a degree of common state is typically not the
case.  Clients, however, should use the information provided by the
eir_server_scope returned by EXCHANGE_ID (as modified by the
validation procedures described in the section entitled "Server
Scope") to determine whether such sharing may be in effect in
non-migration cases, rather than making assumptions based solely on
the reason for the transition.

This state transfer will reduce disruption to the client when a file
system transition occurs.  If the servers are successful in
transferring all state, the client can continue to use existing
stateids, through either existing or new sessions between the client
and the new server instance.  If the server accepts such a
transferred stateid as valid, then the client may use that stateid to
access the same state that it represented on the old server.

The fact that the two servers belong to the same server scope does
not mean that the client will not have to reclaim or otherwise
reobtain state when dealing with the transition.  However, it does
mean that the client may proceed using its current stateids when
communicating with the new server, and the new server will either
recognize the stateids as valid or reject them, in which case locking
state must be reobtained by the client.

File systems cooperating in state management may actually share state
or simply divide the identifier space so as to recognize (and reject
as stale) each other's stateids and client IDs.  Servers that do
share state may not do so under all conditions or at all times.  If
the server cannot be sure when accepting a stateid that it reflects
the locks the client was given, the server must treat the state as
stale and report it as such to the client.

8.2.  File System Transitions with Non-Matching Server Scopes

When the two file system instances are on servers that do not share a
server scope value, the client must establish a new client ID on the
destination, if it does not have one already, to obtain access to its
locks.  Depending on the type of file system transition and
facilities provided by the server, it may re-establish its connection
to locking and layout state in a number of ways.

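The client-side steps described so far can be sketched roughly as
follows.  This is a sketch under stated assumptions rather than a
definitive procedure: ensure_destination_client_id() and
split_stateids() are hypothetical helpers, and the establish and
server_accepts callbacks stand in for whatever protocol exchanges a
real client would perform.  The matching-scope case is deliberately
left undecided by the first helper, since the text above does not
mandate a particular choice of client ID there.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def ensure_destination_client_id(
    scopes_match: bool,
    dest_client_id: Optional[int],
    establish: Callable[[], int],
) -> Optional[int]:
    """With non-matching scopes, a client ID on the destination is required.

    `establish` is a hypothetical callback that creates a new client ID on
    the destination server and returns it.
    """
    if not scopes_match and dest_client_id is None:
        return establish()
    # Either an ID already exists, or the scopes match and this sketch
    # leaves the choice of client ID to the caller.
    return dest_client_id


def split_stateids(
    stateids: Iterable[bytes],
    server_accepts: Callable[[bytes], bool],
) -> Tuple[List[bytes], List[bytes]]:
    """Partition stateids into those the new server accepts and the rest.

    `server_accepts` is a hypothetical callback reporting whether the new
    server recognized a given stateid as valid.
    """
    usable: List[bytes] = []
    reobtain: List[bytes] = []
    for sid in stateids:
        if server_accepts(sid):
            usable.append(sid)      # same state as on the old server
        else:
            reobtain.append(sid)    # must be reclaimed or otherwise reobtained
    return usable, reobtain
```

Keeping the two steps separate reflects that the client ID question
depends on the scope comparison, whereas acceptance or rejection of
individual stateids is decided by the new server in either case.
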
In the case of migration, the servers may have transferred stateids,
making it possible for the client to access its state on the new
server simply by using the existing stateid.  The server may transfer
all state or a subset, and the client can use TEST_STATEID to
determine what state has been transferred and what needs to be
reclaimed or otherwise reobtained as described in Section 8.3.

Lock reclaim may be used by the client for any sort of file system
transition, but the server is not required to support it in any
particular case.

Note that in this case, lock reclaim may be attempted even when the
servers involved in the transfer have different server scope values
(see Section 8.4.2.1 for the contrary case of reclaim after server
reboot).  Servers with different server scope values may cooperate to
allow reclaim for locks associated with the transfer of a file system
even if they do not cooperate sufficiently to share a server scope.

8.3.  FS Transitions Involving Reobtaining Locking State

In either case, when actual locks are not known to be maintained, the
destination server may establish a grace period specific to the given
file system, with non-reclaim locks being rejected for that file
system, even though normal locks are being granted for other file
systems.  Clients should not infer the absence of a grace period for
file systems being transitioned to a server from responses to
requests for other file systems.

In the case of lock reclamation for a given file system after a file
system transition, edge conditions can arise similar to those for
reclaim after server restart (although in the case of the planned
state transfer associated with migration, these can be avoided by
securely recording lock state as part of state migration).
Unless the destination server can guarantee that locks will not be
incorrectly granted, the destination server should not allow lock
reclaims and should avoid establishing a grace period.

Once all locks have been reclaimed, or if there were no locks to
reclaim, the client indicates that there are no more reclaims to be
done for the file system in question by sending a RECLAIM_COMPLETE
operation with the rca_one_fs parameter set to true.  Once this has
been done, non-reclaim locking operations may be done, and any
subsequent request to do a reclaim will be rejected with the error
NFS4ERR_NO_GRACE.

Information about client identity may be propagated between servers
in the form of a client_owner4 and associated verifiers, under the
assumption that the client presents the same values to all the
servers with which it deals.

Servers are encouraged to provide facilities to allow locks to be
reclaimed on the new server after a file system transition.  Often,
however, in cases in which the two servers do not share a server
scope value, such facilities may not be available and the client
should be prepared to reobtain locks, even though it is possible
that the client may have its LOCK or OPEN request denied due to a
conflicting lock.

Layouts may be reobtained when necessary even without special
facilities for lock reclamation.  However, the client MUST NOT depend
on being able to obtain such a layout, since pNFS or the desired
mapping type might not be supported on the new server.

The consequences of having no facilities available to reclaim locks
on the new server will depend on the type of environment.  In some
environments, such as the transition between read-only file systems,
such denial of locks should not pose large difficulties in practice.
When an attempt to re-establish a lock on a new server is denied, the
client should treat the situation as if its original lock had been
revoked.  Note that when the lock is granted, the client cannot
assume that no conflicting lock could have been granted in the
interim.  Where change attribute continuity is present, the client
may examine the change attribute to check for unwanted file
modifications.  Where even this is not available, and the file system
is not read-only, a client may reasonably treat all pending locks as
having been revoked.

9.  Security Considerations

The current definitive definition of the NFSv4.0 protocol [RFC3530]
and the current pending draft of RFC3530bis [cur-v4.0-bis] agree.  In
both, the section entitled "Security Considerations" encourages
clients to protect the integrity of the SECINFO operation, any
GETATTR operation for the fs_locations attribute, and the operations
SETCLIENTID/SETCLIENTID_CONFIRM.  A migration recovery event can use
any or all of these operations.  We do not recommend any change here.

10.  IANA Considerations

This document does not require actions by IANA.

11.  Acknowledgements

The editor and authors of this document gratefully acknowledge the
contributions of Trond Myklebust of NetApp and Robert Thurlow of
Oracle.  We also thank Tom Haynes of NetApp and Spencer Shepler of
Microsoft for their guidance and suggestions.

Special thanks go to members of the Oracle Solaris NFS team,
especially Rick Mesta and James Wahlig, for their work implementing
an NFSv4.0 migration prototype and identifying many of the issues
documented here.

12.  References

12.1.  Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3530]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
           Beame, C., Eisler, M., and D. Noveck, "Network File System
           (NFS) version 4 Protocol", RFC 3530, April 2003.

[RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
           System (NFS) Version 4 Minor Version 1 Protocol",
           RFC 5661, January 2010.

12.2.  Informative References

[cur-v4.0-bis]
           Haynes, T., Ed. and D. Noveck, Ed., "Network File System
           (NFS) Version 4 Protocol", 2011.

           Work in progress.

Authors' Addresses

David Noveck (editor)
EMC Corporation
228 South Street
Hopkinton, MA 01748
US

Phone: +1 508 249 5748
Email: david.noveck@emc.com

Piyush Shivam
Oracle Corporation
5300 Riata Park Ct.
Austin, TX 78727
US

Phone: +1 512 401 1019
Email: piyush.shivam@oracle.com

Charles Lever
Oracle Corporation
1015 Granger Avenue
Ann Arbor, MI 48104
US

Phone: +1 248 614 5091
Email: chuck.lever@oracle.com

Bill Baker
Oracle Corporation
5300 Riata Park Ct.
Austin, TX 78727
US

Phone: +1 512 401 1081
Email: bill.baker@oracle.com