2	NFSv4                                                     D. Noveck, Ed.
3	Internet-Draft                                                       EMC
4	Intended status: Informational                                 P. Shivam
5	Expires: January 8, 2013                                        C. Lever
6	                                                                B. Baker
7	                                                                  ORACLE
8	                                                            July 7, 2012

10	 NFSv4 migration: Implementation experience and spec issues to resolve
11	                  draft-ietf-nfsv4-migration-issues-01

13	Abstract

15	   The migration feature of NFSv4 provides for moving responsibility for
16	   a single filesystem from one server to another, without disruption to
17	   clients.  Recent implementation experience has shown problems in the
18	   existing specification for this feature.  This document discusses the
19	   issues which have arisen and explores the options available for
20	   curing the issues via clarification and correction of the NFSv4.0 and
21	   NFSv4.1 specifications.

23	Status of this Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on January 8, 2013.

40	Copyright Notice

42	   Copyright (c) 2012 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Introduction
58	   2.  Conventions
59	   3.  NFSv4.0 Implementation Experience
60	     3.1.  Implementation issues
61	       3.1.1.  Failure to free migrated state on client reboot
62	       3.1.2.  Server reboots resulting in a confused lease
63	               situation
64	       3.1.3.  Client complexity issues
65	     3.2.  Sources of Protocol difficulties
66	       3.2.1.  Issues with nfs_client_id4 generation and use
67	       3.2.2.  Issues with lease proliferation
68	   4.  Issues to be resolved in NFSv4.0
69	     4.1.  Possible changes to nfs_client_id4 client-string
70	     4.2.  Possible changes to handle differing nfs_client_id4
71	           string values
72	     4.3.  Other issues within migration-state sections
73	     4.4.  Issues within other sections
74	   5.  Proposed resolution of NFSv4.0 protocol difficulties
75	     5.1.  Proposed changes: nfs_client_id4 client-string
76	     5.2.  Client-string Approaches (AS PROPOSED)
77	       5.2.1.  Non-Uniform Client-string Approach
78	       5.2.2.  Uniform Client-string Approach
79	       5.2.3.  Mixing Client-string Approaches
80	       5.2.4.  Trunking Determination using Uniform Client-strings
81	     5.3.  Proposed changes: merged (vs. synchronized) leases
82	     5.4.  Other proposed changes to migration-state sections
83	       5.4.1.  Proposed changes: Client ID migration
84	       5.4.2.  Proposed changes: Callback re-establishment
85	       5.4.3.  Proposed changes: NFS4ERR_LEASE_MOVED rework
86	     5.5.  Proposed changes to other sections
87	       5.5.1.  Proposed changes: callback update
88	       5.5.2.  Proposed changes: clientid4 handling
89	       5.5.3.  Proposed changes: NFS4ERR_CLID_INUSE
90	     5.6.  Migration, Replication and State (AS PROPOSED)
91	       5.6.1.  Migration and State
92	       5.6.2.  Replication and State
93	       5.6.3.  Notification of Migrated Lease
94	       5.6.4.  Migration and the Lease_time Attribute
95	   6.  Results of proposed changes for NFSv4.0
96	     6.1.  Results: Failure to free migrated state on client
97	           reboot
98	     6.2.  Results: Server reboots resulting in confused lease
99	           situation
100	     6.3.  Results: Client complexity issues
101	     6.4.  Result summary
102	   7.  Issues for NFSv4.1
103	     7.1.  Addressing state merger in NFSv4.1
104	     7.2.  Addressing pNFS relationship with migration
105	     7.3.  Addressing server owner changes in NFSv4.1
106	   8.  Lock State and File System Transitions (AS PROPOSED)
107	     8.1.  File System Transitions with Matching Server Scopes
108	     8.2.  File System Transitions with Non-Matching Server Scopes
109	     8.3.  FS Transitions Involving Reobtaining Locking State
110	   9.  Security Considerations
111	   10. IANA Considerations
112	   11. Acknowledgements
113	   12. References
114	     12.1. Normative References
115	     12.2. Informative References
116	   Authors' Addresses

118	1.  Introduction

120	   This document is in the informational category, and while the facts
121	   it reports may have normative implications, any such normative
122	   significance reflects the readers' preferences.  For example, we may
123	   report that the reboot of a client with migrated state results in
124	   state not being promptly cleared and that this will prevent granting
125	   of conflicting lock requests at least for the lease time, which is a
126	   fact.  While it is to be expected that client and server implementers
127	   will judge this to be a situation that is best avoided, the question
128	   of how pressing this issue is remains a judgment for the reader, and
129	   eventually the nfsv4 working group, to make.

131	   We do explore possible ways in which such issues can be avoided, with
132	   minimal negative effects, in the expectation that the working group
133	   will choose to address these issues, but the choice of exactly how to
134	   address these is best given effect in one or more standards-track
135	   documents and/or errata.

137	   This document focuses on NFSv4.0, since that is where the majority of
138	   implementation experience has been.  Nevertheless, there is some
139	   discussion of the implications of the NFSv4.0 experience for
140	   migration in NFSv4.1.

142	2.  Conventions

144	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
145	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
146	   document are to be interpreted as described in [RFC2119].

148	   In the context of this informational document, these normative
149	   keywords will always occur in the context of a quotation, most often
150	   direct but sometimes indirect.  The context will make it clear
151	   whether the quotation is from:

153	   o  The current definitive definition of the NFSv4.0 protocol, whether
154	      that is the original NFSv4.0 specification [RFC3530], the current
155	      pending draft of RFC3530bis expected to become the definitive
156	      definition of NFSv4.0 once certain procedural steps are taken
157	      [cur-v4.0-bis], or an eventual RFC3530bis RFC, taking over the
158	      role of definitive definition of NFSv4.0 from RFC3530.

160	      As the identity of that document may change during the lifetime of
161	      this document, we will often refer to the current or pending
162	      definition of NFSv4.0 and quote from portions of the documents
163	      that are identical among all existing drafts.  Given that RFC3530
164	      and all RFC3530bis drafts agree as to the issues under discussion,
165	      this should not cause undue difficulty.  Note that to simplify
166	      document maintenance, section names rather than section numbers
167	      are used when referring to sections in existing documents so that
168	      only minimal changes will be necessary as the identity of the
169	      document defining NFSv4.0 changes.

171	   o  The current definitive definition of the NFSv4.1 protocol
172	      [RFC5661].

174	   o  A proposed or possible text to serve as a replacement for the
175	      current definitive document text.  Sometimes, a number of possible
176	      alternative texts may be listed and benefits and detriments of
177	      each examined in turn.

179	3.  NFSv4.0 Implementation Experience

181	3.1.  Implementation issues

183	   Note that the examples below reflect current experience which arises
184	   from clients implementing the recommendation to use different
185	   nfs_client_id4 id strings for different server addresses, i.e. using
186	   what is later referred to herein as the "non-uniform client-string
187	   approach".

189	   This is simply because that is the experience implementers have had.
190	   The reader should not assume that in all cases, this practice is the
191	   source of the difficulty.  It may be so in some cases but clearly it
192	   is not in all cases.

194	3.1.1.  Failure to free migrated state on client reboot

196	   The following sort of situation has proved troublesome:

198	   o  A client C establishes a clientid4 C1 with server ABC specifying
199	      an nfs_client_id4 with id string value "C-ABC" and boot verifier
200	      0x111.

202	   o  The client begins to access files in filesystem F on server ABC,
203	      resulting in generating stateids S1, S2, etc. under the lease for
204	      clientid C1.  It may also access files on other filesystems on the
205	      same server.

207	   o  The filesystem is migrated from ABC to server XYZ.  When
208	      transparent state migration is in effect, stateids S1 and S2 and
209	      clientid4 C1 are now available for use by client C at server XYZ.
210	      So far, so good.

212	   o  Client C reboots and attempts to access data on server XYZ,
213	      whether in filesystem F or another.  It does a SETCLIENTID with an
214	      nfs_client_id4 with id string value "C-XYZ" and boot verifier
215	      0x112.  There is thus no occasion to free stateids S1 and S2 since
216	      they are associated with a different client name and so lease
217	      expiration is the only way that they can be gotten rid of.

219	   Note here that while it seems clear to us in this example that C-XYZ
220	   and C-ABC are from the same client, the server has no way to
221	   determine the structure of the "opaque" id string.  In the protocol,
222	   it really is treated as opaque.  Only the client knows which
223	   nfs_client_id4 values designate the same client on a different
224	   server.
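
   The scenario above can be sketched as a small simulation.  This is
   an illustrative model only, not taken from any implementation: the
   Server class, its clients table, and the leftover_after_reboot
   helper are hypothetical names, and the model assumes a server that
   discards a client's state only when a known id string arrives with
   a different boot verifier.

```python
class Server:
    """Hypothetical, simplified model of a server's client records."""

    def __init__(self):
        # Maps opaque id string -> {"verifier": int, "stateids": set}
        self.clients = {}

    def setclientid(self, id_string, verifier):
        record = self.clients.get(id_string)
        if record is None:
            # Unknown id string: treated as a brand-new client.
            self.clients[id_string] = {"verifier": verifier, "stateids": set()}
        elif record["verifier"] != verifier:
            # Same id string, new boot verifier: the client rebooted,
            # so its previous state is discarded.
            record["verifier"] = verifier
            record["stateids"].clear()


def leftover_after_reboot(id_at_abc, id_at_xyz):
    """Return the stateids still held on XYZ after client C reboots."""
    abc, xyz = Server(), Server()
    abc.setclientid(id_at_abc, 0x111)
    abc.clients[id_at_abc]["stateids"].update({"S1", "S2"})
    # Transparent state migration: the client record moves to XYZ as is.
    xyz.clients[id_at_abc] = abc.clients.pop(id_at_abc)
    # Client C reboots and contacts XYZ with a new boot verifier.
    xyz.setclientid(id_at_xyz, 0x112)
    return sorted(s for rec in xyz.clients.values() for s in rec["stateids"])


print(leftover_after_reboot(b"C-ABC", b"C-XYZ"))  # ['S1', 'S2']
print(leftover_after_reboot(b"C", b"C"))          # []
```

   In this model, S1 and S2 linger on XYZ when the id strings differ,
   because the server never sees a matching string with a new verifier.
   The second call, using one string for both servers, is included only
   for contrast within this simplified model.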

226	3.1.2.  Server reboots resulting in a confused lease situation

228	   Further problems arise from scenarios like the following.

230	   o  Client C talks to server ABC using an nfs_client_id4 id string
231	      such as "C-ABC" and a boot verifier v1.  As a result, a lease with
232	      clientid4 c.i is established: {v1, "C-ABC", c.i}.

234	   o  fs_a1 migrates from server ABC to server XYZ along with its state.
235	      Now server XYZ also has a lease: {v1, "C-ABC", c.i}.

237	   o  Server ABC reboots.

239	   o  Client C talks to server ABC using an nfs_client_id4 id string
240	      such as "C-ABC" and a boot verifier v1.  As a result, a lease with
241	      clientid4 c.j is established: {v1, "C-ABC", c.j}.

243	   o  fs_a2 migrates from server ABC to server XYZ.  Now server XYZ also
244	      has a lease: {v1, "C-ABC", c.j}.

246	   o  Now server XYZ has two leases that match {v1, "C-ABC", *}, when
247	      the protocol clearly assumes there can be only one.

249	   Note that if the client used "C" (rather than "C-ABC") as the
250	   nfs_client_id4 id string, the exact same situation would arise.
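
   The end state of this scenario can be illustrated with a short
   sketch.  The tuple layout mirrors the {verifier, id string,
   clientid4} notation used above; the group_leases helper and the
   variable names are hypothetical and serve only to show that server
   XYZ ends up with two clientid4 values under one {verifier, id
   string} pair.

```python
from collections import defaultdict


def group_leases(leases):
    """Group clientid4 values by the (verifier, id string) pair they carry."""
    groups = defaultdict(list)
    for verifier, id_string, clientid4 in leases:
        groups[(verifier, id_string)].append(clientid4)
    return dict(groups)


# Leases as server XYZ sees them after both migrations.
xyz_leases = [
    ("v1", "C-ABC", "c.i"),  # arrived with fs_a1, before ABC rebooted
    ("v1", "C-ABC", "c.j"),  # arrived with fs_a2, after ABC rebooted
]

# Any (verifier, id string) pair with more than one clientid4 is ambiguous.
ambiguous = {k: v for k, v in group_leases(xyz_leases).items() if len(v) > 1}
print(ambiguous)  # {('v1', 'C-ABC'): ['c.i', 'c.j']}
```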

252	   One of the first cases in which this sort of situation has resulted
253	   in difficulties is in connection with doing a SETCLIENTID for
254	   callback update.

256	   The SETCLIENTID for callback update only includes the nfs_client_id4,
257	   assuming there can only be one such with a given nfs_client_id4
258	   value.  If there were multiple, confirmed client records with
259	   identical nfs_client_id4 id string values, there would be no way to
260	   map the callback update request to the correct client record.  Apart
261	   from the migration handling specified in [RFC3530], such a situation
262	   cannot arise.

264	   One possible accommodation for this particular issue that has been
265	   used is to add a RENEW operation along with SETCLIENTID (on a
266	   callback update) to disambiguate the client.

268	   When the client updates the callback info to the destination, the
269	   client would, by convention, send a compound like this:

271	   { RENEW clientid4, SETCLIENTID nfs_client_id4,verf,cb }

273	   The presence of the clientid4 in the compound would allow the server
274	   to differentiate among the various leases that it knows of, all with
275	   the same nfs_client_id4 value.
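
   A minimal sketch of the lookup problem and of the RENEW-based
   accommodation follows.  The find_lease function and the dictionary
   layout are assumptions made for illustration; they model only the
   matching step, not the actual SETCLIENTID or RENEW processing of
   any server.

```python
def find_lease(leases, id_string, renew_clientid4=None):
    """Return the single lease that a callback update applies to.

    leases: list of dicts with "id" (nfs_client_id4 id string) and
    "clientid4" keys.  renew_clientid4 models the clientid4 carried by
    a RENEW placed ahead of SETCLIENTID in the same compound.
    """
    matches = [lease for lease in leases if lease["id"] == id_string]
    if renew_clientid4 is not None:
        matches = [m for m in matches if m["clientid4"] == renew_clientid4]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} leases match; cannot pick one")
    return matches[0]


leases = [
    {"id": "C-ABC", "clientid4": "c.i"},
    {"id": "C-ABC", "clientid4": "c.j"},
]

try:
    find_lease(leases, "C-ABC")
except LookupError as err:
    print("SETCLIENTID alone:", err)

print(find_lease(leases, "C-ABC", renew_clientid4="c.j")["clientid4"])  # c.j
```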

277	   While this would be a reasonable patch for an isolated protocol
278	   weakness, interoperable clients and servers would require that the
279	   protocol truly be updated to allow such a situation, specifically
280	   that of multiple clientid4's with the same nfs_client_id4 value.  The
281	   protocol is currently designed and implemented assuming this can't
282	   happen.  We need to either prevent the situation from happening, or
283	   fully adapt to the possibilities which can arise.  See Section 4 for
284	   a discussion of such issues.

286	3.1.3.  Client complexity issues

288	   Consider the following situation:

290	   o  There are a set of clients C1 through Cn accessing servers S1
291	      through Sm.  Each server manages some significant number of
292	      filesystems with the filesystem count L being significantly
293	      greater than m.

295	   o  Each client Cx will access a subset of the servers and so will
296	      have up to m clientid's, which we will call Cxy for server Sy.

298	   o  Now assume that for load-balancing or other operational reasons,
299	      numbers of filesystems are migrated among the servers.  As a
300	      result, each client-server pair will have up to m clientid's and
301	      each client will have up to m**2 clientids.  If we add the
302	      possibility of server reboot, the only bound on a client's
303	      clientid count is L.
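
   The bounds stated above can be made concrete with a little
   arithmetic.  The function below is a hypothetical helper; it reads
   "up to m clientid's" per client-server pair as one clientid per
   server from which state may have originated, which is our
   interpretation of the reasoning rather than a statement from the
   specification.

```python
def clientid_bounds(m, fs_count):
    """Upper bounds on clientid counts, following the reasoning above.

    m: number of servers; fs_count: total number of filesystems (L).
    """
    per_pair = m                # one per server the state may have come from
    per_client = m * per_pair   # m servers, up to m clientids each
    with_reboots = fs_count     # with server reboots, bounded only by L
    return per_pair, per_client, with_reboots


print(clientid_bounds(m=4, fs_count=200))  # (4, 16, 200)
```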

305	   Now, instead of a clientid4 identifying a client-server pair, we have
306	   many more entities for the client to deal with.  In addition, it
307	   isn't clear how new state is to be incorporated in this structure.

309	   The limitations of the migrated state (inability to be freed on
310	   reboot) would argue against adding more such state but trying to
311	   avoid that would run into its own difficulties.  For example, a
312	   single lockowner string presented under two different clientids would
313	   appear as two different entities.

315	   Thus we have to choose between:

317	   o  Indefinite prolongation of foreign clientid's even after all
318	      transferred state is gone.

320	   o  Having multiple requests for the same lockowner-string-named
321	      entity carried on in parallel by separate identically named
322	      lockowners under different clientid4's.

324	   o  Adding serialization at the lock-owner string level, in addition
325	      to that at the lockowner level.

327	   In any case, we have gone (in adding migration as it was described)
328	   from a situation in which

330	   o  Each client has a single clientid4/lease for each server it talks
331	      to.

333	   o  Each client has a single nfs_client_id4 for each server it talks
334	      to.

336	   o  Every state id can be mapped to an associated lease based on the
337	      server it was obtained from.

339	   To one in which

341	   o  Each client may have multiple clientid4's for a single server.

343	   o  For each stateid, the client must separately record the clientid4
344	      that it is assigned to, or it must manage separate "state blobs"
345	      for each fsid and map those to clientid4's.

347	   o  Before doing an operation that can result in a stateid, the client
348	      must either find a "state blob" based on fsid or create a new one,
349	      possibly with a new clientid4.

351	   o  There may be multiple clientid4's all connected to the same server
352	      and using the same nfs_client_id4.

354	   This sort of additional client complexity is troublesome and needs to
355	   be eliminated.
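
   To give a sense of the bookkeeping involved, the following sketch
   models the "state blob" alternative for one client's view of one
   server.  ClientStateTable and its methods are hypothetical names,
   and the structure is a simplification meant only to show that a
   single server can end up associated with several clientid4's.

```python
class ClientStateTable:
    """Hypothetical per-server bookkeeping kept by one client."""

    def __init__(self):
        self.blobs = {}  # fsid -> {"clientid4": str, "stateids": list}

    def adopt(self, fsid, clientid4, stateids):
        """Record state that arrived via transparent state migration."""
        self.blobs[fsid] = {"clientid4": clientid4, "stateids": list(stateids)}

    def blob_for(self, fsid, new_clientid4):
        """Find the state blob for fsid, creating one if none exists."""
        if fsid not in self.blobs:
            self.blobs[fsid] = {"clientid4": new_clientid4, "stateids": []}
        return self.blobs[fsid]

    def clientids(self):
        """Return the set of clientid4's in use for this server."""
        return {blob["clientid4"] for blob in self.blobs.values()}


xyz = ClientStateTable()                      # client C's view of server XYZ
xyz.blob_for("fs_x", new_clientid4="X1")      # filesystem native to XYZ
xyz.adopt("F", "C1", ["S1", "S2"])            # filesystem F migrated from ABC
blob = xyz.blob_for("F", new_clientid4="X1")  # the existing blob is reused
print(blob["clientid4"], sorted(xyz.clientids()))  # C1 ['C1', 'X1']
```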

357	3.2.  Sources of Protocol difficulties

359	3.2.1.  Issues with nfs_client_id4 generation and use

361	   The current definitive definition of the NFSv4.0 protocol [RFC3530],
362	   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
363	   agree.  The section entitled "Client ID" says:

365	      The second field, id is a variable length string that uniquely
366	      defines the client.

368	   There are two possible interpretations of the phrase "uniquely
369	   defines" in the above:

371	   o  The relation between strings and clients is a function from such
372	      strings to clients so that each string designates a single client.

374	   o  The relation between strings and clients is a bijection between
375	      such strings and clients so that each string designates a single
376	      client and each client is named by a single string.

378	   The first interpretation would make these client-strings like phone
379	   numbers (a single person can have several) while the second would
380	   make them like social security numbers.
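
   The difference between the two interpretations can be checked
   mechanically.  In the sketch below, each pair is (client-string,
   client); the function names and sample data are assumptions chosen
   for illustration.

```python
def is_function(pairs):
    """True if every client-string designates exactly one client."""
    owner = {}
    for string, client in pairs:
        if owner.setdefault(string, client) != client:
            return False
    return True


def is_bijection(pairs):
    """True if, in addition, every client is named by exactly one string."""
    if not is_function(pairs):
        return False
    names = {}
    for string, client in pairs:
        names.setdefault(client, set()).add(string)
    return all(len(strings) == 1 for strings in names.values())


non_uniform = [("C-ABC", "C"), ("C-XYZ", "C")]  # one string per server
uniform = [("C", "C"), ("D", "D")]              # one string per client

print(is_function(non_uniform), is_bijection(non_uniform))  # True False
print(is_function(uniform), is_bijection(uniform))          # True True
```

   A client that uses a different string for each server satisfies the
   first interpretation but not the second.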

382	   Endless debate about the true meaning of "uniquely defines" in this
383	   context is quite possible but not very helpful.  The following points
384	   should be noted though:

386	   o  The second interpretation is more consistent with the way
387	      "uniquely defines" is used elsewhere in the spec.

389	   o  The spec as now written intends the first interpretation (or is
390	      internally inconsistent).  In fact, it recommends, although it
391	      doesn't "RECOMMEND" that a single client have at least as many
392	      client-strings as server addresses that it interacts with.  It
393	      says, in the third bullet point regarding construction of the
394	      string (which we shall henceforth refer to as client-string-BP3):

396	         The string should be different for each server network address
397	         that the client accesses, rather than common to all server
398	         network addresses.

400	   o  If internode interactions are limited to those between a client
401	      and its servers, there is no occasion for servers to be concerned
402	      with the question of whether two client-strings designate the same
403	      client, so that there is no occasion for the difference in
404	      interpretation to matter.

406	   o  When transparent migration of client state occurs between two
407	      servers, it becomes very important to determine whether state
408	      on two different servers is for the same client.

411	   Given the need for the server to be aware of client identity with
412	   regard to migrated state, either client-string construction rules
413	   will have to change or there will be a need to get around current
414	   issues, or perhaps a combination of these two will be required.
415	   Later sections will examine the options and propose a solution.

417	   One consideration that may indicate that this cannot remain exactly
418	   as it is today has to do with the fact that the current explanation
419	   for this behavior is not correct.  The current definitive definition
420	   of the NFSv4.0 protocol [RFC3530], and the current pending draft of
421	   RFC3530bis [cur-v4.0-bis] both agree.  The section entitled "Client
422	   ID" says:

424	      The reason is that it may not be possible for the client to tell
425	      if the same server is listening on multiple network addresses.  If
426	      the client issues SETCLIENTID with the same id string to each
427	      network address of such a server, the server will think it is the
428	      same client, and each successive SETCLIENTID will cause the server
429	      to begin the process of removing the client's previous leased
430	      state.

432	   In point of fact, a "SETCLIENTID with the same id string" sent to
433	   multiple network addresses will be treated as all from the same
434	   client but will not "cause the server to begin the process of
435	   removing the client's previous leased state" unless the server
436	   believes it is a different instance of the same client, i.e. if the
437	   id string is the same and there is a different boot verifier.  If the
438	   client does not reboot, the verifier should not change.  If it does
439	   reboot, the verifier will change, and the server should "begin the
440	   process of removing the client's previous leased state".

442	   The situation of multiple SETCLIENTID requests received by a server
443	   on multiple network addresses is exactly the same, from the protocol
444	   design point of view, as when multiple (i.e. duplicate) SETCLIENTID
445	   requests are received by the server on a single network address.  The
446	   same protocol mechanisms that prevent erroneous state deletion in the
447	   latter case prevent it in the former case.  There is no reason for
448	   special handling of the multiple-network-appearance case, in this
449	   regard.
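
   The behavior described above can be sketched as follows.  The
   classify_setclientid function is a hypothetical simplification of
   server-side handling, and the addresses are documentation examples;
   the point illustrated is that the network address on which a
   SETCLIENTID arrives plays no part in the decision.

```python
def classify_setclientid(known, id_string, verifier):
    """Classify a SETCLIENTID against confirmed records.

    known maps id string -> boot verifier.  The network address on
    which the request arrived is deliberately not an input.
    """
    if id_string not in known:
        return "new client"
    if known[id_string] == verifier:
        return "same instance: state kept"
    return "rebooted instance: previous state removed"


known = {}
requests = [
    ("192.0.2.1", b"C", 0x111),  # first address of the server
    ("192.0.2.2", b"C", 0x111),  # second address, client has not rebooted
    ("192.0.2.2", b"C", 0x112),  # same client after a reboot
]
results = []
for _address, id_string, verifier in requests:
    results.append(classify_setclientid(known, id_string, verifier))
    known[id_string] = verifier

for line in results:
    print(line)
```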

451	3.2.2.  Issues with lease proliferation

453	   Lease proliferation is often felt to be a consequence of the
454	   client-string construction issues, and it is certainly the case that
455	   the two are closely connected in that non-uniform client-strings make
456	   it impossible for the server to appropriately combine leases from the
457	   same client.  See Section 5.2.1 for a discussion of non-uniform
458	   client-strings.

460	   However, even where the server could combine leases from the same
461	   client, it needs to be clear how and when it will do so, so that the
462	   client will be prepared.  These issues will have to be addressed at
463	   various places in the spec.

465	   This could be enough only if we are prepared to do away with the
466	   "should" recommending non-uniform client-strings and replace it with
467	   a "should not" or even a "SHOULD NOT".  Current client implementation
468	   patterns make this an unpalatable choice for use as a general
469	   solution, but it is reasonable to "RECOMMEND" this choice for a well-
470	   defined subset of clients.  One alternative would be to create a way
471	   for the server to infer from client behavior which leases are held by
472	   the same client and use this information to do appropriate lease
473	   mergers.  Prototyping and detailed specification work has shown that
474	   this could be done but the resulting complexity is such that a better
475	   choice is to "RECOMMEND" use of the uniform approach for clients
476	   supporting the migration feature.

478	   Because of the discussion of client-string construction in [RFC3530],
479	   most existing clients implement the non-uniform client-string
480	   approach.  As a result, existing servers may not have been tested
481	   with clients implementing uniform client-strings.  As a consequence,
482	   care must be taken to preserve interoperability between UCS-capable
483	   clients and servers that don't tolerate uniform client strings for
484	   one reason or another.  See Section 5.2.3 for details.

486	4.  Issues to be resolved in NFSv4.0

488	4.1.  Possible changes to nfs_client_id4 client-string

490	   The fact that the reason given in client-string-BP3 is not valid
491	   makes the existing "should" insupportable.  We can do neither of the
491	   following:

493	   o  Keep a reason we know is invalid.

495	   o  Keep saying "should" without giving a reason.

497	   What are often presented as reasons that motivate use of the non-
498	   uniform approach always turn out to be cases in which, if the uniform
499	   approach were used, the server will treat a client which accesses
500	   that server via two different IP addresses as part of a single
501	   client, as it in fact is.  This may be disconcerting to a client
502	   unaware that the two IP addresses connect to the same server.  This
503	   is thus not a reason to use the non-uniform approach but rather an
504	   illustration of the fact that those using the uniform approach must
505	   use server behavior to determine whether any trunking of IP addresses
506	   exists, as is described in Section 5.2.2.

508	   It is always possible that a valid new reason will be found, but so
509	   far none has been proposed.  Given the history, the burden of proof
510	   should be on those asserting the validity of a proposed new reason.

512	   So we will assume for now that the "should" will have to go.  The
513	   question is what to replace it with.

515	   o  We can't say "MUST NOT", despite the problems this raises for
516	      migration since this is pretty late in the day for such a change.
517	      Many currently operating clients obey the existing "should".
518	      Similar considerations would apply for "SHOULD NOT" or "should
519	      not".

521	   o  Dropping client-string-BP3 entirely is a possibility but, given
522	      the context and history, it would just be a confusing version of
523	      "SHOULD NOT".

525	   o  Using "MAY" would clearly specify that both ways of doing this are
526	      valid choices for clients and that servers will have to deal with
527	      clients that make either choice.

529	   o  This might be modified by a "SHOULD" (or even a "MUST") for
530	      particular groups of clients.

532	   o  There will have to be some text explaining why a client might make
533	      either choice but, except for the particular cases referred to
534	      above, we will have to make sure that it is truly descriptive, and
535	      not slanted in either direction.

537	4.2.  Possible changes to handle differing nfs_client_id4 string values

539	   Given the difficulties caused by having different nfs_client_id4
540	   client-string values for the same client, we have two choices:

542	   o  Deprecate the existing treatment and state that a client which
543	      follows it is on its own with regard to migration.

545	   o  Introduce a way of having the client provide client identity
546	      information to the server, if it can be done compatibly while
547	      staying within the bounds of v4.0.

549	4.3.  Other issues within migration-state sections

551	   There are a number of issues where the existing text is unclear
552	   and/or wrong and needs to be fixed in some way.

554	   o  Lack of clarity in the discussion of moving clientids (as well as
555	      stateids) as part of moving state for migration.

557	   o  The discussion of synchronized leases is wrong in that there is no
558	      way to determine (in the current spec) when leases are for the
559	      same client and also wrong in suggesting a benefit from leases
560	      synchronized at the point of transfer.  What is needed is merger
561	      of leases, which is necessary to keep client complexity
562	      requirements from getting out of hand.

564	   o  Lack of clarity in the discussion of LEASE_MOVED handling,
565	      including failure to fully address situations in which transparent
566	      state migration did not occur.

568	4.4.  Issues within other sections

570	   There are a number of cases in which certain sections, not
571	   specifically related to migration, require additional clarification.
572	   This is generally because text that is clear in a context in which
573	   leases and clientids are created in one place and live there forever
574	   may need further refinement in the more dynamic environment that
575	   arises as part of migration.

577	   Some examples:

579	   o  Some people are under the impression that updating callback
580	      endpoint information for an existing client, as used during
581	      migration, may cause the destination server to free existing
582	      state.  There need to be additions to clarify the situation.

584	   o  The handling of the sets of clientid4's maintained by each server
585	      needs to be clarified.  In particular, the issue of how the client
586	      adapts to the presumably independent and uncoordinated clientid4
587	      sets needs to be clearly addressed.

589	   o  Statements regarding handling of invalid clientid4's need to be
590	      clarified and/or refined in light of the possibilities that arise
591	      due to lease motion and merger.

593	   o  Confusion and lack of clarity about NFS4ERR_CLID_INUSE.

595	5.  Proposed resolution of NFSv4.0 protocol difficulties

597	5.1.  Proposed changes: nfs_client_id4 client-string

599	   We propose replacing client-string-BP3 with the following text and
600	   adding the following proposed Section 5.2 to provide implementation
601	   guidance.

603	   o  The string MAY be different for each server network address that
604	      the client accesses, rather than common to all server network
605	      addresses.

607	   o  The considerations that might influence a client to use different
608	      strings for different network server addresses are explained in
609	      Section 5.2.

611	   o  Despite the use of the word "string" for this identifier, and the
612	      fact that using strings will often be convenient, it should be
613	      understood that the protocol defines this as opaque data.  In
614	      particular, those receiving such an id should not assume that it
615	      will be in UTF-8 format.  Servers MUST NOT reject an
616	      nfs_client_id4 simply because the id string is not in UTF-8
617	      format.

619	5.2.  Client-string Approaches (AS PROPOSED)

621	   One particular aspect of the construction of the nfs_client_id4
622	   string has proved recurrently troublesome.  The client has a choice
623	   of:

625	   o  Presenting the same id string to multiple server addresses.  This
626	      is referred to as the "uniform client-string approach" and is
627	      discussed in Section 5.2.2.

629	   o  Presenting different id strings to multiple server addresses.
630	      This is referred to as the "non-uniform client-string approach"
631	      and is discussed in Section 5.2.1.

633	   Note that implementation considerations, including compatibility with
634	   existing servers, may make it desirable for a client to use both
635	   approaches, based on configuration information, such as mount
636	   options.  This issue will be discussed in Section 5.2.3.

638	   Construction of the client-string has been a troublesome issue
639	   because of the way in which the NFS protocols have evolved.

641	   o  NFSv3 as a stateless protocol had no need to identify the state
642	      shared by a particular client-server pair.  Thus there was no
643	      occasion to consider the question of whether a set of requests
644	      come from the same client, or whether two server IP addresses are
645	      connected to the same server.  As the environment was one in which
646	      the user supplied the target server IP address as part of
647	      incorporating the remote filesystem in the client's file name
648	      space, there was no occasion to take note of server trunking.
649	      Within a stateless protocol, the situation was symmetrical.  The
650	      client has no server identity information and the server has no
651	      client identity information.

653	   o  NFSv4.1 is a stateful protocol with full support for client and
654	      server identity determination.  This enables the server to be
655	      aware when two requests come from the same client (they are on
656	      sessions sharing a clientid4) and the client to be aware when two
657	      server IP addresses are connected to the same server (they return
658	      the same server name in responding to an EXCHANGE_ID).

660	   NFSv4.0 is unfortunately halfway between these two.  The two client-
661	   string approaches have arisen in attempts to deal with the changing
662	   requirements of the protocol as implementation has proceeded and
663	   features that were not very substantial in [RFC3530] have become
664	   more substantial.

666	   o  In the absence of any implementation of the fs_locations-related
667	      features (replication, referral, and migration), the situation is
668	      very similar to that of NFSv3, with the addition of state but with
669	      no concern to provide accurate client and server identity
670	      determination.  This is the situation that gave rise to the non-
671	      uniform client-string approach.

673	   o  In the presence of replication and referrals, the client may have
674	      occasion to take advantage of knowledge of server trunking
675	      information.  Even more important, migration, by transferring
676	      state among servers, causes difficulties for the non-uniform
677	      client-string approach, in that the two different client-strings
678	      sent to different IP addresses may wind up on the same IP address,
679	      adding confusion.

681	   o  A further consideration is that client implementations typically
682	      provide NFSv4.1 by augmenting their existing NFSv4.0
683	      implementation, not by providing two separate implementations.
684	      Thus the more NFSv4.0 and NFSv4.1 can work alike, the less complex
685	      are clients.  This is a key reason why those implementing NFSv4.0
686	      clients might prefer using the uniform client string model, even
687	      if they have chosen not to provide fs_locations-related features
688	      in their NFSv4.0 client.

690	   Both approaches have to deal with the asymmetry in client and server
691	   identity information between client and server.  Each seeks to make
692	   the client's and the server's views match.  In the process, each
693	   encounters some combination of inelegant protocol features and/or
694	   implementation difficulties.  The choice of which to use is up to the
695	   client implementer and the sections below try to give some useful
696	   guidance.

698	5.2.1.  Non-Uniform Client-string Approach

700	   The non-uniform client-string approach is an attempt to handle these
701	   matters in NFSv4.0 client implementations in as NFSv3-like a way as
702	   possible.

704	   For a client using the non-uniform approach, all internal recording
705	   of clientid4 values is to include, whether explicitly or implicitly,
706	   the server IP address so that one always has an (IP-address,
707	   clientid4) pair.  Two such pairs from different servers are always
708	   distinct even when the clientid4 values are the same, as they may
709	   occasionally be.  In this approach, such equality is always treated
710	   as simple happenstance.
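
   As a rough illustration of the bookkeeping described above, the
   following sketch keys every recorded lease by an (IP-address,
   clientid4) pair.  The names LeaseKey, leases and record_lease, and
   the example addresses and id strings, are hypothetical and are not
   taken from any existing client.

```python
from typing import Dict, Tuple

# Hypothetical key type: (server IP address, clientid4).
LeaseKey = Tuple[str, int]

# Maps each (IP address, clientid4) pair to the client-string sent there.
leases: Dict[LeaseKey, bytes] = {}


def record_lease(server_ip: str, clientid4: int, client_string: bytes) -> None:
    """Record a lease under its (IP address, clientid4) pair."""
    leases[(server_ip, clientid4)] = client_string


# Two servers happen to hand out the same clientid4 value.  Because the
# IP address is part of the key, the two leases remain distinct.
record_lease("192.0.2.1", 0x1234, b"client.example.net/192.0.2.1")
record_lease("192.0.2.2", 0x1234, b"client.example.net/192.0.2.2")
```

   With this keying, equal clientid4 values from different addresses
   never collide in the client's tables, which is what allows the
   equality to be treated as happenstance.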

712	   Making the client-string different on different servers means that a
713	   server has no way of tying together information from the same client
714	   and so will treat a single client as multiple clients with multiple
715	   leases for each server network address.  Since there is no way in the
716	   protocol for the client to determine if two network addresses are
717	   connected to the same server, the resulting lack of knowledge is
718	   symmetrical and can result in simpler client implementations in which
719	   there is a single clientid/lease per server network address.

721	   Support for migration, particularly with transparent state migration,
722	   is more complex in the case of non-uniform client-strings.  For
723	   example, migration of a lease can result in multiple leases for the
724	   same client accessing the same server addresses, vitiating many of
725	   the advantages of this approach.  Therefore, client implementations
726	   that support migration with transparent state migration SHOULD NOT
727	   use the non-uniform client-string approach, except where it is
728	   necessary for compatibility with existing server implementations (for
729	   details of arranging use of multiple client-string approaches, see
730	   Section 5.2.3).

732	5.2.2.  Uniform Client-string Approach

734	   When the client-string is kept uniform, the server has the basis to
735	   have a single clientid4/lease for each distinct client.  The problem
736	   that has to be addressed is the lack of explicit server identity
737	   information, which is made available in NFSv4.1.

739	   When the same client-string is given to multiple IP addresses, the
740	   client can determine whether two IP addresses correspond to a single
741	   server, based on the server's behavior.  This is the inverse of the
742	   strategy adopted for the non-uniform approach in which different
743	   server IP addresses are told about different clients, simply to
744	   prevent a server from manifesting behavior that is inconsistent with
745	   there being a single server for each IP address, in line with the
746	   traditions of NFS.  So, to compare:

748	   o  In the non-uniform approach, servers are told about different
749	      clients because, if the server were to use accurate information as
750	      to client identity, two IP addresses on the same server would
751	      behave as if they were talking to the same client, which might
752	      prove disconcerting to a client not expecting such behavior.

754	   o  In the uniform approach, the servers are told about there being a
755	      single client, which is, after all, the truth.  Then, when the
756	      server uses this information, two IP addresses on the same server
757	      will behave as if they are talking to the same client, and this
758	      difference in behavior allows the client to infer the server IP
759	      address trunking configuration, even though NFSv4.0 does not
760	      explicitly provide this information.

762	      The approach given in the section below shows one example of how
763	      this might be done.

765	   The uniform client-string approach makes it necessary to exercise
766	   more care in the definition of the nfs_client_id4 boot verifier:

768	   o  In [RFC3530], the client is told to change the boot verifier when
769	      reboot occurs, but there is no explicit statement as to the
770	      converse, so that any requirement to keep the verifier constant
771	      unless rebooting is only present by implication.

773	   o  Many existing clients change the boot verifier every time they
774	      destroy and recreate the data structure that tracks an <IP-
775	      address, clientid4> pair.  This might happen if the last mount of
776	      a particular server is removed, and then a fresh mount is created.
777	      And, note that this might result in each <IP-address, clientid4>
778	      pair having its own boot verifier that is independent of the
779	      others.

781	   o  Within the uniform client-string approach, an nfs_client_id4
782	      designates a globally known client instance, so that the boot
783	      verifier should change if and only if a new client instance is
784	      created, typically as a result of a reboot.
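
   The last point can be sketched as follows, assuming a client that
   picks its boot verifier once per client instance and presents the
   same nfs_client_id4 to every server address.  The class and method
   names are hypothetical, and the use of random bytes for the
   verifier is only one possible choice.

```python
import os

NFS4_VERIFIER_SIZE = 8  # size in bytes of a verifier4 in NFSv4.0


class ClientInstance:
    """Hypothetical holder for the identity of one client instance."""

    def __init__(self, id_string: bytes) -> None:
        self.id_string = id_string
        # Chosen once when the instance is created (typically at boot)
        # and never regenerated when per-address structures are torn
        # down and rebuilt.
        self.boot_verifier = os.urandom(NFS4_VERIFIER_SIZE)

    def nfs_client_id4(self, server_ip: str):
        # The server address is deliberately ignored: every address
        # sees the same (verifier, id string) pair.
        return (self.boot_verifier, self.id_string)


client = ClientInstance(b"client.example.net")
```

   Keeping the verifier on the instance, rather than on each per-
   address structure, avoids the situation in which each <IP-address,
   clientid4> pair ends up with its own independent verifier.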

786	   The following are advantages of using the uniform client-string
787	   approach:

789	   o  Clients can take advantage of server trunking (and clustering with
790	      single-server-equivalent semantics) to increase bandwidth or
791	      reliability.

793	   o  There are advantages in state management so that, for example, we
794	      never have a delegation under one clientid revoked because of a
795	      reference to the same file from the same client under a different
796	      clientid.

798	   o  The uniform client-string approach allows the server to do any
799	      necessary automatic lease merger in connection with migration,
800	      without requiring any client involvement.  This consideration is
801	      of sufficient weight to cause us to RECOMMEND use of the uniform
802	      client-string approach for clients supporting transparent state
803	      migration.

805	   The following implementation considerations might cause issues for
806	   client implementations.

808	   o  This approach is considerably different from the non-uniform
809	      approach, which most client implementations have been following.
810	      Until substantial implementation experience is obtained with this
811	      approach, reluctance to embrace something so new is to be
812	      expected.

814	   o  Mapping between server network addresses and leases is more
815	      complicated in that it is no longer a one-to-one mapping.

817	   How to balance these considerations depends on implementation goals.

819	5.2.3.  Mixing Client-string Approaches

821	   As noted above, a client which needs to use the uniform client-string
822	   approach (e.g. to support migration), may also need to support
823	   existing servers with implementations that do not work properly in
824	   this case.

826	   Some examples of such server issues include:

828	   o  Some existing NFSv4 server implementations of IP-address failover
829	      depend on clients' use of a non-uniform client-string approach.
830	      In particular, when a server supports both its own IP address and
831	      one failed over from a partner server, it may have separate sets
832	      of state applicable to the two IP addresses, owned by different
833	      servers but residing on a single one.

835	      In this situation, some servers have relied on clients' use of the
836	      non-uniform client-string approach, as suggested but not mandated
837	      by [RFC3530], to keep these sets of state separate, and will have
838	      problems in handling clients using the uniform client-string
839	      approach, in that such clients will see changes in trunking
840	      relationships whenever server failover and giveback occur.

842	   o  Some existing servers incorrectly return NFS4ERR_CLID_INUSE in a
843	      way which interferes with clients using the uniform client-string
844	      approach.  See Section 5.5.3 for details.

846	   In order to support such servers, the client can use different
847	   approaches for different mounts, as long as:

849	   o  The uniform client-string approach is used when accessing servers
850	      that may return NFS4ERR_MOVED.

852	   o  The non-uniform client-string approach is used when accessing
853	      servers whose implementations make them incompatible with the
854	      uniform client-string approach.

856	   One effective way for clients to handle this is to support the
857	   uniform client-string approach as the default, but allow a mount
858	   option to specify use of the non-uniform client-string approach for
859	   particular mount points, as long as such mount points are not used
860	   when migration is to be supported.
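
   A minimal sketch of such a per-mount choice is shown below.  The
   option name "nonuniform_client_string" and the way the server
   address is appended to the id string are assumptions made for the
   example; they do not describe any particular client.

```python
def client_string_for_mount(base_id: bytes, server_ip: str, options: dict) -> bytes:
    """Pick the id string to present for one mount.

    The uniform approach is the default.  The hypothetical mount
    option "nonuniform_client_string" selects a per-address string.
    """
    if options.get("nonuniform_client_string", False):
        # Non-uniform: make the string differ per server address.
        return base_id + b"/" + server_ip.encode("ascii")
    # Uniform: the same string goes to every server address.
    return base_id


uniform_a = client_string_for_mount(b"client.example.net", "192.0.2.1", {})
uniform_b = client_string_for_mount(b"client.example.net", "192.0.2.2", {})
legacy = client_string_for_mount(
    b"client.example.net", "192.0.2.3", {"nonuniform_client_string": True}
)
```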

862	   In the case in which the same server has multiple mounts, and both
863	   approaches are specified for the same server, the client could have
864	   multiple clientids corresponding to the same server, one for each
865	   approach and would then have to keep these separate.

867	5.2.4.  Trunking Determination using Uniform Client-strings

869	   This section provides an example of how trunking determination could
870	   be done by a client following the uniform client-string approach
871	   (whether this is used for all mounts or not).  Clients need not
872	   follow this procedure but implementers should make sure that the
873	   issues dealt with by this procedure are all properly addressed.

875	   We need to clarify the various possible purposes of trunking
876	   determination and the corresponding requirements as to server
877	   behavior.  The following points should be noted:

879	   o  The primary purpose of the trunking determination algorithm is to
880	      make sure that, if the server treats client requests on two IP
881	      addresses as part of the same client, the client will not be
882	      blind-sided and encounter disconcerting server behavior, as
883	      mentioned in Section 5.2.2.  Such behavior could occur if the
884	      client were unaware that all of its client requests for the two IP
885	      addresses were being handled as part of a single client talking to
886	      a single server.

888	   o  A second purpose is to be able to use knowledge of trunking
889	      relationships for better performance, etc.

891	   o  If a server were to give out distinct clientid's in response to
892	      receiving the same nfs_client_id4 on different network addresses,
893	      and acted as if these were separate clients, the primary purpose
894	      of trunking determination would be met, as long as the server did
895	      not treat them as part of the same client.  In this case, the
896	      server would be acting, with regard to that client, as if it were
897	      two distinct servers.  This would interfere with the secondary
898	      purpose of trunking determination but there is nothing the client
899	      can do about that.

901	   o  Suppose a server were to give such a client two different
902	      clientid's but act as if they were one.  That is the only way
903	      that the server could behave in a way that would defeat the
904	      primary purpose of the trunking determination algorithm.

906	      Servers MUST NOT do that.

908	   For a client using the uniform approach, clientid4 values are treated
909	   as important information in determining server trunking patterns.
910	   For two different IP addresses to return the same clientid4 value is
911	   a necessary, though not a sufficient condition for them to be
912	   considered as connected to the same server.  As a result, when two
913	   different IP addresses return the same clientid4, the client needs to
914	   determine, using the procedure given below or otherwise, whether the
915	   IP addresses are connected to the same server.  For such clients, all
916	   internal recording of clientid4 values needs to include, whether
917	   explicitly or implicitly, identification of the server from which the
918	   clientid4 was received so that one always has a (server, clientid4)
919	   pair.  Two such pairs from different servers are always considered
920	   distinct even when the clientid4 values are the same, as they may
921	   occasionally be.

923	   In order to make this approach work, the client must have accessible,
924	   for each nfs_client_id4 used by the uniform approach (only one in
925	   general) a list of all server IP addresses, together with the
926	   associated clientid4 values and authentication flavors.  As a part of
927	   the associated data structures, there should be the ability to mark a
928	   server IP structure as having the same server as another and to mark
929	   an IP-address as currently unresolved.  One way to do this is to
930	   allow each such entry to point to another with the pointer value
931	   being one of:

933	   o  A pointer to another entry for an IP address associated with the
934	      same server, where that IP address is the first one referenced to
935	      access that server.

937	   o  A pointer to the current entry if there is no earlier IP address
938	      associated with the same server, i.e. where the current IP address
939	      is the first one referenced to access that server.  We'll refer to
940	      such an IP address as the lead IP address for a given server.

942	   o  The value NULL if the address's server identity is currently
943	      unresolved.
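
   One possible rendering of these three pointer values is sketched
   below.  The class name AddrEntry and its methods are hypothetical;
   None plays the role of the NULL value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class AddrEntry:
    """Hypothetical per-IP-address entry in the client's list."""

    ip: str
    clientid4: int
    auth_flavor: str
    # Points to the lead entry for the same server, to this entry
    # itself if it is the lead, or is None while unresolved.
    lead: Optional["AddrEntry"] = None

    def mark_lead(self) -> None:
        self.lead = self

    def mark_trunked_with(self, lead_entry: "AddrEntry") -> None:
        if not lead_entry.is_lead():
            raise ValueError("target entry is not a lead IP address")
        self.lead = lead_entry

    def is_lead(self) -> bool:
        return self.lead is self

    def is_unresolved(self) -> bool:
        return self.lead is None


first = AddrEntry("192.0.2.1", 0x1234, "AUTH_SYS")
first.mark_lead()
second = AddrEntry("192.0.2.2", 0x1234, "AUTH_SYS")  # starts unresolved
```

   Once trunking with the first address has been established, calling
   second.mark_trunked_with(first) would record that both addresses
   reach the same server.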

945	   In order to keep the above information current, in the interests of
946	   the most effective trunking determination, RENEWs should be
947	   periodically done on each server.  However, even if this is not done,
948	   the primary purpose of the trunking determination algorithm, to
949	   prevent confusion due to trunking hidden from the client, will be
950	   achieved.

952	   Given this apparatus, when a SETCLIENTID is done and a clientid4
953	   returned, the data structure can be searched for a matching clientid4
954	   and if such is found, further processing can be done to determine
955	   whether the clientid4 match is accidental, or the result of trunking.

957	   In this algorithm, when SETCLIENTID is done it will use the common
958	   nfs_client_id4 and specify the current target IP address as part of
959	   the callback parameters.  We call the clientid4 and SETCLIENTID
960	   verifier returned by this operation XC and XV.

962	   Note that when the client has done previous SETCLIENTID's, to any IP
963	   addresses, with more than one authentication flavor, we have the
964	   possibility of receiving NFS4ERR_CLID_INUSE, since we do not yet know
965	   which of our connections with existing IP addresses might be trunked
966	   with our current one.  In the event that the SETCLIENTID fails with
967	   NFS4ERR_CLID_INUSE, one must try all other authentication flavors
968	   currently in use and eventually one will be correct and not return
969	   NFS4ERR_CLID_INUSE.
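
   The retry described above might be sketched as follows.  The
   callable send stands in for whatever issues a SETCLIENTID with a
   given authentication flavor and returns a status string; it and
   the function name are hypothetical.

```python
from typing import Callable, List, Tuple

CLID_INUSE = "NFS4ERR_CLID_INUSE"


def setclientid_any_flavor(
    send: Callable[[str], str], first_flavor: str, flavors_in_use: List[str]
) -> Tuple[str, str]:
    """Try first_flavor, then every other flavor currently in use.

    Returns (flavor, status) for the first attempt that does not
    fail with NFS4ERR_CLID_INUSE.
    """
    order = [first_flavor] + [f for f in flavors_in_use if f != first_flavor]
    for flavor in order:
        status = send(flavor)
        if status != CLID_INUSE:
            return flavor, status
    raise RuntimeError("every flavor in use returned NFS4ERR_CLID_INUSE")


def fake_send(flavor: str) -> str:
    # Toy server that only accepts the flavor used for the existing lease.
    return "NFS4_OK" if flavor == "RPCSEC_GSS" else CLID_INUSE


result = setclientid_any_flavor(fake_send, "AUTH_SYS", ["AUTH_SYS", "RPCSEC_GSS"])
```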

971	   Note that at this point, no SETCLIENTID_CONFIRM has yet been done.
972	   This is because our SETCLIENTID has either established a new
973	   clientid4 on a previously unknown server or changed the callback
974	   parameters on a clientid4 associated with some already known server.
975	   Given that we don't want to confirm something that we are not sure we
976	   want to happen, what is to be done next depends on information about
977	   existing clientid4's.

979	   o  If no matching clientid4 is found, the IP address X and clientid4
980	      XC are added to the list and considered as having no existing
981	      known IP addresses trunked with it.  The IP address is marked as a
982	      lead IP address for a new server.  A SETCLIENTID_CONFIRM is done
983	      using XC and XV.

985	   o  If a matching clientid4 is found which is marked unresolved,
986	      processing on the new IP address is suspended.  In order to
987	      simplify processing, there can only be one unresolved IP address
988	      for any given clientid4.

990	   o  If one or more matching clientid4's is found, none of which is
991	      marked unresolved, the new IP address is entered and marked
992	      unresolved.  After applying the steps below to each of the lead IP
993	      addresses with a matching clientid4, the address will have been
994	      resolved: either it will be part of the same server as a new IP
995	      address to be added to an existing set of IP addresses for a
996	      server, or it will be recognized as a new server.  At the point at
997	      which this determination is made, the unresolved indication is
998	      cleared and any suspended SETCLIENTID processing is restarted.
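
   The three cases above amount to a small decision function.  The
   sketch below assumes each list entry is a dictionary with a
   "clientid4" field and a "lead" field that is None while the entry
   is unresolved; the function name and the returned labels are
   hypothetical.

```python
from typing import Dict, List


def classify_new_address(entries: List[Dict[str, object]], xc: int) -> str:
    """Decide what to do after SETCLIENTID on a new address returns XC."""
    matches = [e for e in entries if e["clientid4"] == xc]
    if not matches:
        # No matching clientid4: new lead address, confirm with XC and XV.
        return "confirm-as-new-lead"
    if any(e["lead"] is None for e in matches):
        # A matching entry is still unresolved: wait for it to finish.
        return "suspend"
    # Matches exist and all are resolved: test against each lead address.
    return "resolve-against-leads"


known = [
    {"ip": "192.0.2.1", "clientid4": 7, "lead": "192.0.2.1"},
    {"ip": "192.0.2.9", "clientid4": 8, "lead": None},
]
```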

1000	   So for each lead IP address IPn with a clientid4 matching XC, the
1001	   following steps are done.

1003	   o  If the authentication flavor for IPn does not match that for X,
1004	      the IP address is skipped, since it is impossible for IPn and X
1005	      to be trunked in these circumstances.  This avoids any possibility
1006	      that NFS4ERR_CLID_INUSE will be returned for the SETCLIENTID and
1007	      SETCLIENTID_CONFIRM to be done below, as long as the server(s) at
1008	      IP addresses IPn and X are correctly implemented.

1010	   o  A SETCLIENTID is done to update the callback parameters to reflect
1011	      the possibility that X will be marked as associated with the
1012	      server whose lead IP address is IPn.  The specific callback
1013	      parameters chosen, in terms of cb_client4 and callback_ident, are
1014	      up to the client and should reflect its preferences as to callback
1015	      handling for the common clientid, in the event that X and IPn are
1016	      trunked together.  So assume that we do that SETCLIENTID on IP
1017	      address IPn and get back a setclientid_confirm value (in the form
1018	      of a verifier4) SCn.

1020	      Note that the v4.0 spec requires the server to make sure that such
1021	      values are very unlikely to be regenerated.  Given that it is
1022	      already highly unlikely that the clientid XC is duplicated by
1023	      distinct servers, the probability that SCn is duplicated as well
1024	      has to be considered vanishingly small.  Note also that the
1025	      callback update procedure can be repeated multiple times to reduce
1026	      the probability of spurious matches further.

1028	   o  Note that we don't want this to happen if address X is not
1029	      associated with this server.  So we do a SETCLIENTID_CONFIRM on
1030	      address X using the setclientid_confirm value SCn.

1032	   o  If the setclientid_confirm value generated on IPn is accepted on
1033	      X, then X and IPn are recognized as connected to the same server
1034	      and the entry for X is marked as associated with IPn.  The entry
1035	      is now resolved and processing can be restarted for IP addresses
1036	      whose clientid4 matched XC but whose resolution had been deferred.

1038	   o  If the confirm value generated on IPn is not accepted on X, then X
1039	      and IPn are distinct and the callback update will not be
1040	      confirmed.  So we go on to the next IPn, until we run out of them.
1041	      If it happens that we run out of potential matches, then we can
1042	      treat X as connected to a distinct server and then update and
1043	      confirm its callback parameters on that basis.
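
   The per-lead steps above can be exercised against a toy model.  In
   the sketch below, FakeServer is a deliberately simplified stand-in
   for a server (it is not an NFSv4.0 implementation), and resolve,
   network and leads are hypothetical names.  Confirm verifiers are
   drawn from one shared counter so that they never repeat across toy
   servers.

```python
import itertools


class FakeServer:
    """Toy stand-in for a server reachable at one or more addresses."""

    _verifiers = itertools.count(1)  # unique across all toy servers

    def __init__(self, clientid4):
        self.clientid4 = clientid4   # fixed, so collisions can be forced
        self.pending = set()         # unconfirmed confirm verifiers

    def setclientid(self, callback):
        verifier = next(FakeServer._verifiers)
        self.pending.add(verifier)
        return self.clientid4, verifier

    def setclientid_confirm(self, clientid4, verifier):
        if clientid4 == self.clientid4 and verifier in self.pending:
            self.pending.discard(verifier)
            return True
        return False                 # a real server would return an error


def resolve(x, leads, network, flavor):
    """Return the lead IP trunked with x, or x itself if none matches.

    network maps IP address -> FakeServer; leads maps lead IP address
    -> (clientid4, authentication flavor).
    """
    xc, xv = network[x].setclientid(callback=x)
    for ipn, (cid, ipn_flavor) in leads.items():
        if cid != xc or ipn_flavor != flavor:
            continue                 # cannot be trunked with x
        # Update callback parameters on IPn, obtaining SCn.
        _, scn = network[ipn].setclientid(callback=(ipn, x))
        # Try to confirm SCn on x; success means one server.
        if network[x].setclientid_confirm(xc, scn):
            return ipn
    # Ran out of candidates: x is the lead address of a new server.
    network[x].setclientid_confirm(xc, xv)
    return x


server_a = FakeServer(clientid4=7)
server_b = FakeServer(clientid4=7)   # accidental clientid4 collision
network = {"192.0.2.1": server_a, "192.0.2.2": server_a, "192.0.2.3": server_b}
leads = {"192.0.2.1": (7, "AUTH_SYS")}
```

   In this model, an address on the same toy server accepts the
   verifier issued through the lead address, while an address on a
   different toy server with a coincidentally equal clientid4 rejects
   it.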

1045	   Note here that we may set a number of possible values for the
1046	   callback parameters to be used for XC, one for the possibility that X
1047	   is untrunked, and others for each potential match with an existing
1048	   IPn.  Although there are multiple such updates at most one will be
1049	   confirmed and, if X is untrunked, its original callback parameters
1050	   will be put in effect by its SETCLIENTID_CONFIRM.

1052	   The procedure above has made no explicit mention of the possibility
1053	   that server reboot can occur at any time.  To address this
1054	   possibility the client should periodically use the clientid4 XC in
1055	   RENEW operations, directed to both the IP address X and the lead
1056	   IP address that is currently being tested for identity.

1058	   o  When XC becomes invalid on X, the resolution process should be
1059	      terminated, subject to being redone later.  Before redoing the
1060	      resolution, XC should be checked on all the lead IP addresses on
1061	      which it was valid.  Once a new clientid4 is established on any
1062	      servers on which XC became invalid, a new clientid4 can be
1063	      established on X and the resolution process for X can be
1064	      restarted.

1066	   o  When XC does not become invalid on X, but becomes invalid on the
1067	      current IPn being tested, it should be concluded that X and IPn do
1068	      not match and that it is time to advance to the next IPn, if any.

1070	   o  In the event of a reboot detected on any server lead IP, the set
1071	      of IP addresses associated with the server should not change and
1072	      state should be re-established for the lease as a whole, using all
1073	      available connected server IP addresses.  It is prudent to verify
1074	      connectivity by doing a RENEW using the new clientid4 on each such
1075	      server address before using it, however.

1077	   If we have run out of IPn's without finding a matching server, X is
1078	   considered as having no existing known IP addresses trunked with it.
1079	   The IP address is marked as a lead IP address for a new server.  A
1080	   SETCLIENTID_CONFIRM is done using XC and XV.

1082	5.3.  Proposed changes: merged (vs. synchronized) leases

1084	   The current definitive definition of the NFSv4.0 protocol [RFC3530],
1085	   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
1086	   agree.  The section entitled "Migration and State" says:

1088	      As part of the transfer of information between servers, leases
1089	      would be transferred as well.  The leases being transferred to the
1090	      new server will typically have a different expiration time from
1091	      those for the same client, previously on the old server.  To
1092	      maintain the property that all leases on a given server for a
1093	      given client expire at the same time, the server should advance
1094	      the expiration time to the later of the leases being transferred
1095	      or the leases already present.  This allows the client to maintain
1096	      lease renewal of both classes without special effort.

1098	   There are a number of problems with this and any resolution of our
1099	   difficulties must address them somehow.

1101	   o  The current v4.0 spec recommends that the client make it
1102	      essentially impossible to determine when two leases are from "the
1103	      same client".

1105	   o  It is not appropriate to speak of "maintain[ing] the property that
1106	      all leases on a given server for a given client expire at the same
1107	      time", since this is not a property that holds even in the absence
1108	      of migration.  A server listening on multiple network addresses
1109	      may have the same client appear as multiple clients with no way to
1110	      recognize the client as the same.

1112	   o  Even if the client identity issue could be resolved, advancing the
1113	      lease time at the point of migration would not maintain the
1114	      desired synchronization property.  The leases would be
1115	      synchronized until one of them was renewed, after which they would
1116	      be unsynchronized again.
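
   The last point can be seen with a little arithmetic.  The sketch
   below assumes a 90-second lease period and arbitrary example times;
   all names and values are illustrative only.

```python
LEASE_PERIOD = 90  # seconds; illustrative value


def synchronize(expiry_a, expiry_b):
    """Advance both leases to the later expiration time."""
    later = max(expiry_a, expiry_b)
    return later, later


def renew(now):
    """Renewing a lease sets its expiration relative to the renewal time."""
    return now + LEASE_PERIOD


# At the point of transfer the two leases are made to expire together.
lease_a, lease_b = synchronize(100, 130)
in_sync_after_transfer = lease_a == lease_b

# The client then renews only lease A at time 120.
lease_a = renew(120)
in_sync_after_renew = lease_a == lease_b
```

   After the single renewal the two expiration times differ again, so
   the client is back to tracking two leases independently.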

1118	   To avoid client complexity, we need to have no more than one lease
1119	   between a single client and a single server.  This requires merger of
1120	   leases since there is no real help from synchronizing them at a
1121	   single instant.

1123	   For the uniform approach, the destination server would simply merge
1124	   leases as part of state transfer, since two leases with the same
1125	   nfs_client_id4 values must be for the same client.
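
   A destination server's merge step might look roughly like the
   following sketch.  Leases are modelled as dictionaries, and the
   field names, the function name, and the choice to keep the later
   expiration time are all assumptions made for the example.

```python
def merge_migrated_lease(dest_leases, migrated):
    """Merge a migrated lease into the destination server's table.

    dest_leases maps (boot verifier, id string) -> lease dictionary.
    Returns the lease that holds the state after the merge.
    """
    key = (migrated["boot_verifier"], migrated["id_string"])
    existing = dest_leases.get(key)
    if existing is None:
        # No lease for this client yet: adopt the migrated lease as is.
        dest_leases[key] = migrated
        return migrated
    # Same nfs_client_id4, hence same client: fold the migrated state
    # into the existing lease, which keeps its own clientid4.
    existing["state"].extend(migrated["state"])
    existing["expiry"] = max(existing["expiry"], migrated["expiry"])
    return existing


table = {
    (b"\x01" * 8, b"client.example.net"): {
        "boot_verifier": b"\x01" * 8, "id_string": b"client.example.net",
        "clientid4": 11, "expiry": 200, "state": ["open-1"],
    }
}
incoming = {
    "boot_verifier": b"\x01" * 8, "id_string": b"client.example.net",
    "clientid4": 42, "expiry": 250, "state": ["open-2", "lock-3"],
}
merged = merge_migrated_lease(table, incoming)
```

   In this sketch the merged lease keeps the destination's existing
   clientid4, so state that arrived under a different clientid4 is now
   reachable under a new one.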

1127	   We have made the following decisions regarding proposed normative
1128	   statements for state merger.  They reflect the facts that we want
1129	   to provide full migration support in the simplest way possible and
1130	   that we can't say MUST since we have older clients and servers to
1131	   deal with.

1133	   o  Clients SHOULD use the uniform client-string approach in order to
1134	      get good migration support.

1136	   o  Servers SHOULD provide automatic lease merger during state
1137	      migration so that clients using the uniform id approach get the
1138	      support automatically.

1140	   If the clients and the servers obey the SHOULD's, having more than a
1141	   single lease for a given client-server pair will be a transient
1142	   situation, cleaned up as part of adapting to use of migrated state.

1144	   Since clients and servers will be a mixture of old and new and
1145	   because nothing is a MUST we have to ensure that no combination will
1146	   show worse behavior than is exhibited by current (i.e. old) clients
1147	   and servers.

1149	5.4.  Other proposed changes to migration-state sections

1151	5.4.1.  Proposed changes: Client ID migration

1153	   The current definitive definition of the NFSv4.0 protocol [RFC3530],
1154	   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
1155	   agree.  The section entitled "Migration and State" says:

1157	      In the case of migration, the servers involved in the migration of
1158	      a filesystem SHOULD transfer all server state from the original to
1159	      the new server.  This must be done in a way that is transparent to
1160	      the client.  This state transfer will ease the client's transition
1161	      when a filesystem migration occurs.  If the servers are successful
1162	      in transferring all state, the client will continue to use
1163	      stateids assigned by the original server.  Therefore the new
1164	      server must recognize these stateids as valid.  This holds true
1165	      for the client ID as well.  Since responsibility for an entire
1166	      filesystem is transferred with a migration event, there is no
1167	      possibility that conflicts will arise on the new server as a
1168	      result of the transfer of locks.

1170	   This poses some difficulties, mostly because the part about "client
1171	   ID" is not clear:

1173	   o  It isn't clear what part of the paragraph the "this" in the
1174	      statement "this holds true ..." is meant to signify.

1176	   o  The phrase "the client ID" is ambiguous, possibly indicating the
1177	      clientid4 and possibly indicating the nfs_client_id4.

1179	   o  If the text means to suggest that the same clientid4 must be used,
1180	      the logic is not clear since the issue is not the same as for
1181	      stateids of which there might be many.  Adapting to the change of
1182	      a single clientid, as might happen as a part of lease migration,
1183	      is relatively easy for the client.

1185	   We have decided to address this issue as follows, with the relevant
1186	   changes all reflected in Section 5.6.

1188	   o  Make it clear that both clientid4 and nfs_client_id4 (including
1189	      both id string and boot verifier) are to be transferred.

1191	   o  Indicate that the initial transfer will result in the same
1192	      clientid4 after transfer but this is not guaranteed since there
1193	      may be a conflict with an existing clientid4 on the destination
1194	      server and because lease merger can change the clientid4.

1196	5.4.2.  Proposed changes: Callback re-establishment

1198	   The current definitive definition of the NFSv4.0 protocol [RFC3530],
1199	   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
1200	   agree.  The section entitled "Migration and State" says:

1202	      A client SHOULD re-establish new callback information with the new
1203	      server as soon as possible, according to sequences described in
1204	      sections "Operation 35: SETCLIENTID - Negotiate Client ID" and
1205	      "Operation 36: SETCLIENTID_CONFIRM - Confirm Client ID".  This
1206	      ensures that server operations are not blocked by the inability to
1207	      recall delegations.

1209	   The above will need to be fixed to reflect the possibility of merging
1210	   of leases and the text to do this appears as part of Section 5.6.

1212	5.4.3.  Proposed changes: NFS4ERR_LEASE_MOVED rework

1214	   The current definitive definition of the NFSv4.0 protocol [RFC3530],
1215	   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
1216	   agree.  The section entitled "Notification of Migrated Lease" says:

1218	      Upon receiving the NFS4ERR_LEASE_MOVED error, a client that
1219	      supports filesystem migration MUST probe all filesystems from that
1220	      server on which it holds open state.  Once the client has
1221	      successfully probed all those filesystems which are migrated, the
1222	      server MUST resume normal handling of stateful requests from that
1223	      client.

1225	   There is a lack of clarity that is prompted by ambiguity about what
1226	   exactly probing is and what the interlock between client and server
1227	   must be.  This has led to some worry about the scalability of the
1228	   probing process, and although the time required does scale linearly
1229	   with the number of fs's that the client may have state for with
1230	   respect to a given server, the actual process can be done
1231	   efficiently.

1233	   To address these issues we propose replacing the above with the text
1234	   addressing NFS4ERR_LEASE_MOVED as given in Section 5.6.3.

1236	5.5.  Proposed changes to other sections

1238	5.5.1.  Proposed changes: callback update

1240	   Some changes are necessary to reduce confusion about the process of
1241	   callback information update and in particular to make it clear that
1242	   no state is freed as a result:

1244	   o  Make it clear that after migration there are confirmed entries for
1245	      transferred clientid4/nfs_client_id4 pairs.

1247	   o  Be explicit in the sections headed "otherwise," in the
1248	      descriptions of SETCLIENTID and SETCLIENTID_CONFIRM, that these
1249	      don't apply in the cases we are concerned about.

1251	5.5.2.  Proposed changes: clientid4 handling

1253	   To address both of the clientid4-related issues mentioned in
1254	   Section 4.4, we propose replacing the last three paragraphs of the
1255	   section entitled "Client ID" with the following:

1257	      Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has
1258	      successfully completed, the client uses the shorthand client
1259	      identifier, of type clientid4, instead of the longer and less
1260	      compact nfs_client_id4 structure.  This shorthand client
1261	      identifier (a client ID) is assigned by the server and should be
1262	      chosen so that it will not conflict with a client ID previously
1263	      assigned by the same server.  This applies across server restarts
1264	      or reboots.

1266	      Distinct servers MAY assign clientid4's independently, and will
1267	      generally do so.  Therefore, a client has to be prepared to deal
1268	      with multiple instances of the same clientid4 value received on
1269	      distinct IP addresses, denoting separate entities.  When trunking
1270	      of server IP addresses is not a consideration, a client should
1271	      keep track of (IP-address, clientid4) pairs, so that each pair is
1272	      distinct.  For a discussion of how to address the issue in the
1273	      face of possible trunking of server IP addresses, see Section 5.2.

1275	      When a clientid4 is presented to a server and that clientid4 is
1276	      not recognized, the server will reject the request with the error
1277	      NFS4ERR_STALE_CLIENTID.  This can occur for a number of reasons:

1279	      *  A server reboot causing loss of the server's knowledge of the
1280	         client

1282	      *  Client error sending an incorrect clientid4 or a valid
1283	         clientid4 to the wrong server.

1285	      *  Loss of lease state due to lease expiration.

1287	      *  Client or server error causing the server to believe that the
1288	         client has rebooted (i.e. receiving a SETCLIENTID with an
1289	         nfs_client_id4 which has a matching id string and a non-
1290	         matching boot verifier).

1292	      *  Migration of all state under the associated lease causes its
1293	         non-existence to be recognized on the source server.

1295	      *  Merger of state under the associated lease with another lease
1296	         under a different clientid causes the clientid4 serving as the
1297	         source of the merge to cease being recognized on its server.

1299	      In the event of a server reboot, or loss of lease state due to
1300	      lease expiration, the client must obtain a new clientid4 by use of
1301	      the SETCLIENTID operation and then proceed to any other necessary
1302	      recovery for the server reboot case (See the section entitled
1303	      "Server Failure and Recovery").  In cases of server or client
1304	      error resulting in this error, use of SETCLIENTID to establish a
1305	      new lease is desirable as well.

1307	      In the last two cases, different recovery procedures are required.
1308	      See Section 5.6 for details.  Note that in cases in which there is
1309	      any uncertainty about which sort of handling is applicable, the
1310	      distinguishing characteristic is that in reboot-like cases, the
1311	      clientid4 and all associated stateids cease to exist while in
1312	      migration-related cases, the clientid4 ceases to exist while the
1313	      stateids are still valid.

1315	      The client must also employ the SETCLIENTID operation when it
1316	      receives an NFS4ERR_STALE_STATEID error using a stateid derived
1317	      from its current clientid4, since this indicates a situation,
1318	      such as server reboot, which has invalidated the existing
1319	      clientid4 and associated stateids (see the section entitled
1320	      "lock-owner" for details).

1322	      See the detailed descriptions of SETCLIENTID and
1323	      SETCLIENTID_CONFIRM for a complete specification of the
1324	      operations.

1326	5.5.3.  Proposed changes: NFS4ERR_CLID_INUSE

1328	   It appears to be the intention that only a single authentication
1329	   flavor be used for client establishment between any client-server
1330	   pair.  However:

1332	   o  There is no explicit statement to this effect.

1334	   o  The error that indicates an authentication flavor conflict has a
1335	      name which does not clarify this issue: NFS4ERR_CLID_INUSE.

1337	   o  The definition of the error is also not very helpful: "The
1338	      SETCLIENTID operation has found that a client id is already in use
1339	      by another client".

1341	   As a result, servers exist which reject a SETCLIENTID simply because
1342	   there already exists a clientid for the same client, established
1343	   using a different IP address.  Although this is generally understood
1344	   to be erroneous, such servers still exist and the spec should make
1345	   the correct behavior clear.

1347	   Although the error name cannot be changed, the following changes
1348	   should be made to avoid confusion:

1350	   o  The definition of the error should be changed to read, "The
1351	      SETCLIENTID operation has found that the specified nfs_client_id4
1352	      was previously presented with a different authentication flavor
1353	      and that client instance currently holds an active lease."

1355	   o  In the description of SETCLIENTID, the phrase "then the server
1356	      returns a NFS4ERR_CLID_INUSE error" should be expanded to read
1357	      "then the server returns a NFS4ERR_CLID_INUSE error, since use of
1358	      a single client with multiple principals is not allowed."

1360	5.6.  Migration, Replication and State (AS PROPOSED)

1362	   When responsibility for handling a given filesystem is transferred to
1363	   a new server (migration) or the client chooses to use an alternate
1364	   server (e.g., in response to server unresponsiveness) in the context
1365	   of filesystem replication, the appropriate handling of state shared
1366	   between the client and server (i.e., locks, leases, stateids, and
1367	   client IDs) is as described below.  The handling differs between
1368	   migration and replication.

1370	   If a server replica or a server immigrating a filesystem agrees to,
1371	   or is expected to, accept opaque values from the client that
1372	   originated from another server, then it is a wise implementation
1373	   practice for the servers to encode the "opaque" values in network
1374	   byte order.  When doing so, servers acting as replicas or immigrating
1375	   filesystems will be able to parse values like stateids, directory
1376	   cookies, filehandles, etc. even if their native byte order is
1377	   different from that of other servers cooperating in the replication
1378	   and migration of the filesystem.

1380	5.6.1.  Migration and State

1382	   In the case of migration, the servers involved in the migration of a
1383	   filesystem SHOULD transfer all server state from the original to the
1384	   new server.  This must be done in a way that is transparent to the
1385	   client.  This state transfer will ease the client's transition when a
1386	   filesystem migration occurs.  If the servers are successful in
1387	   transferring all state, the client will continue to use stateids
1388	   assigned by the original server.  Therefore the new server must
1389	   recognize these stateids as valid.

1391	   If transferring stateids from server to server would result in a
1392	   conflict for an existing stateid for the destination server with the
1393	   existing client, transparent state migration MUST NOT happen for that
1394	   client.  Servers participating in transparent state migration
1395	   should co-ordinate their stateid assignment policies to make this
1396	   situation unlikely or impossible.  The means by which this might be
1397	   done, like all of the inter-server interactions for migration, are
1398	   not specified by the NFS version 4.0 protocol.

1400	   Handling of clientid values is similar but not identical.  The
1401	   clientid4 and nfs_client_id4 information (id string and boot
1402	   verifier) will be transferred with the rest of the state information
1403	   and the destination server should use that information to determine
1404	   appropriate clientid4 handling.  Although the destination server may
1405	   make state stored under an existing lease available under the
1406	   clientid4 used on the source server, the client should not assume
1407	   that this is always so.  In particular,

1409	   o  If there is an existing lease with an nfs_client_id4 that matches
1410	      a migrated lease (same id string and boot verifier), the server
1411	      SHOULD merge the two, making the union of the sets of stateids
1412	      available under the clientid4 for the existing lease.  As part of
1413	      the lease merger, the expiration time of the lease will reflect
1414	      renewal done within either of the ancestor leases (and so will
1415	      reflect the latest of the renewals).

1417	   o  If there is an existing lease with an nfs_client_id4 that
1418	      partially matches a migrated lease (same id string and a different
1419	      boot verifier), the server MUST eliminate one of the two, possibly
1420	      invalidating one of the ancestor clientid4's.  Since boot
1421	      verifiers are not ordered, the later lease renewal time will
1422	      prevail.
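
	   The two rules above can be illustrated with a minimal sketch.  It
	   is not part of the protocol: the Lease record, its field names, and
	   the function resolve_migrated_lease are hypothetical, and the
	   caller is assumed to have already looked up the destination lease
	   (if any) with the same id string.  Keeping the existing lease on a
	   renewal-time tie is also an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Lease:
    id_string: str        # nfs_client_id4 id string
    boot_verifier: int    # nfs_client_id4 boot verifier
    clientid4: int
    last_renewal: float   # time of the most recent renewal
    stateids: Set[str] = field(default_factory=set)


def resolve_migrated_lease(existing: Optional[Lease], incoming: Lease) -> Lease:
    """Return the lease that survives on the destination server.

    `existing` is the destination's lease with the same id string as
    `incoming`, or None if the destination has no such lease.
    """
    if existing is None:
        # Nothing to merge with: the lease is transferred intact.
        return incoming
    if existing.boot_verifier == incoming.boot_verifier:
        # Full match: merge under the existing clientid4 and keep the
        # latest renewal of the two ancestor leases.
        existing.stateids |= incoming.stateids
        existing.last_renewal = max(existing.last_renewal,
                                    incoming.last_renewal)
        return existing
    # Partial match: boot verifiers are not ordered, so the lease with
    # the later renewal prevails and the other one is eliminated.
    if incoming.last_renewal > existing.last_renewal:
        return incoming
    return existing
```

	   In the full-match case the client ends up using the clientid4 of
	   the lease that already existed on the destination, which is why it
	   cannot assume that the source server's clientid4 remains valid.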

1424	   When leases are not merged, the transfer of state should result in
1425	   creation of a confirmed client record with empty callback information
1426	   but matching the {v, x, c} for the transferred client information.
1427	   This should enable establishment of new callback information using
1428	   SETCLIENTID and SETCLIENTID_CONFIRM.

1430	   A client may determine the disposition of migrated state by using a
1431	   stateid associated with the migrated state in an operation on the
1432	   new server and using the associated clientid4 in a RENEW on the new
1433	   server.

1435	   o  If the stateid is not valid and an error NFS4ERR_BAD_STATEID is
1436	      received, either transparent state migration has not occurred or
1437	      the state was purged due to boot verifier mismatch.

1439	   o  If the stateid is valid and an error NFS4ERR_STALE_CLIENTID is
1440	      received on the RENEW, transparent state migration has occurred
1441	      and the lease has been merged with an existing lease on the
1442	      destination server.

1444	   o  If the stateid is valid and the clientid4 is valid, the lease has
1445	      been transferred intact.
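
	   The three cases above amount to a small decision table.  The
	   following sketch shows one way a client might encode it.  The
	   Disposition names and the function classify_disposition are
	   hypothetical; status values are represented as plain strings purely
	   for illustration.

```python
from enum import Enum


class Disposition(Enum):
    NOT_TRANSFERRED_OR_PURGED = "no transparent migration, or state purged"
    MERGED = "lease merged with an existing lease on the destination"
    TRANSFERRED_INTACT = "lease transferred intact"


def classify_disposition(stateid_status: str, renew_status: str) -> Disposition:
    """Classify migrated state from two probes sent to the new server.

    stateid_status: result of an operation using a migrated stateid.
    renew_status:   result of a RENEW using the associated clientid4.
    """
    if stateid_status == "NFS4ERR_BAD_STATEID":
        return Disposition.NOT_TRANSFERRED_OR_PURGED
    if stateid_status == "NFS4_OK":
        if renew_status == "NFS4ERR_STALE_CLIENTID":
            return Disposition.MERGED
        if renew_status == "NFS4_OK":
            return Disposition.TRANSFERRED_INTACT
    # Any other combination is outside the cases listed in the text.
    raise ValueError(f"unexpected results: {stateid_status}, {renew_status}")
```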

1447	   Since responsibility for an entire filesystem is transferred with a
1448	   migration event, there is no possibility that conflicts will arise on
1449	   the new server as a result of the transfer of locks.

1451	   The servers may choose not to transfer the state information upon
1452	   migration.  However, this choice is discouraged, except where
1453	   specific issues such as stateid conflicts make it necessary.  In the
1454	   case of migration without state transfer, when the client presents
1455	   state information from the original server (e.g. in a RENEW op or a
1456	   READ op of zero length), the client must be prepared to receive
1457	   either NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new
1458	   server.  The client should then recover its state information as it
1459	   normally would in response to a server failure.  The new server must
1460	   take care to allow for the recovery of state information as it would
1461	   in the event of server restart.

1463	   When a lease is transferred to a new server (as opposed to being
1464	   merged with a lease already on the new server), a client SHOULD re-
1465	   establish new callback information with the new server as soon as
1466	   possible, according to sequences described in sections "Operation 35:
1467	   SETCLIENTID - Negotiate Client ID" and "Operation 36:
1468	   SETCLIENTID_CONFIRM - Confirm Client ID".  This ensures that server
1469	   operations are not blocked by the inability to recall delegations.

1471	   In those situations in which state has not been transferred, as
1472	   shown by a return of NFS4ERR_BAD_STATEID, the client may attempt to
1473	   reclaim the locks in order to take advantage of cases in which the
1474	   destination server has set up a file-system-specific grace period
1475	   in support of the migration.

1477	5.6.2.  Replication and State

1479	   Since client switch-over in the case of replication is not under
1480	   server control, the handling of state is different.  In this case,
1481	   leases, stateids and client IDs do not have validity across a
1482	   transition from one server to another.  The client must re-establish
1483	   its locks on the new server.  This can be compared to the re-
1484	   establishment of locks by means of reclaim-type requests after a
1485	   server reboot.  The difference is that the server has no provision to
1486	   distinguish requests reclaiming locks from those obtaining new locks
1487	   or to defer the latter.  Thus, a client re-establishing a lock on the
1488	   new server (by means of a LOCK or OPEN request), may have the
1489	   requests denied due to a conflicting lock.  Since replication is
1490	   intended for read-only use of filesystems, such denial of locks
1491	   should not pose large difficulties in practice.  When an attempt to
1492	   re-establish a lock on a new server is denied, the client should
1493	   treat the situation as if its original lock had been revoked.

1495	5.6.3.  Notification of Migrated Lease

1497	   In the case of lease renewal, the client may not be submitting
1498	   requests for a filesystem that has been migrated to another server.
1499	   This can occur because of the implicit lease renewal mechanism.  The
1500	   client renews a lease containing state of multiple filesystems when
1501	   submitting a request to any one filesystem at the server.

1503	   In order for the client to schedule renewal of leases that may have
1504	   been relocated to the new server, the client must find out about
1505	   lease relocation before those leases expire.  Similarly, when
1506	   migration occurs but there has not been transparent state migration,
1507	   the client needs to find out about the change soon enough to be able
1508	   to reclaim the lock within the destination server's grace period.  To
1509	   accomplish this, all operations which implicitly renew leases for a
1510	   client (such as OPEN, CLOSE, READ, WRITE, RENEW, LOCK, and others),
1511	   will return the error NFS4ERR_LEASE_MOVED if responsibility for any
1512	   of the leases to be renewed has been transferred to a new server.
1513	   Note that when the transfer of responsibility leaves remaining state
1514	   for that lease on the source server, the lease is renewed just as it
1515	   would have been in the NFS4ERR_OK case, despite returning the error.
1516	   The transfer of responsibility happens when the server receives a
1517	   GETATTR(fs_locations) from the client for each filesystem for which a
1518	   lease has been moved to a new server.  Normally it does this after
1519	   receiving an NFS4ERR_MOVED for an access to the filesystem but the
1520	   server is not required to verify that this happens in order to
1521	   terminate the return of NFS4ERR_LEASE_MOVED.  By convention, the
1522	   compounds containing GETATTR(fs_locations) SHOULD include an appended
1523	   RENEW operation to permit the server to identify the client getting
1524	   the information.

1526	   Note that the NFS4ERR_LEASE_MOVED error is only required when
1527	   responsibility for at least one stateid has been affected.  In the
1528	   case of a null lease, where the only associated state is a clientid,
1529	   no NFS4ERR_LEASE_MOVED error need be generated.

1531	   Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports
1532	   filesystem migration MUST perform the necessary GETATTR operation for
1533	   each of the filesystems containing state that have been migrated and
1534	   so give the server evidence that it is aware of the migration of the
1535	   filesystem.  Once the client has done this for all migrated
1536	   filesystems on which the client holds state, the server MUST resume
1537	   normal handling of stateful requests from that client.

1539	   One way in which clients can do this efficiently in the presence of
1540	   large numbers of filesystems is described below.  This approach
1541	   divides the process into two phases, one devoted to finding the
1542	   migrated filesystems and the second devoted to doing the necessary
1543	   GETATTRs.

1545	   The client can find the migrated filesystems by building and issuing
1546	   one or more COMPOUND requests, each consisting of a set of PUTFH/
1547	   GETFH pairs, each pair using an fh in one of the filesystems in
1548	   question.  All such COMPOUND requests can be done in parallel.  The
1549	   successful completion of such a request indicates that none of the
1550	   fs's interrogated have been migrated while termination with
1551	   NFS4ERR_MOVED indicates that the filesystem getting the error has
1552	   migrated while those interrogated before it in the same COMPOUND have
1553	   not.  Those whose interrogation follows the error remain in an
1554	   uncertain state and can be interrogated by restarting the requests
1555	   from after the point at which NFS4ERR_MOVED was returned or by
1556	   issuing a new set of COMPOUND requests for the filesystems which
1557	   remain in an uncertain state.
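
	   The first phase can be sketched as follows.  This is only an
	   illustration under stated assumptions: send_compound is a
	   hypothetical callback that issues one COMPOUND of PUTFH/GETFH pairs
	   and returns the index of the pair that failed with NFS4ERR_MOVED
	   (or None on success), the batch size is arbitrary, and the batches
	   are sent one at a time rather than in parallel to keep the example
	   short.  make_fake_server is a stand-in used only to exercise the
	   logic.

```python
from typing import Callable, List, Optional, Sequence, Set


def find_migrated(filehandles: Sequence[str],
                  send_compound: Callable[[List[str]], Optional[int]],
                  batch_size: int = 4) -> List[str]:
    """Return the filehandles whose filesystems have been migrated."""
    migrated: List[str] = []
    pending = list(filehandles)
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        failed_at = send_compound(batch)
        if failed_at is not None:
            # The pair at failed_at got NFS4ERR_MOVED; the pairs before
            # it are known not to have migrated.
            migrated.append(batch[failed_at])
            # Pairs after the failure were never executed, so they are
            # still uncertain and must be interrogated again.
            pending = batch[failed_at + 1:] + pending
    return migrated


def make_fake_server(moved: Set[str]) -> Callable[[List[str]], Optional[int]]:
    """Simulate a server on which the filehandles in `moved` have migrated."""
    def send_compound(batch: List[str]) -> Optional[int]:
        for index, fh in enumerate(batch):
            if fh in moved:
                return index      # COMPOUND stops at the first error
        return None
    return send_compound
```

	   Each COMPOUND identifies at most one migrated filesystem, so the
	   number of requests grows with the number of migrated filesystems
	   as well as with the total number interrogated.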

1559	   Once the migrated filesystems have been found, all that is needed is
1560	   for the client to give evidence to the server that it is aware of the
1561	   migrated status of filesystems found by this process, by
1562	   interrogating the fs_locations attribute for an fh within each of the
1563	   migrated filesystems.  The client can do this by building and issuing
1564	   one or more COMPOUND requests, each of which consists of a set of
1565	   PUTFH operations, each followed by a GETATTR of the fs_locations
1566	   attribute.  A RENEW follows to help tie the operations to the lease
1567	   returning NFS4ERR_LEASE_MOVED.  Once the client has done this for all
1568	   migrated filesystems on which the client holds state, the server will
1569	   resume normal handling of stateful requests from that client.
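
	   A minimal sketch of how such a request might be assembled is shown
	   below.  The representation of operations as (name, argument)
	   tuples and the function build_ack_compound are hypothetical; a
	   real client would build XDR-encoded COMPOUND arguments instead.

```python
from typing import List, Sequence, Tuple

Op = Tuple[str, object]


def build_ack_compound(migrated_fhs: Sequence[str], clientid4: int) -> List[Op]:
    """Build the operation list for one second-phase COMPOUND.

    For each migrated filesystem: PUTFH of a filehandle within it,
    followed by a GETATTR of fs_locations.  A final RENEW ties the
    request to the lease that has been returning NFS4ERR_LEASE_MOVED.
    """
    ops: List[Op] = []
    for fh in migrated_fhs:
        ops.append(("PUTFH", fh))
        ops.append(("GETATTR", "fs_locations"))
    ops.append(("RENEW", clientid4))
    return ops
```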

1571	   In order to support legacy clients that do not handle the
1572	   NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after
1573	   a wait of at least two lease periods, at which time it will resume
1574	   normal handling of stateful requests from all clients.  If a client
1575	   attempts to access the migrated files, the server MUST reply
1576	   NFS4ERR_MOVED.

1578	   When the client receives an NFS4ERR_MOVED error, the client can
1579	   follow the normal process to obtain the new server information
1580	   (through the fs_locations attribute) and perform renewal of those
1581	   leases on the new server.  If the server has not had state
1582	   transferred to it transparently, the client will receive either
1583	   NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server,
1584	   as described above.  The client can then recover state information as
1585	   it does in the event of server failure.

1587	   Aside from recovering from a migration, there are other reasons a
1588	   client may wish to retrieve fs_locations information from a server.
1589	   When a server becomes unresponsive, for example, a client may use
1590	   cached fs_locations data to discover an alternate server hosting the
1591	   same fs data.  A client may periodically request fs_locations data
1592	   from a server in order to keep its cache of fs_locations data fresh.

1594	   Since a GETATTR(fs_locations) operation would be used for refreshing
1595	   cached fs_locations data, a server could mistake such a request as
1596	   indicating recognition of an NFS4ERR_LEASE_MOVED condition.
1597	   Therefore a compound which is not intended to signal that a client
1598	   has recognized a migrated lease SHOULD be prefixed with a guard
1599	   operation which fails with NFS4ERR_MOVED if the file handle being
1600	   queried is no longer present on the server.  The guard can be as
1601	   simple as a GETFH operation.
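
	   The effect of the guard can be illustrated with a small simulation.
	   The tuple representation of operations, build_refresh_compound, and
	   run_on_fake_server are all hypothetical; the simulated server only
	   models the behavior described above, in which GETFH fails with
	   NFS4ERR_MOVED for a migrated filesystem and the COMPOUND stops at
	   the first error.

```python
from typing import List, Optional, Set, Tuple


def build_refresh_compound(fh: str) -> List[Tuple[str, Optional[str]]]:
    """PUTFH, then a GETFH guard, then the fs_locations query."""
    return [("PUTFH", fh), ("GETFH", None), ("GETATTR", "fs_locations")]


def run_on_fake_server(compound: List[Tuple[str, Optional[str]]],
                       moved_fhs: Set[str]) -> List[Tuple[str, str]]:
    """Return (operation, status) for each operation actually executed."""
    current_fh = None
    results: List[Tuple[str, str]] = []
    for op, arg in compound:
        if op == "PUTFH":
            current_fh = arg
        elif op == "GETFH" and current_fh in moved_fhs:
            # The guard fails, so the GETATTR is never reached and the
            # server cannot mistake it for recognition of the migration.
            results.append((op, "NFS4ERR_MOVED"))
            return results
        results.append((op, "NFS4_OK"))
    return results
```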

1603	   Though unlikely, it is possible that the target of such a compound
1604	   could be migrated in the time after the guard operation is executed
1605	   on the server but before the GETATTR(fs_locations) operation is
1606	   encountered.  When a client issues a GETATTR(fs_locations) operation
1607	   as part of a compound not intended to signal recognition of a
1608	   migrated lease, it SHOULD be prepared to process fs_locations data in
1609	   the reply that shows the current location of the fs is gone.

1611	5.6.4.  Migration and the Lease_time Attribute

1613	   In order that the client may appropriately manage its leases in the
1614	   case of migration, the destination server must establish proper
1615	   values for the lease_time attribute.

1617	   When state is transferred transparently, that state should include
1618	   the correct value of the lease_time attribute.  The lease_time
1619	   attribute on the destination server must never be less than that on
1620	   the source since this would result in premature expiration of leases
1621	   granted by the source server.  Upon migration in which state is
1622	   transferred transparently, the client is under no obligation to re-
1623	   fetch the lease_time attribute and may continue to use the value
1624	   previously fetched (on the source server).

1626	   In the case in which lease merger occurs as part of state transfer,
1627	   the lease_time attribute of the destination lease remains in effect.
1628	   The client can simply renew that lease with its existing lease_time
1629	   attribute.  State in the source lease is renewed at the time of
1630	   transfer so that it cannot expire, as long as the destination lease
1631	   is appropriately renewed.

1633	   If state has not been transferred transparently (i.e., the client
1634	   needs to reclaim or re-obtain its locks), the client should fetch the
1635	   value of lease_time on the new (i.e., destination) server, and use it
1636	   for subsequent locking requests.  However the server must respect a
1637	   grace period at least as long as the lease_time on the source server,
1638	   in order to ensure that clients have ample time to reclaim their
1639	   locks before potentially conflicting non-reclaimed locks are granted.
1640	   The means by which the new server obtains the value of lease_time on
1641	   the old server is left to the server implementations.  It is not
1642	   specified by the NFS version 4.0 protocol.
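
	   The constraints in this section can be summarized in a short
	   sketch.  The function names are hypothetical and lease times are
	   taken to be in seconds; lease merger, in which the destination
	   lease's lease_time simply remains in effect, is not modeled.

```python
def client_lease_time(source_lease_time: int,
                      dest_lease_time: int,
                      transparent: bool) -> int:
    """Lease time the client may use after a migration.

    With transparent state transfer the client need not re-fetch the
    attribute, which is only safe if the destination's lease_time is
    not shorter than the source's.  Otherwise the client fetches and
    uses the destination's value.
    """
    if transparent:
        if dest_lease_time < source_lease_time:
            raise ValueError("destination lease_time is less than source")
        return source_lease_time
    return dest_lease_time


def grace_period_ok(grace_period: int, source_lease_time: int) -> bool:
    """Check that the destination's grace period is long enough."""
    return grace_period >= source_lease_time
```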

1644	6.  Results of proposed changes for NFSv4.0

1646	   The purpose of this section is to examine the troubling results
1647	   reported in Section 3.1.  We will look at the scenarios as they would
1648	   be handled within the proposal.

1650	   Because the choice of uniform vs. non-uniform nfs_client_id4 id
1651	   strings is a "SHOULD" in these cases, we will designate clients that
1652	   follow this recommendation by SHOULD-UF-CID.

1654	   We will also have to take account of any merger-related "SHOULD"
1655	   clauses to better understand how they have addressed the issues seen.
1656	   We abbreviate as follows:

1658	   o  SHOULD-SVR-AM refers to servers obeying the SHOULD which
1659	      RECOMMENDS that they merge leases with identical nfs_client_id4 id
1660	      strings and boot verifiers.

1662	6.1.  Results: Failure to free migrated state on client reboot

1664	   Let's look at the troublesome situation cited in Section 3.1.1.  We
1665	   have already seen what happens when SHOULD-UF-CID does not hold.  Now
1666	   let's look at the situation in which SHOULD-UF-CID holds, whether
1667	   SHOULD-SVR-AM is in effect or not.

1669	   o  A client C establishes a clientid4 C1 with server ABC specifying
1670	      an nfs_client_id4 with id string value "C" and boot verifier
1671	      0x111.

1673	   o  The client begins to access files in filesystem F on server ABC,
1674	      resulting in generating stateids S1, S2, etc. under the lease for
1675	      clientid C1.  It may also access files on other filesystems on the
1676	      same server.

1678	   o  The filesystem is migrated from ABC to server XYZ.  When
1679	      transparent state migration is in effect, stateids S1 and S2 and
1680	      lease {0x111, "C", C1} are now available for use by client C at
1681	      server XYZ.  So far, so good.

1683	   o  Client C reboots and attempts to access data on server XYZ,
1684	      whether in filesystem F or another.  It does a SETCLIENTID with an
1685	      nfs_client_id4 with id string value "C" and boot verifier 0x112.
1686	      The state associated with lease {0x111, "C", C1} is deleted as
1687	      part of creating {0x112, "C", C2}.  No problem.
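
	   The last two steps can be reproduced with a toy model of server
	   XYZ.  The FakeServer class is hypothetical and heavily simplified:
	   SETCLIENTID_CONFIRM, callbacks, and principals are omitted, and
	   leases are keyed by id string alone.  For contrast, the final lines
	   use a server-specific id string ("C-XYZ", chosen for illustration)
	   to show why a non-matching string leaves the migrated state in
	   place.

```python
from typing import Dict, Set, Tuple


class FakeServer:
    def __init__(self) -> None:
        # id string -> (boot verifier, clientid4, stateids)
        self.leases: Dict[str, Tuple[int, int, Set[str]]] = {}
        self._next_clientid = 1

    def install_migrated(self, id_string: str, verifier: int,
                         stateids: Set[str]) -> int:
        """Record a lease that arrived via transparent state migration."""
        clientid = self._next_clientid
        self._next_clientid += 1
        self.leases[id_string] = (verifier, clientid, set(stateids))
        return clientid

    def setclientid(self, id_string: str, verifier: int) -> int:
        """Simplified SETCLIENTID: a new verifier discards old state."""
        old = self.leases.get(id_string)
        if old is not None and old[0] == verifier:
            return old[1]
        clientid = self._next_clientid
        self._next_clientid += 1
        self.leases[id_string] = (verifier, clientid, set())
        return clientid


# Uniform id string: the reboot is recognized and S1, S2 are freed.
xyz = FakeServer()
c1 = xyz.install_migrated("C", 0x111, {"S1", "S2"})
c2 = xyz.setclientid("C", 0x112)

# Server-specific id strings: the migrated lease is never matched.
xyz_nonuniform = FakeServer()
xyz_nonuniform.install_migrated("C-ABC", 0x111, {"S1", "S2"})
xyz_nonuniform.setclientid("C-XYZ", 0x112)
```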

1689	   The correctness signature for this issue is

1691	      SHOULD-UF-CID

1693	   so if you have clients and servers that obey the SHOULD clauses, the
1694	   problem is gone regardless of the choice on the MAY.

1696	6.2.  Results: Server reboots resulting in confused lease situation

1698	   Now let's consider the scenario given in Section 3.1.2.  We have
1699	   already seen what happens when SHOULD-UF-CID does not hold.  Now
1700	   let's look at the situation in which SHOULD-UF-CID still does not
1701	   hold (the id string is server-specific) but SHOULD-SVR-AM holds.

1703	   o  Client C talks to server ABC using an nfs_client_id4 id string
1704	      such as "C-ABC" and boot verifier v1.  As a result a lease with
1705	      clientid4 c.i is established: {v1, "C-ABC", c.i}.

1707	   o  fs_a1 migrates from server ABC to server XYZ along with its state.
1708	      Now server XYZ also has a lease: {v1, "C-ABC", c.i}

1710	   o  Server ABC reboots.

1712	   o  Client C talks to server ABC using an nfs_client_id4 id string
1713	      such as "C-ABC" and boot verifier v1.  As a result a lease with
1714	      clientid4 c.j is established: {v1, "C-ABC", c.j}.

1716	   o  fs_a2 migrates from server ABC to server XYZ.  As part of
1717	      migration the incoming lease is seen to denote the same
1718	      nfs_client_id4 and so is merged with {v1, "C-ABC", c.i}.

1720	   o  Now server XYZ has only one lease that matches {v1, "C-ABC", *},
1721	      so the problem is solved.

1723	   Now let's consider the same scenario in the situation in which
1724	   SHOULD-UF-CID holds and SHOULD-SVR-AM holds as well.

1726	   o  Client C talks to server ABC using an nfs_client_id4 id string "C"
1727	      and boot verifier v1.  As a result a lease with clientid4 c.i is
1728	      established: {v1, "C", c.i}.

1730	   o  fs_a1 migrates from server ABC to server XYZ along with its state.
1731	      Now XYZ also has a lease: {v1, "C", c.i}

1733	   o  Server ABC reboots.

1735	   o  Client C talks to server ABC using an nfs_client_id4 id string "C"
1736	      and boot verifier v1.  As a result a lease with clientid4 c.j is
1737	      established: {v1, "C", c.j}.

1739	   o  fs_a2 migrates from server ABC to server XYZ.  As part of
1740	      migration the incoming lease is seen to denote the same
1741	      nfs_client_id4 and so is merged with {v1, "C", c.i}.

1743	   o  Now server XYZ has only one lease that matches {v1, "C", *}, so
1744	      the problem is solved.

1746	   The correctness signature for this issue is
1747	      SHOULD-SVR-AM

1749	   so if you have clients and servers that obey the SHOULD clauses, the
1750	   problem is gone regardless of the choice on the MAY.
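
   The uniform-string scenario above can be sketched as a small
   simulation.  This is only an illustration under stated assumptions:
   the Lease and Server classes and the accept_migration method are
   hypothetical names, and matching on the pair (boot verifier, id
   string) is one simple way to model the merging behavior that this
   document calls SHOULD-SVR-AM.

```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    """Hypothetical model of a lease: {verifier, id string, clientid4}."""
    verifier: str
    id_string: str
    clientid: str
    filesystems: set = field(default_factory=set)


class Server:
    """Hypothetical destination server that merges incoming leases."""

    def __init__(self, name):
        self.name = name
        self.leases = []

    def accept_migration(self, incoming, fs):
        # Assumption: two leases denote the same client when both the
        # boot verifier and the nfs_client_id4 id string match.
        for lease in self.leases:
            if (lease.verifier, lease.id_string) == (
                incoming.verifier,
                incoming.id_string,
            ):
                lease.filesystems.add(fs)  # merge into the existing lease
                return lease
        new = Lease(incoming.verifier, incoming.id_string,
                    incoming.clientid, {fs})
        self.leases.append(new)
        return new


xyz = Server("XYZ")
# fs_a1 migrates with the lease {v1, "C", c.i}.
xyz.accept_migration(Lease("v1", "C", "c.i"), "fs_a1")
# Server ABC reboots; the client obtains c.j there; then fs_a2 migrates.
merged = xyz.accept_migration(Lease("v1", "C", "c.j"), "fs_a2")
print(len(xyz.leases), merged.clientid, sorted(merged.filesystems))
```

   In this sketch the second migration is folded into the lease that
   already exists on XYZ, so the client sees a single clientid there.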

1752	6.3.  Results: Client complexity issues

1754	   Consider the following situation:

1756	   o  There is a set of clients C1 through Cn accessing servers S1
1757	      through Sm.  Each server manages some significant number of
1758	      filesystems with the filesystem count L being significantly
1759	      greater than m.

1761	   o  Each client Cx will access a subset of the servers and so will
1762	      have up to m clientid's, which we will call Cxy for server Sy.

1764	   o  Now assume that for load-balancing or other operational reasons,
1765	      numbers of filesystems are migrated among the servers.  As a
1766	      result, depending on how this is handled, the number of clientids may
1767	      explode.  See below.

1769	   Now look at what will happen under various scenarios:

1771	   o  We have previously (in Section 3.1.3) looked at this in the case
1772	      of a client following the non-uniform client-string approach.  In
1773	      that case, each client-server pair could have up to m clientid's
1774	      and each client will have up to m**2 clientids.  If we add the
1775	      possibility of server reboot, the only bound on a client's
1776	      clientid count is L.

1778	   o  If we look at this in the SHOULD-UF-CID case in which SHOULD-SVR-AM
1779	      does not hold, the situation is no different.  Although
1780	      the server has the client identity information that could enable
1781	      same-client-same-server leases to be combined, it does not do so.
1782	      We still have up to L clientid's per client.

1784	   o  On the other hand, if we look at the SHOULD-UF-CID case in which
1785	      SHOULD-SVR-AM holds, the problem is gone.  There can be no more
1786	      than m clientids per client, and n clientid's per server.

1788	   The correctness signature for this issue is

1790	      (SHOULD-UF-CID & SHOULD-SVR-AM)

1792	   so if you have clients and servers that obey the SHOULD clauses, the
1793	   problem is gone regardless of the choice on the MAY.
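
   The bounds discussed above can be checked with a small counting
   sketch.  It is an illustration only: count_clientids and the
   placement tuples are hypothetical, and the model assumes that,
   without merging, every distinct (current server, issuing clientid)
   pair is a separate lease the client must track.

```python
def count_clientids(placements, merge):
    """Count the clientids one client must track.

    placements: list of (fs, current_server, issuing_clientid) tuples.
    merge: True models servers that merge same-client leases
           (SHOULD-SVR-AM together with SHOULD-UF-CID).
    """
    if merge:
        # One lease per server currently holding any of the client's state.
        return len({server for _, server, _ in placements})
    # Otherwise each migrated-in clientid remains a separate lease.
    return len({(server, cid) for _, server, cid in placements})


m = 3
# No reboots: each of the m servers ends up holding file systems that
# originated on every one of the m servers.
no_reboot = [(f"fs{i}", f"S{i // m}", f"C@S{i % m}") for i in range(m * m)]

# With reboots, in the worst case every file system carries a clientid
# of its own, so the count is bounded only by the file system count L.
L = 12
with_reboot = [(f"fs{i}", f"S{i % m}", f"cid{i}") for i in range(L)]

print(count_clientids(no_reboot, merge=False))    # m**2
print(count_clientids(with_reboot, merge=False))  # L
print(count_clientids(with_reboot, merge=True))   # m
```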

1795	6.4.  Result summary

1797	   We have seen that (SHOULD-SVR-AM & SHOULD-UF-CID) are sufficient to
1798	   solve the problems people have experienced.

1800	7.  Issues for NFSv4.1

1802	   Because NFSv4.1 embraces the uniform client-string approach,
1803	   addressing migration issues is simpler.  In the terms of Section 6,
1804	   we already have SHOULD-UF-CID, for NFSv4.1, as advised by Section 2.4
1805	   of [RFC5661], simplifying the work to be done.

1807	   Nevertheless, there are some issues that will have to be addressed.
1808	   Some examples:

1810	   o  The other necessary part of addressing migration issues, which we
1811	      call above SHOULD-SVR-AM, is not currently addressed by NFSv4.1
1812	      and changes need to be made to make it clear that state needs to
1813	      be appropriately merged as part of migration, to avoid multiple
1814	      clientids between a client-server pair.

1816	   o  There needs to be some clarification of how migration, and
1817	      particularly transparent state migration, should interact with
1818	      pNFS layouts.

1820	   o  The current discussion (in [RFC5661]) of the possibility of
1821	      server_owner changes is incomplete and confusing.

1823	   Discussion of how to resolve these issues will appear in the sections
1824	   below.

1826	7.1.  Addressing state merger in NFSv4.1

1828	   The existing treatment of state transfer in [RFC5661] has similar
1829	   problems to that in [RFC3530] in that it assumes that the state for
1830	   multiple fs's on different servers will not be merged so that it
1831	   appears under a single common clientid.  We've already seen the
1832	   reasons that this is a problem, with regard to NFSv4.0.

1834	   Although we don't have the problems stemming from the non-uniform
1835	   client-string approach, there are a number of complexities in the
1836	   existing treatment of state management in the section entitled "Lock
1837	   State and File System Transitions" in [RFC5661] that make this non-
1838	   trivial to address:

1840	   o  Migration is currently treated together with other sorts of file
1841	      system transitions including transitioning between replicas
1842	      without any NFS4ERR_MOVED errors.

1844	   o  There is separate handling and discussion of the cases of matching
1845	      and non-matching server scopes.

1847	   o  In the case of matching server scopes, the text calls for an
1848	      impossible degree of transparency.

1850	   o  In the case of non-matching server scopes, the text does not
1851	      mention transparent state migration at all, resulting in a
1852	      functional regression from NFSv4.0.

1854	7.2.  Addressing pNFS relationship with migration

1856	   This is made difficult because, within the pNFS framework, migration
1857	   might mean any of several things:

1859	   o  Transfer of the MDS, leaving DS's alone.

1861	      This would be minimally disruptive to those using layouts but
1862	      would require the pNFS control protocol to support the DS being
1863	      directed to a new MDS.

1865	   o  Transfer of a DS, leaving everything else in place.

1867	      Such a transfer can be handled without using migration at all.
1868	      The server can recall/revoke layouts, as appropriate.

1870	   o  Transfer of the file system to a new file system with both MDS and
1871	      DS's moving.

1873	      In such a transfer, an entirely different set of DS's will be at
1874	      the target location.  There may even be no pNFS support on the
1875	      destination FS at all.

1877	   Migration needs to support both the first and last of these models.

1879	7.3.  Addressing server owner changes in NFSv4.1

1881	   Section 2.10.5 of [RFC5661] states the following.

1883	      The client should be prepared for the possibility that
1884	      eir_server_owner values may be different on subsequent EXCHANGE_ID
1885	      requests made to the same network address, as a result of various
1886	      sorts of reconfiguration events.  When this happens and the
1887	      changes result in the invalidation of previously valid forms of
1888	      trunking, the client should cease to use those forms, either by
1889	      dropping connections or by adding sessions.  For a discussion of
1890	      lock reclaim as it relates to such reconfiguration events, see
1891	      Section 8.4.2.1.

1893	   While this paragraph is literally true in that such reconfiguration
1894	   events can happen and clients have to deal with them, it is confusing
1895	   in that it can be read as suggesting that clients have to deal with
1896	   them without disruption, which in general is impossible.

1898	   A clearer alternative would be:

1900	      It is always possible that, as a result of various sorts of
1901	      reconfiguration events, eir_server_scope and eir_server_owner
1902	      values may be different on subsequent EXCHANGE_ID requests made to
1903	      the same network address.

1905	      In most cases such reconfiguration events will be disruptive and
1906	      indicate that an IP address formerly connected to one server is
1907	      now connected to an entirely different one.

1909	      Some guidelines on client handling of such situations follow:

1911	      *  When eir_server_scope changes, the client has no assurance that
1912	         any id's it obtained previously (e.g. file handles) can be
1913	         validly used on the new server, and, even if the new server
1914	         accepts them, there is no assurance that this is not due to
1915	         accident.  Thus it is best to treat all such state as lost/
1916	         stale although a client may assume that the probability of
1917	         inadvertent acceptance is low and treat this situation as
1918	         within the next case.

1920	      *  When eir_server_scope remains the same and
1921	         eir_server_owner.so_major_id changes, the client can use
1922	         filehandles it has and attempt reclaims.  It may find that
1923	         these are now stale but if NFS4ERR_STALE is not received, it
1924	         can proceed to reclaim its opens.

1926	      *  When eir_server_scope and eir_server_owner.so_major_id remain
1927	         the same, the client has to use the now-current values of
1928	         eir_server_owner.so_minor_id in deciding on appropriate forms
1929	         of trunking.
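
   The three guidelines above amount to a simple decision procedure,
   sketched here.  The ExchangeIdResult type, the classify_change
   function, and the returned action strings are hypothetical names
   chosen for illustration; only the three fields being compared come
   from the text.

```python
from typing import NamedTuple


class ExchangeIdResult(NamedTuple):
    """The EXCHANGE_ID result fields relevant to the guidelines."""
    server_scope: bytes   # eir_server_scope
    so_major_id: bytes    # eir_server_owner.so_major_id
    so_minor_id: int      # eir_server_owner.so_minor_id


def classify_change(old, new):
    """Return a (hypothetical) label for the action the client takes."""
    if new.server_scope != old.server_scope:
        # No assurance that previously obtained ids are valid.
        return "treat-state-as-lost"
    if new.so_major_id != old.so_major_id:
        # Filehandles may be usable; reclaim unless NFS4ERR_STALE is seen.
        return "reuse-filehandles-and-attempt-reclaim"
    # Scope and major id unchanged: trunking decisions are based on
    # the now-current so_minor_id.
    return "re-evaluate-trunking"


before = ExchangeIdResult(b"scope-1", b"major-1", 1)
print(classify_change(before, ExchangeIdResult(b"scope-2", b"major-1", 1)))
print(classify_change(before, ExchangeIdResult(b"scope-1", b"major-2", 1)))
print(classify_change(before, ExchangeIdResult(b"scope-1", b"major-1", 2)))
```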

1931	8.  Lock State and File System Transitions (AS PROPOSED)

1933	   In dealing with file system transitions, the client needs to handle
1934	   cases in which the two servers have cooperated in state management
1935	   and cases in which they have not.

1937	   The primary means by which a client finds out about state management
1938	   co-operation is by comparing eir_server_scope values returned by each
1939	   server.  If the scope values do not match, then any co-operation of
1940	   the servers in state management is limited to transferring state in
1941	   the event of migration and making arrangements for the safe reclamation
1942	   of locking state.  If the scope values match, then this indicates the
1943	   servers have cooperated in assigning client IDs and stateids to the
1944	   point that the same id will not refer to different things on
1945	   different servers.  Servers may reject client IDs that refer to state
1946	   they do not know about.  See the section entitled "Server Scope" for
1947	   more information about the use of server scope.

1949	   How the client needs to deal with locking state with regard to these
1950	   situations will depend upon:

1952	   o  The type of file system transition occurring.

1954	   o  The type of state involved (e.g. layout state may sometimes be
1955	      handled differently).

1957	   o  The specific level of state handling co-ordination between the two
1958	      servers for the specific transition.

1960	   We will divide the basic description of these possibilities into
1961	   three sections:

1963	   o  In Section 8.1, we will discuss handling specific to the case of
1964	      matching server scopes.

1966	   o  In Section 8.2, we will discuss handling specific to the case of
1967	      non-matching server scopes.

1969	   o  In Section 8.3, we will discuss issues relating to handling common
1970	      to both cases.

1972	8.1.  File System Transitions with Matching Server Scopes

1974	   In the case of migration, the servers involved in the migration of a
1975	   file system SHOULD transfer all server state relevant to the
1976	   migrating file system from the original to the new server.  When this
1977	   is done, it needs to be done in a way that is maximally transparent
1978	   to the client in that all stateids used by the client to access state
1979	   on the filesystem in question can be used on the new server, albeit
1980	   possibly under different client IDs.

1982	   When layouts are active for a migrated file system, layout state
1983	   SHOULD be included as part of the state transferred.  Even if it is
1984	   the case that there are circumstances preventing the layout from
1985	   being supported on the new server, this should be dealt with by
1986	   recalling layouts either before or after the transition.  Where this
1987	   cannot be done, layout revocation is possible but any such revocation
1988	   should appear to the client just as any other layout revocation
1989	   would.

1991	   With replication, such a degree of common state is typically not the
1992	   case.  Clients, however, should use the information provided by the
1993	   eir_server_scope returned by EXCHANGE_ID (as modified by the
1994	   validation procedures described in the section entitled "Server
1995	   Scope") to determine whether such sharing may be in effect in non-
1996	   migration cases, rather than making assumptions based solely on the
1997	   reason for the transition.

1999	   This state transfer will reduce disruption to the client when a file
2000	   system transition occurs.  If the servers are successful in
2001	   transferring all state, the client can access existing stateids,
2002	   using either existing or new sessions between the client and the new
2003	   server instance.  If the server accepts such a transferred stateid as
2004	   valid, then the client may use that stateid to access the same state
2005	   that it represented on the old server.

2007	   When the two servers belong to the same server scope, it does not
2008	   mean that when dealing with the transition, the client will not have
2009	   to reclaim or otherwise reobtain state.  However, it does mean that
2010	   the client may proceed using its current stateids when communicating
2011	   with the new server, and the new server will either recognize the
2012	   stateids as valid or reject them, in which case locking state must be
2013	   reobtained by the client.

2015	   File systems cooperating in state management may actually share state
2016	   or simply divide the identifier space so as to recognize (and reject
2017	   as stale) each other's stateids and client IDs.  Servers that do
2018	   share state may not do so under all conditions or at all times.  If
2019	   the server cannot be sure when accepting a stateid that it reflects
2020	   the locks the client was given, the server must treat the state as
2021	   stale and report it as such to the client.

2023	8.2.  File System Transitions with Non-Matching Server Scopes

2025	   When the two file system instances are on servers that do not share a
2026	   server scope value, the client must establish a new client ID on the
2027	   destination, if it does not have one already, to obtain access to its
2028	   locks.  Depending on the type of file system transition and
2029	   facilities provided by the server, it may re-establish its connection
2030	   to locking and layout state in a number of ways.

2032	   In the case of migration, the servers may have transferred stateids,
2033	   making it possible for the client to access its state on the new
2034	   server, simply by using the existing stateid.  The server may
2035	   transfer all state or a subset and the client can use TEST_STATEID to
2036	   determine what state has been transferred and what needs to be
2037	   reclaimed or otherwise reobtained as described in Section 8.3.
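
   A client-side sketch of this use of TEST_STATEID follows.  It is an
   illustration under assumptions: partition_stateids and the
   test_stateid callback are hypothetical stand-ins for a real client's
   RPC layer, and status codes are represented as strings, with
   anything other than NFS4_OK taken to mean that the state was not
   transferred.

```python
def partition_stateids(stateids, test_stateid):
    """Split stateids into those transferred and those to reobtain.

    test_stateid: hypothetical callable that sends TEST_STATEID to the
    destination server and returns one status string per stateid, in
    the same order as the argument list.
    """
    results = test_stateid(stateids)
    transferred = [s for s, r in zip(stateids, results) if r == "NFS4_OK"]
    to_reobtain = [s for s, r in zip(stateids, results) if r != "NFS4_OK"]
    return transferred, to_reobtain


def fake_destination(stateids):
    """Stand-in server that received only a subset of the state."""
    known = {"open-1", "lock-7"}
    return ["NFS4_OK" if s in known else "NFS4ERR_BAD_STATEID"
            for s in stateids]


kept, lost = partition_stateids(["open-1", "open-2", "lock-7"],
                                fake_destination)
print(kept, lost)
```

   State in the second list is what the client would then reclaim or
   otherwise reobtain as described in Section 8.3.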

2039	   Lock reclaim may be used by the client for any sort of file system
2040	   transition, but the server is not required to support it in any
2041	   particular case.

2043	   Note that in this case, lock reclaim may be attempted even when the
2044	   servers involved in the transfer have different server scope values
2045	   (see Section 8.4.2.1 for the contrary case of reclaim after server
2046	   reboot).  Servers with different server scope values may cooperate to
2047	   allow reclaim for locks associated with the transfer of a file system
2048	   even if they do not cooperate sufficiently to share a server scope.

2050	8.3.  FS Transitions Involving Reobtaining Locking State

2052	   In either case, when actual locks are not known to be maintained, the
2053	   destination server may establish a grace period specific to the given
2054	   file system, with non-reclaim locks being rejected for that file
2055	   system, even though normal locks are being granted for other file
2056	   systems.  Clients should not infer the absence of a grace period for
2057	   file systems being transitioned to a server from responses to
2058	   requests for other file systems.

2060	   In the case of lock reclamation for a given file system after a file
2061	   system transition, edge conditions can arise similar to those for
2062	   reclaim after server restart (although in the case of the planned
2063	   state transfer associated with migration, these can be avoided by
2064	   securely recording lock state as part of state migration).  Unless
2065	   the destination server can guarantee that locks will not be
2066	   incorrectly granted, the destination server should not allow lock
2067	   reclaims and should avoid establishing a grace period.

2069	   Once all locks have been reclaimed, or there were no locks to
2070	   reclaim, the client indicates that there are no more reclaims to be
2071	   done for the file system in question by sending a RECLAIM_COMPLETE
2072	   operation with the rca_one_fs parameter set to true.  Once this has
2073	   been done, non-reclaim locking operations may be done, and any
2074	   subsequent request to do a reclaim will be rejected with the error
2075	   NFS4ERR_NO_GRACE.
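
   The per-file-system reclaim sequence can be modeled as below.  This
   is a deliberately simplified sketch: the FsReclaimState class is
   hypothetical, it tracks one client and one file system, and it
   ignores grace period expiry and conflicts with other clients.  The
   NFS4ERR_GRACE status used for non-reclaim requests during the grace
   period comes from the NFSv4 protocol itself rather than from the
   paragraph above.

```python
class FsReclaimState:
    """Hypothetical server-side view of one client on one migrated fs."""

    def __init__(self):
        # Set once RECLAIM_COMPLETE with rca_one_fs = true is received.
        self.reclaim_complete = False

    def lock_request(self, reclaim):
        if reclaim:
            # Reclaims are only acceptable before RECLAIM_COMPLETE.
            if self.reclaim_complete:
                return "NFS4ERR_NO_GRACE"
            return "NFS4_OK"
        # Non-reclaim locks are rejected while reclaim is in progress.
        if not self.reclaim_complete:
            return "NFS4ERR_GRACE"
        return "NFS4_OK"

    def reclaim_complete_one_fs(self):
        self.reclaim_complete = True
        return "NFS4_OK"


fs = FsReclaimState()
print(fs.lock_request(reclaim=True))    # reclaim during grace
print(fs.lock_request(reclaim=False))   # normal lock during grace
fs.reclaim_complete_one_fs()
print(fs.lock_request(reclaim=False))   # normal lock afterwards
print(fs.lock_request(reclaim=True))    # late reclaim
```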

2077	   Information about client identity may be propagated between servers
2078	   in the form of a client_owner4 and associated verifiers, under the
2079	   assumption that the client presents the same values to all the
2080	   servers with which it deals.

2082	   Servers are encouraged to provide facilities to allow locks to be
2083	   reclaimed on the new server after a file system transition.  Often,
2084	   however, in cases in which the two servers do not share a server
2085	   scope value, such facilities may not be available and the client
2086	   should be prepared to re-obtain locks, even though it is possible
2087	   that the client may have its LOCK or OPEN request denied due to a
2088	   conflicting lock.

2090	   Layouts may be reobtained when necessary even without special
2091	   facilities for lock reclamation.  However, the client MUST NOT depend
2092	   on being able to obtain such a layout since pNFS or the desired
2093	   mapping type might not be supported on the new server.

2095	   The consequences of having no facilities available to reclaim locks
2096	   on the new server will depend on the type of environment.  In some
2097	   environments, such as the transition between read-only file systems,
2098	   such denial of locks should not pose large difficulties in practice.
2099	   When an attempt to re-establish a lock on a new server is denied, the
2100	   client should treat the situation as if its original lock had been
2101	   revoked.  Note that when the lock is granted, the client cannot
2102	   assume that no conflicting lock could have been granted in the
2103	   interim.  Where change attribute continuity is present, the client
2104	   may check the change attribute to check for unwanted file
2105	   modifications.  Where even this is not available, and the file system
2106	   is not read-only, a client may reasonably treat all pending locks as
2107	   having been revoked.
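
   One possible reading of the client logic in the paragraph above is
   sketched here.  The function name, its parameters, and the outcome
   strings are all hypothetical; in particular, what a client does when
   it detects an intervening modification is left open by the text, so
   the sketch merely reports it.

```python
def after_reobtain(granted, read_only, change_continuity,
                   old_change=None, new_change=None):
    """Decide how to treat a lock after trying to re-establish it."""
    if not granted:
        # A denied attempt is treated like revocation of the original.
        return "treat-as-revoked"
    if read_only:
        # No modifications are possible, so the gap poses little risk.
        return "proceed"
    if change_continuity:
        # Compare change attribute values from before and after.
        if old_change == new_change:
            return "proceed"
        return "modified-in-interim"
    # No way to detect intervening changes on a writable file system.
    return "treat-as-revoked"


print(after_reobtain(False, read_only=False, change_continuity=True))
print(after_reobtain(True, read_only=False, change_continuity=True,
                     old_change=41, new_change=41))
print(after_reobtain(True, read_only=False, change_continuity=False))
```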

2109	9.  Security Considerations

2111	   The current definitive definition of the NFSv4.0 protocol [RFC3530],
2112	   and the current pending draft of RFC3530bis [cur-v4.0-bis] both
2113	   agree.  The section entitled "Security Considerations" encourages
2114	   clients to protect the integrity of the SECINFO operation, any
2115	   GETATTR operation for the fs_locations attribute, and the operations
2116	   SETCLIENTID/SETCLIENTID_CONFIRM.  A migration recovery event can use
2117	   any or all of these operations.  We do not recommend any change here.

2119	10.  IANA Considerations

2121	   This document does not require actions by IANA.

2123	11.  Acknowledgements

2125	   The editor and authors of this document gratefully acknowledge the
2126	   contributions of Trond Myklebust of NetApp and Robert Thurlow of
2127	   Oracle.  We also thank Tom Haynes of NetApp and Spencer Shepler of
2128	   Microsoft for their guidance and suggestions.

2130	   Special thanks go to members of the Oracle Solaris NFS team,
2131	   especially Rick Mesta and James Wahlig, for their work implementing
2132	   an NFSv4.0 migration prototype and identifying many of the issues
2133	   documented here.

2135	12.  References

2137	12.1.  Normative References

2139	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2140	              Requirement Levels", BCP 14, RFC 2119, March 1997.

2142	   [RFC3530]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
2143	              Beame, C., Eisler, M., and D. Noveck, "Network File System
2144	              (NFS) version 4 Protocol", RFC 3530, April 2003.

2146	   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
2147	              System (NFS) Version 4 Minor Version 1 Protocol",
2148	              RFC 5661, January 2010.

2150	12.2.  Informative References

2152	   [cur-v4.0-bis]
2153	              Haynes, T., Ed. and D. Noveck, Ed., "Network File System
2154	              (NFS) Version 4 Protocol", 2011, <http://www.ietf.org/id/
2155	              draft-ietf-nfsv4-rfc3530bis-18.txt>.

2157	              Work in progress.

2159	Authors' Addresses

2161	   David Noveck (editor)
2162	   EMC Corporation
2163	   228 South Street
2164	   Hopkinton, MA  01748
2165	   US

2167	   Phone: +1 508 249 5748
2168	   Email: david.noveck@emc.com

2169	   Piyush Shivam
2170	   Oracle Corporation
2171	   5300 Riata Park Ct.
2172	   Austin, TX  78727
2173	   US

2175	   Phone: +1 512 401 1019
2176	   Email: piyush.shivam@oracle.com

2178	   Charles Lever
2179	   Oracle Corporation
2180	   1015 Granger Avenue
2181	   Ann Arbor, MI  48104
2182	   US

2184	   Phone: +1 248 614 5091
2185	   Email: chuck.lever@oracle.com

2187	   Bill Baker
2188	   Oracle Corporation
2189	   5300 Riata Park Ct.
2190	   Austin, TX  78727
2191	   US

2193	   Phone: +1 512 401 1081
2194	   Email: bill.baker@oracle.com