<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
     There has to be one entity for each item to be referenced. 
     An alternate method (rfc include) is described in the references. -->

<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
<!ENTITY RFC3530 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3530.xml">
<!ENTITY RFC3552 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3552.xml">
<!ENTITY RFC5661 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5661.xml">
<!ENTITY I-D.narten-iana-considerations-rfc2434bis SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.narten-iana-considerations-rfc2434bis.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
     (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->



<rfc category="info"



     docName="draft-dnoveck-nfsv4-migration-issues-02"

     ipr="trust200902">
  <front>
    <title abbrev="nfsv4-migr-issues">NFSv4.0 migration: Implementation experience and spec issues to resolve</title>

    <author initials='D.' surname='Noveck'
            fullname = 'David Noveck' role='editor'>
      <organization abbrev='EMC'>EMC Corporation</organization>
      <address>
        <postal>
          <street>228 South Street</street>
          <city>Hopkinton</city> 
          <region>MA</region>
          <code>01748</code>
          <country>US</country>
        </postal>

        <phone>+1 508 249 5748</phone>
        <email>david.noveck@emc.com</email>
      </address>
    </author> 
    <author initials='P.' surname='Shivam'
            fullname = 'Piyush Shivam'>
      <organization abbrev='ORACLE'>Oracle Corporation</organization>
      <address>
        <postal>
          <street>5300 Riata Park Ct.</street>
          <city>Austin</city>
          <region>TX</region>
          <code>78727</code>
          <country>US</country>
        </postal>

        <phone>+1 512 401 1019</phone>
        <email>piyush.shivam@oracle.com</email>
      </address>
    </author>

    <author initials='C.' surname='Lever'
            fullname = 'Charles Lever'>
      <organization abbrev='ORACLE'>Oracle Corporation</organization>
      <address>
        <postal>
          <street>1015 Granger Avenue</street>
          <city>Ann Arbor</city>
          <region>MI</region>
          <code>48104</code>
          <country>US</country>
        </postal>

        <phone>+1 248 614 5091</phone>
        <email>chuck.lever@oracle.com</email>
      </address>
    </author>

    <author initials='B.' surname='Baker'
            fullname = 'Bill Baker'>
      <organization abbrev='ORACLE'>Oracle Corporation</organization>
      <address>
        <postal>
          <street>5300 Riata Park Ct.</street>
          <city>Austin</city>
          <region>TX</region>
          <code>78727</code>
          <country>US</country>
        </postal>

        <phone>+1 512 401 1081</phone>
        <email>bill.baker@oracle.com</email>
      </address>
    </author>




    <date year="2012"/>

   <area>Transport</area>
   <workgroup>NFSv4</workgroup>

    <abstract>
      <t>
        The migration feature of NFSv4 provides for moving responsibility for 
        a single filesystem from one server to another, without disruption 
        to clients.  Recent implementation experience has shown problems 
        in the existing specification for this feature.   This document 
        discusses the issues which have arisen and explores the options
        available for curing the issues via clarification and correction 
        of the NFSv4.0 specification.
      </t>
    </abstract>


  </front>

  <middle>
        
    <section title="Introduction">
      <t>
        This document is in the informational category, and while the 
        facts it reports may have normative implications, any such normative 
        significance reflects the readers' preferences. For example, we 
        may report that the reboot of a client with migrated state results
        in state not being promptly cleared, and that this will prevent
        granting of conflicting lock requests for at least the lease time.
        That is a fact.
        While it is to be expected that client and server implementers will 
        judge this to be a situation that is best avoided, deciding
        how pressing the issue is remains a judgment
        for the reader, and eventually the nfsv4 working group, to make.
      </t>
       <t>
        We do explore possible ways in which such issues can be avoided, 
        with minimal negative effects, in the expectation that the working 
        group will choose to address these issues, but the choice of
        exactly how to address them is best given effect in a working group 
        document.
      </t>
    </section>
        
    <section title="Conventions">
      <t>
        The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", 
        "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted 
        as described in <xref target="RFC2119" />.
      </t>
      <t>
        In the context of this informational document, these normative keywords 
        will always occur in the context of a quotation, most often direct 
        but sometimes indirect. The context will
        make it clear whether the quotation is from:
      <list style='symbols'>
        <t>
          The current definitive definition of  the NFSv4.0 protocol, whether 
          that is the original NFSv4.0 specification <xref target="RFC3530" />,
          the current pending draft of RFC3530bis expected to become the
          definitive definition of NFSv4.0 once certain procedural steps are
          taken <xref target="cur-v4.0-bis" />, or an eventual RFC3530bis RFC,
          taking over the role of definitive definition of NFSv4.0 from RFC3530. 
        <vspace blankLines='1' />
          As the identity of that document may change during the lifetime 
          of this document, we will often refer to the current or pending
          definition of NFSv4.0 and quote from portions of the documents
          that are identical among all existing drafts.  Given that RFC3530 
          and all RFC3530bis drafts agree as to the issues 
          under discussion, 
          this should not cause undue difficulty.  Note that to simplify 
          document maintenance, section names rather than section numbers 
          are used when referring to sections in existing documents
          so that only minimal changes 
          will be necessary as the identity of the document defining NFSv4.0 
          changes.
        </t>
        <t>
          A proposed or possible text to serve as a replacement for the 
          current definitive document text.  Sometimes, 
          a number of possible alternative
          texts may be listed and benefits and detriments of 
          each examined in turn.  
        </t>
      </list>
      </t>
    </section>
    <section title="Implementation Experience">
      <section anchor="issues-all"
               title="Implementation issues">
      <t>
        Note that the examples below reflect current experience which arises
        from clients implementing the recommendation to use different
        nfs_client_id4 id strings for different server addresses, i.e.
        using what is later referred to herein as the "non-uniform
        client-string model".
      </t>
      <t>
        This is simply because that is the experience implementers have had.  
        The reader should not assume that this practice is always the 
        source of the difficulty.  It may be so in some cases, but clearly 
        it is not in all of them.
      </t>
        <section  anchor="issue-fail-free"
                  title="Failure to free migrated state on client reboot">
          <t>
            The following sort of situation has proved troublesome:
          <list style='symbols'>
            <t>
              A client C establishes a clientid4 C1 with server ABC specifying
              an nfs_client_id4 with "id" value "C-ABC" and verifier 0x111. 
            </t>
            <t>
              The client begins to access files in filesystem F on server ABC,
              resulting in generating stateids S1, S2, etc. under the lease for 
              clientid C1.  It may also access files on other filesystems on the same
              server.
            </t>
            <t>
              The filesystem is migrated from ABC to server XYZ.  When transparent
              state migration is in effect, stateids S1 and S2 and clientid4 C1 are now 
              available for use by client C at server XYZ.  So far, so good.
            </t>
            <t>
              Client C reboots and attempts to access data on server XYZ, whether in 
              filesystem F or another.  It does a SETCLIENTID with an nfs_client_id4 with 
              "id" value "C-XYZ" and verifier 0x112.  There is thus no occasion to free
              stateids S1 and S2, since they are associated with a different client name, and
              lease expiration is the only way to get rid of them. 
            </t>
          </list>
          </t>
          <t>
            Note here that while it seems clear to us in this example that C-XYZ and C-ABC
            are from the same client, the server has no way to determine the structure of the "opaque" id.
            In the protocol, it really is opaque.  Only the client knows which nfs_client_id4 values
            designate the same client on a different server.
         </t>
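          <t>
            The scenario above can be illustrated with a small Python model.
            This is only a sketch under stated assumptions: the ToyServer
            class, its method names, and the dictionary-based lease table
            are hypothetical, and they greatly simplify real SETCLIENTID
            processing (there is no confirmation step and there are no
            lease timers).  It shows that a SETCLIENTID carrying a new
            verifier frees state only for a matching id string, so state
            migrated under "C-ABC" is untouched when the rebooted client
            presents "C-XYZ".
          </t>
          <figure>
            <artwork><![CDATA[
```python
class ToyServer:
    """Hypothetical, highly simplified model of server lease state."""

    def __init__(self):
        # Maps the opaque id string to (verifier, set of stateids).
        self.leases = {}

    def setclientid(self, id_string, verifier):
        """A new verifier for a known id string frees its state."""
        known = self.leases.get(id_string)
        if known is not None and known[0] != verifier:
            # Same id, new verifier: treated as a client reboot.
            known = None
        if known is None:
            self.leases[id_string] = (verifier, set())

    def add_state(self, id_string, stateid):
        self.leases[id_string][1].add(stateid)

    def import_lease(self, id_string, verifier, stateids):
        """Stand-in for transparent state migration."""
        self.leases[id_string] = (verifier, set(stateids))


abc, xyz = ToyServer(), ToyServer()
abc.setclientid("C-ABC", 0x111)
abc.add_state("C-ABC", "S1")
abc.add_state("C-ABC", "S2")

# Filesystem F migrates from ABC to XYZ along with its state.
verifier, stateids = abc.leases["C-ABC"]
xyz.import_lease("C-ABC", verifier, stateids)

# Client C reboots and contacts XYZ under the non-uniform model.
xyz.setclientid("C-XYZ", 0x112)

# The migrated state is still held under the old id string.
stale = xyz.leases["C-ABC"][1]
```
]]></artwork>
          </figure>
          <t>
            In this model, nothing short of lease expiration (which is not
            modeled) would remove the entry for "C-ABC" on server XYZ.
          </t>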
        </section>
        <section anchor="issue-svr-reboot-confusion"
                 title="Server reboots resulting in a confused lease situation">
          <t>
            Further problems arise from scenarios like the following.  
          <list style='symbols'>

            <t>
             Client C talks to server ABC using an nfs_client_id4 id like 
             "C-ABC" and verifier v1.  As a result a lease with clientid4 
             c.i is established: {v1, "C-ABC", c.i}.
             
            </t>
            <t>
              fs_a1 migrates from server ABC to server XYZ along with its 
              state.  Now server XYZ also has a lease: {v1, "C-ABC", c.i}. 
            </t>
            <t>
              Server ABC reboots.
            </t>
            <t>
              Client C talks to server ABC using an nfs_client_id4 id like 
              "C-ABC" and verifier v1.  As a result a lease with clientid4 
              c.j is established: {v1, "C-ABC", c.j}.

            </t>
            <t>
              fs_a2 migrates from server ABC to server XYZ. 
             Now server XYZ also has a lease: {v1, "C-ABC", c.j}. 
            </t>
            <t>

              Now server XYZ has two leases that match {v1, "C-ABC", *}, when
              the protocol clearly assumes there can be only one.
            </t>
          </list>
          </t>
          <t>
            Note that if the client used "C" (rather than "C-ABC") as the
            nfs_client_id4 id string, the exact same situation would arise. 
          </t>
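          <t>
            The sequence above can be traced with the following Python
            sketch.  The tuple layout (verifier, id string, clientid4)
            mirrors the notation used in the list; the functions establish
            and migrate are hypothetical stand-ins, and the assumption that
            a rebooted server starts with an empty lease table is a
            simplification made for illustration.
          </t>
          <figure>
            <artwork><![CDATA[
```python
from collections import defaultdict

# Each lease is written as (verifier, id string, clientid4).
abc_leases = []
xyz_leases = []


def establish(server_leases, verifier, id_string, clientid):
    """Hypothetical stand-in for establishing a confirmed lease."""
    lease = (verifier, id_string, clientid)
    server_leases.append(lease)
    return lease


def migrate(lease, destination_leases):
    """Hypothetical stand-in for state transfer: copied as-is."""
    destination_leases.append(lease)


first = establish(abc_leases, "v1", "C-ABC", "c.i")
migrate(first, xyz_leases)        # fs_a1 moves to XYZ

abc_leases.clear()                # ABC reboots (assumed to lose leases)
second = establish(abc_leases, "v1", "C-ABC", "c.j")
migrate(second, xyz_leases)       # fs_a2 moves to XYZ

# Group XYZ's leases by (verifier, id string), ignoring clientid4.
by_client = defaultdict(list)
for verifier, id_string, clientid in xyz_leases:
    by_client[(verifier, id_string)].append(clientid)
```
]]></artwork>
          </figure>
          <t>
            After these steps, the single key ("v1", "C-ABC") maps to both
            c.i and c.j on server XYZ.
          </t>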
          <t>
            One of the first cases in which this sort of situation has
            resulted in difficulties is in connection with doing a
            SETCLIENTID for callback update.
          </t>
          <t>	
            The SETCLIENTID for callback update only includes the 
            nfs_client_id4, on the assumption that there can be only one
            client record with a given nfs_client_id4 value.  If there are
            multiple confirmed client records with identical nfs_client_id4
            values, there is no way to map the callback update request to
            the correct client record. 
          </t>
          <t>	
            One possible accommodation for this particular issue that 
            has been used 
            is to add a RENEW operation along with SETCLIENTID (on a
	    callback update) to disambiguate the client.
          </t>
          <t>	
            When the client updates the callback info to the destination,
	    the client would, by convention, send a compound like this:
          </t>
          <t>	
  	   { RENEW clientid4, SETCLIENTID nfs_client_id4,verf,cb }
          </t>
          <t>	
   	   The presence of the clientid4 in the compound would allow the 
           server to
	   differentiate among the various leases that it knows of, all 
           with the same nfs_client_id4 value.
          </t>
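          <t>
            A minimal Python sketch of this accommodation follows.  The
            record table, the field names, and the functions find_by_id and
            callback_update are hypothetical; the sketch also omits the
            confirmation step that a real callback update requires.  It is
            meant only to show why a lookup by nfs_client_id4 alone is
            ambiguous while the clientid4 supplied by RENEW selects a
            single record.
          </t>
          <figure>
            <artwork><![CDATA[
```python
# Confirmed client records on the destination, keyed by clientid4.
# Both carry the same nfs_client_id4 value after two migrations.
records = {
    "c.i": {"id": "C-ABC", "verifier": "v1", "callback": "cb-old"},
    "c.j": {"id": "C-ABC", "verifier": "v1", "callback": "cb-old"},
}


def find_by_id(id_string):
    """Lookup available to a plain SETCLIENTID: may match several."""
    return [cid for cid, rec in records.items()
            if rec["id"] == id_string]


def callback_update(renew_clientid, id_string, verifier, callback):
    """Model of { RENEW clientid4, SETCLIENTID id,verf,cb }."""
    rec = records.get(renew_clientid)
    if rec is None:
        return False
    if (rec["id"], rec["verifier"]) != (id_string, verifier):
        return False
    rec["callback"] = callback
    return True


ambiguous = find_by_id("C-ABC")
updated = callback_update("c.j", "C-ABC", "v1", "cb-new")
```
]]></artwork>
          </figure>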
          <t>	
            While this would be a reasonable patch for an isolated protocol
            weakness, interoperable clients and servers would require that
            the protocol truly be updated to allow such a situation, 
            specifically that of multiple clientid4's 
            with the same nfs_client_id4 value.
            The protocol is currently designed and implemented assuming this 
            can't happen.  We need to either prevent the situation from
            happening, or fully adapt to the possibilities which can arise.
            See <xref target="poss-all" /> for a discussion of such issues. 
          </t>

        </section>
        <section anchor="issue-clnt-comp"
                 title="Client complexity issues">
         <t>
           Consider the following situation:
         <list style='symbols'>            
           <t>
             There is a set of clients C1 through Cn accessing servers 
             S1 through Sm.  Each server manages some significant number
             of filesystems with the filesystem count L being significantly
             greater than m.
           </t>
           <t>
             Each client Cx will access a subset of the servers
             and so will have up to m clientid's, which we will call Cxy for
             server Sy.     
           </t>
           <t>
             Now assume that for load-balancing or other operational reasons,
             numbers of filesystems are migrated among the servers.  As a 
             result, each client-server pair will have up to m clientid's
             and each client will have up to m**2 clientids.  If we add the
             possibility of server reboot, the only bound on a client's clientid
             count is L.
           </t>
         </list>
         </t>
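         <t>
           The counting argument above can be made concrete with a short
           Python sketch.  It assumes no server reboots and labels each
           clientid by a hypothetical pair (server that issued it, server
           that now holds it); the function name and the three-server
           example are illustrative only.
         </t>
         <figure>
           <artwork><![CDATA[
```python
from itertools import product


def possible_clientids(servers):
    """Label each clientid by (issuing server, current server)."""
    return list(product(servers, repeat=2))


servers = ["S1", "S2", "S3"]          # m = 3
pairs = possible_clientids(servers)

# Clientids that one client may have to track at server S2.
held_at_s2 = [p for p in pairs if p[1] == "S2"]
```
]]></artwork>
         </figure>
         <t>
           With m servers this yields up to m clientids per client-server
           pair and up to m**2 per client, matching the bounds stated
           above for the case without server reboots.
         </t>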
         <t>
           Now, instead of a clientid4 identifying a client-server pair, we
           have many more entities for the client to deal with.  In addition,
           it isn't clear how new state is to be incorporated in this
           structure.
         </t>
         <t>
           The limitations of the migrated state (inability to be freed on
           reboot) would argue against adding more such state but trying
           to avoid that would run into its own difficulties.  For example,
           a single lockowner string presented under two different clientids
           would appear as two different entities.
         </t>
         <t>
           Thus we have to choose between: 
         <list style='symbols'>
           <t>
             Indefinite prolongation of foreign clientid's even after all
             transferred state is gone.
           </t>       
           <t>
             Having multiple requests for the same lockowner-string-named
             entity carried on in parallel by separate identically named
             lockowners under different clientid4's.
           </t>       
           <t>
             Adding serialization at the lockowner string level, in addition
             to that at the lockowner level.       
           </t>       
         </list>            
         </t>
         <t>
           In any case, we have gone (in adding migration as it 
           was described) from a situation in which
         <list style='symbols'>
           <t>
             Each client has a single clientid4/lease for each server it
             talks to.   
           </t>       
           <t>
             Each client has a single nfs_client_id4 for each server it
             talks to.   
           </t>       
           <t>  
             Every stateid can be mapped to an associated lease based
             on the server it was obtained from.     
           </t>       
         </list>            
         </t>
         <t>
           To one in which
         <list style='symbols'>
           <t>       
             Each client may have multiple clientid4's for a single server.
           </t>       
           <t>
             For each stateid, the client must separately record the clientid4
             that it is assigned to, or it must manage separate "state blobs" 
             for each fsid and map those to clientid4's.        
           </t>       
           <t>
             Before doing an operation that can result in a stateid, the
             client must either find a "state blob" based on fsid or create
             a new one, possibly with a new clientid4.
           </t>       
           <t>   
             There may be multiple clientid4's all connected to the same
             server and using the same nfs_client_id4.    
           </t>       
         </list>            
         </t>
         <t>
           This sort of additional client complexity is troublesome and needs 
           to be eliminated.
         </t>
        </section>
      </section>
    <section title="Sources of Protocol difficulties">
      <section title="Issues with nfs_client_id4 generation and use">
        <t>
          The current definitive definition of the NFSv4.0 protocol 
          <xref target="RFC3530" />, and the current pending draft of 
          RFC3530bis <xref target="cur-v4.0-bis" /> both agree.  The section
          entitled "Client ID" says: 
        <list>
          <t>
            The second field, id is a variable length string that uniquely
            defines the client.
          </t>
        </list>
        </t>
        <t>
            There are two possible interpretations of the phrase "uniquely
            defines" in the above: 
        <list style="symbols">
           <t>
             The relation between strings and clients is a function from
             such strings to clients so that each string designates a single
             client.
           </t>
           <t>
             The relation between strings and clients is a bijection between
             such strings and clients so that each string designates a single
             client and each client is named by a single string.
           </t>
        </list>
        </t>
        <t>
          The first interpretation would make these client-strings like 
          phone numbers (a single person can have several) while the 
          second would make them like social security numbers.
        </t>
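        <t>
          The difference between the two interpretations can be stated
          precisely with a short Python sketch.  The relation between
          client-strings and clients is represented, as an assumption made
          for illustration, by a set of (string, client) pairs; the
          function names and the sample values are hypothetical.
        </t>
        <figure>
          <artwork><![CDATA[
```python
def each_string_names_one_client(pairs):
    """First interpretation: a function from strings to clients."""
    strings = [s for s, _ in pairs]
    return len(strings) == len(set(strings))


def each_client_has_one_string(pairs):
    """Extra condition imposed by the second interpretation."""
    clients = [c for _, c in pairs]
    return len(clients) == len(set(clients))


def is_bijection(pairs):
    """Second interpretation: both conditions hold."""
    return (each_string_names_one_client(pairs)
            and each_client_has_one_string(pairs))


# One client "C" using a different string for each server.
non_uniform = {("C-ABC", "C"), ("C-XYZ", "C")}
# The same client using a single string everywhere.
uniform = {("C", "C")}
```
]]></artwork>
        </figure>
        <t>
          Here non_uniform satisfies only the first interpretation, while
          uniform satisfies both.
        </t>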
        <t>
          Endless debate about the true meaning of "uniquely defines" in this 
          context is quite possible but not very helpful.  The following points
          should be noted though:
        <list style="symbols">
           <t>
             The second interpretation is more consistent with the way "uniquely
             defines" is used elsewhere in the spec.
           </t>
           <t>
             The spec as now written intends the first interpretation (or is 
             internally inconsistent).  In fact, it recommends, although it doesn't
             "RECOMMEND", that a single client have at least as 
             many client-strings
             as server addresses that it interacts with.  It says, in the third bullet
             point regarding construction of the string (which we shall 
             henceforth refer to as client-string-BP3):
           <list style="empty">
             <t>
               The string should be different for each server 
               network address that the client accesses, rather than 
               common to all server network addresses.  
             </t>
           </list>
           </t>
           <t>
              If internode interactions are limited to those between
              a client and its servers, there is no occasion for servers
              to be concerned with the question of whether two client-strings
              designate the same client, so that there is no occasion
              for the difference in interpretation to matter.
           </t>
           <t>
              When transparent migration of client state occurs between
              two servers, it becomes important to determine whether state
              on two different servers is for the same client,
              and the difference in interpretation begins to matter. 
           </t>
         </list>
         </t>
         <t>
           Given the need for the server to be aware of client identity with
           regard to migrated state, either client-string construction 
           rules will have to change, or a way will have to be found to
           work around the current issues, or perhaps a combination
           of the two will be required.  Later sections will examine the 
           options and propose a solution. 
         </t>
         <t>
           One consideration indicating that this recommendation cannot
           remain exactly as it is today
           is that the explanation currently given for it
           is not correct.  The current definitive definition of  
           the NFSv4.0 protocol <xref target="RFC3530" />, and the current 
           pending draft of RFC3530bis <xref target="cur-v4.0-bis" /> both
           agree.  The section entitled "Client ID" says:
         <list>
           <t> 
             The reason is that it may not be possible for the
             client to tell if the same server is listening on 
             multiple network addresses.  If the client issues 
             SETCLIENTID with the same id string to each network 
             address of such a server, the server will
             think it is the same client, and each successive 
             SETCLIENTID will cause the server to begin the process 
             of removing the client's previous leased state.
           </t>
         </list> 
         </t>
         <t>
           In point of fact, a "SETCLIENTID with the same id string" 
           sent to multiple network addresses will be treated as all from
           the same client but will not "cause the server 
           to begin the process of removing the client's previous 
           leased state" unless the server believes it is a newer
           instance of the same client, i.e. if the id is the same and
           there is a different verifier.  If the client does not
           reboot, the verifier should not change.  If it does reboot, the
           verifier will change, and the server should "begin the process 
           of removing the client's previous leased state".
         </t>
         <t>
           The situation of multiple SETCLIENTID requests received by
           a server on multiple network addresses is exactly the same, 
           from the protocol design point of view, as when multiple 
           (i.e. duplicate) SETCLIENTID requests are received by the server
           on a single network address.  The same protocol mechanisms
           that prevent erroneous state deletion in the latter case prevent
           it in the former case.  There is no reason for special handling
           of the multiple-network-appearance case, in this regard. 
         </t>
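         <t>
           The behavior described in the two preceding paragraphs can be
           sketched in Python as follows.  The ToyTrunkedServer class and
           its fields are hypothetical, the addresses are documentation
           addresses chosen for illustration, and the model collapses
           SETCLIENTID and its confirmation into a single step.  The point
           is only that the receiving address plays no part in the decision
           and that leased state is discarded only when the verifier
           changes.
         </t>
         <figure>
           <artwork><![CDATA[
```python
class ToyTrunkedServer:
    """Hypothetical model: one server, several network addresses."""

    def __init__(self):
        self.clients = {}    # id string -> verifier
        self.state = {}      # id string -> list of stateids
        self.removals = 0    # times leased state was discarded

    def setclientid(self, address, id_string, verifier):
        # The address is accepted but deliberately not consulted.
        if self.clients.get(id_string, verifier) != verifier:
            # Same id, different verifier: a newer client instance.
            self.state[id_string] = []
            self.removals += 1
        self.clients[id_string] = verifier
        self.state.setdefault(id_string, [])


server = ToyTrunkedServer()
server.setclientid("192.0.2.1", "C", 0x111)
server.state["C"].append("S1")

# Same client, same verifier, second address: state is kept.
server.setclientid("192.0.2.2", "C", 0x111)
kept = list(server.state["C"])

# Client reboots: new verifier, so previous state is removed.
server.setclientid("192.0.2.1", "C", 0x112)
```
]]></artwork>
         </figure>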
      </section>
      <section title="Issues with lease proliferation">
        <t>
          Lease proliferation is often felt to be a consequence of the client-string
          construction issues, and it is certainly the case that the two
          are closely connected in that non-uniform client-strings 
          make it impossible 
          for the server to appropriately combine leases from the same client.
          See <xref target="prop-model-nu" /> for a discussion of
          non-uniform client-strings.
        </t>
        <t>
          However, even where the server could combine leases from the same 
          client, it needs to be clear how and when it will do so, so that
          the client will be prepared.  These issues will have to be addressed
          at various places in the spec.
        </t>
        <t>
          Addressing these issues could be enough only if we are prepared to do away with the
          "should" recommending non-uniform client-strings and 
          replace it with a "should not" or even a "SHOULD NOT". 
          Current client implementation patterns
          make this an unpalatable choice for use as a general solution, but
          it is reasonable to "RECOMMEND" this choice for a well-defined subset
          of clients.
          One alternative would be to create a way 
          for the server to infer from client behavior which leases 
          are held by the same 
          client and use this information to do appropriate lease mergers. 
          Prototyping and detailed specification work has shown that this could
          be done but the resulting complexity is such that a better choice 
          is to
          "RECOMMEND" use of the uniform model for clients supporting
          the migration feature. 
        </t>
      </section>
    </section>
    </section>
        
    <section anchor="poss-all"
             title="Issues to be resolved">
      <section title="Possible changes to nfs_client_id4 client-string">
        <t>
          The fact that the reason given in client-string-BP3 is not valid
          makes the existing "should" insupportable.  We cannot do either of the following:
        <list style="symbols">
          <t>
            Keep a reason we know is invalid.
          </t>
          <t>
            Keep saying "should" without giving a reason.
          </t>
        </list>
        </t>
        <t>
          What are often presented as reasons that motivate use of the
          non-uniform model always turn out to be cases in which, if
          the uniform model were used, the server would treat a client which 
          accesses that server via two different IP addresses as part of 
          a single client, as it in fact is.  This may be disconcerting to 
          a client unaware that the two IP addresses connect to the same 
          server.
          This is thus not a reason to use the non-uniform model but rather
          an illustration of the fact that those using the uniform model
          must use server behavior to determine whether any trunking of
          IP addresses exists, as is described in 
          <xref target='prop-model-u' />. 
        </t>
        <t>
          It is always possible that a valid new reason will be found,
          but so far none has been proposed.  Given the history, the burden 
          of proof should be on those asserting the validity of a 
          proposed new reason.  
        </t>
        <t>
          So we will assume for now that the "should" will have
          to go.  The question is what to replace it with.  
        <list style="symbols">
          <t>
            We can't say "MUST NOT", despite the problems this raises for 
            migration since this is pretty late in the day for such a change.  
            Many currently operating clients obey the existing "should".  
            Similar considerations would apply for "SHOULD NOT" or 
            "should not".
          </t>
          <t>
            Dropping client-string-BP3 entirely is a possibility but, given the 
            context and history, it would just be a confusing version of 
            "SHOULD NOT".
          </t>
          <t>
            Using "MAY" would clearly specify that both ways of doing this
            are valid choices for clients and that servers will have to deal
            with clients that make either choice.
          </t>
          <t>
            This might be modified by a "SHOULD" (or even a "MUST") for
            particular groups of clients.
          </t>
          <t>

            There will have to be some text explaining why a client might make
            either choice 
            but, except for the particular cases referred to above,
            we will have to make sure that it is truly
            descriptive, and not slanted in either direction.
          </t>
        </list>
        </t>
      </section>
      <section anchor="poss-deal"
               title="Possible changes to handle differing nfs_client_id4 string values">
        <t>
          Given the difficulties caused by having different nfs_client_id4 
          client-string values for the same client, we have two choices:
        <list style="symbols">
          <t>
            Deprecate the existing treatment and say, in effect, that a client
            that follows it is on its own with regard to migration.
          </t>
          <t>
            Introduce a way of having the client provide client identity 
            information to the server, if it can be done compatibly while 
            staying within the bounds of v4.0.
          </t>
        </list> 
        </t>
      </section>
      <section anchor="poss-ms-other"
               title="Other issues within migration-state sections">
        <t>
          There are a number of issues where the existing text is unclear
          and/or wrong and needs to be fixed in some way.
        <list style="symbols">
          <t>
            Lack of clarity in the discussion of moving clientids (as well as
            stateids) as part of moving state for migration.
          </t>
          <t>
            The discussion of synchronized leases is wrong in that there
            is no way to determine (in the current spec) when leases
            are for the same client and also wrong in suggesting a benefit
            from leases synchronized at the point of transfer.  What is
            needed is merger of leases, which is necessary to keep
            client complexity requirements from getting out of hand.
          </t>
          <t>
            Lack of clarity in the discussion of LEASE_MOVED handling.
          </t>
        </list>
        </t>
      </section>
      <section anchor="poss-other"
               title="Issues within other sections">
        <t>
          There are a number of cases in which certain sections, not
          specifically related to migration, require additional clarification.
          This is generally because text that is clear in a context in 
          which  leases and clientids are created in one place and live 
          there forever may need further refinement in the more dynamic
          environment that arises as part of migration.
        </t>
        <t>
          Some examples:
        <list style='symbols'>
          <t>
            Some people are under the impression that updating callback 
            endpoint information for an existing client, which is part of 
            the client's handling of migration, may cause the destination 
            server to free existing state.  Additions are needed
            to clarify the situation.
          </t>
          <t>
            The handling of the sets of clientid4's maintained by each server
            needs to be clarified.  In particular, the issue of how the
            client adapts to the presumably independent and uncoordinated 
            clientid4 sets needs to be clearly addressed.
          </t>
          <t>
            Statements regarding handling of invalid clientid4's need to be
            clarified and/or refined in light of the possibilities that
            arise due to lease motion and merger.
          </t>
        </list>
        </t>
      </section>
    </section>

    <section anchor="prop-res"
             title="Proposed resolution of protocol difficulties">
      <section anchor="prop-string"
               title="Proposed changes: nfs_client_id4 client-string">
        <t>
          We propose replacing client-string-BP3 with the following text
          and adding the following proposed
          <xref target="prop-models" /> to provide implementation guidance.
        <list style='symbols'>
          <t>
             The string MAY be different for each server 
             network address that the client accesses, rather than 
             common to all server network addresses.  The considerations
             that might influence a client to use different strings for
             each are explained in <xref target='prop-models' />.
          </t>
          <t>
             Despite the use of the word "string" for this identifier, 
             and the fact that using strings will often be convenient,
             it should be understood that the protocol defines this as
             opaque data.  In particular, those receiving such an id should
             not assume that it will be in UTF-8 format nor should they
             reject it if it is not.
          </t>
        </list>
        </t>
      </section>
      <section anchor="prop-models" 
               title="Client-string Models (AS PROPOSED)">
        <t>
          One particular aspect of the construction of the nfs_client_id4
          string has proved recurrently troublesome.  The client has a
          choice of:
        <list style='symbols'>
          <t>
            Presenting the same id string to each server address accessed.
            This is referred to as the "uniform client-string model" and is
            discussed in <xref target="prop-model-u" />.  
          </t>
          <t>
            Presenting a different id string to each server address accessed.
            This is referred to as the "non-uniform client-string model" 
            and is discussed in <xref target="prop-model-nu" />.  
          </t>
        </list>
        </t>
        <t>
          Construction of the client-string has been a troublesome
          issue because of the way in which the NFS protocols have evolved.
        <list style='symbols'>
          <t>
            NFSv3 as a stateless protocol had no need to identify the
            state shared by a particular client-server pair.  Thus
            there was no occasion to consider the question of whether a
            set of requests come from the same client, or whether two 
            server IP addresses are connected to the same server.
            As the environment was one in which the user supplied
            the target server IP address as part of incorporating the
            remote filesystem in the client's file name space, there
            was no occasion to take note of server trunking.  Within
            a stateless protocol, the situation was symmetrical.  The client
            has no server identity information and the server has no 
            client identity information. 
          </t>
          <t>
            NFSv4.1 is a stateful protocol with full support for client
            and server identity determination.  This enables the server to
            be aware when two requests come from the same client (they are
            on sessions sharing a clientid4) and the client to be aware when 
            two server IP addresses are connected to the same server 
            (they return the same server name in responding to an EXCHANGE_ID).
          </t>
        </list>
        </t>
        <t>
          NFSv4.0 is unfortunately halfway between these two.  The two
          client-string models have arisen in attempts to deal with the changing
          requirements of the protocol as implementation has proceeded and 
          features that were not very substantial in <xref target="RFC3530" />
          have become more substantial. 
        <list style='symbols'>
          <t>
            In the absence of any implementation of the fs_locations-related
            features (replication, referral, and migration), the situation
            is very similar to that of NFSv3, with the addition of state but 
            with no concern to provide accurate client and server identity 
            determination.  This is the situation that gave rise to the 
            non-uniform client-string model.
          </t>
          <t>
            In the presence of replication and referrals, the client may have
            occasion to take advantage of knowledge of server trunking
            information.  Even more important, migration, by transferring
            state among servers, causes difficulties for the non-uniform 
            client-string model, 
            in that the two different client-strings sent to different IP
            addresses may wind up on the same IP address, adding confusion.
          </t>
        </list>
        </t>
        <t>
          Both models have to deal with the asymmetry between the client
          identity information available to the server and the server
          identity information available to the client.  Each seeks to 
          make the client's and the server's views match.  In the process,
          each encounters some combination of inelegant protocol features
          and/or implementation difficulties.  The choice of which to use
          is up to the client implementer and the sections below try to give
          some useful guidance.
        </t>
        <section anchor="prop-model-nu" 
                 title="Non-Uniform Client-string Model">
          <t>
             The non-uniform client-string model is an attempt to handle
             these matters in NFSv4.0 client implementations in as 
             NFSv3-like a way as possible. 
          </t>
          <t>
             For a client using the non-uniform model, all internal recording
             of clientid4 values is to include, whether explicitly 
             or implicitly,
             the server IP address so that one always has an (IP-address,
             clientid4) pair.  Two such pairs from different servers are always
             distinct even when the clientid4 values are the same, as they
             may occasionally be.  In this model, such equality is always 
             treated as simple happenstance.
          </t>
          <t>
             Making the client-string different on different
             servers means that a server has no way of tying
             together information from the same client and so will
             treat a single client as multiple clients, with a separate
             lease for each server network address.  Since there is
             no way in the protocol for the client to determine if
             two network addresses are connected to the same server, the
             resulting lack of knowledge is symmetrical and can result
             in simpler client implementations in which there is a 
             single clientid/lease per server network address.  
          <t>
             Support for migration, particularly with transparent state
             migration, is more complex in the case of non-uniform
             client-strings.  For example, migration of a lease can result
             in multiple leases for the same client accessing the same
             server addresses, vitiating many of the advantages of
             this approach. 
             Therefore, client implementations that support migration 
             with transparent state migration SHOULD NOT use the non-uniform
             client-string model.
          </t>
        </section>
        <section anchor="prop-model-u" 
                 title="Uniform Client-string Model">
          <t>
             When the client-string is kept uniform, the server has the
             basis to have a single clientid4/lease for each distinct
             client.  The problem that has to be addressed is the lack
             of explicit server identity information, which only became
             available in NFSv4.1.
          </t>
          <t>
             When the same client-string is given to multiple IP
             addresses, the client can determine whether two IP addresses
             correspond to a single server, based on the server's behavior.
             This is the inverse of the
             strategy adopted for the non-uniform model in which different 
             server IP addresses are told about different clients, simply 
             to prevent a server from manifesting behavior that is 
             inconsistent with there being a single server for each 
             IP address, in line with the traditions of NFS.   So, to compare:
          <list style='symbols'>
             <t>
               In the non-uniform model, servers are told about different
               clients because, if the server were to use accurate information
               as to client identity, 
               two IP addresses on the same server would behave as if they were 
               talking to the same client, which might prove disconcerting to 
               a client not expecting such behavior.
             </t>
             <t>
               In the uniform model, the servers are told that 
               there is a single client, which is, after all, the truth.
               Then, when the server uses this information, two IP addresses
               on the same server will behave as if they are talking to the same
               client, and this difference in behavior allows the client to 
               infer the server IP address trunking configuration, even though 
               NFSv4.0 does not explicitly provide this information.
             <vspace blankLines='1' />
               The approach given below shows one example of how this might be 
               done.  
             </t>
          </list>
          </t>
          <t>
             For a client using the uniform model, clientid4 values are treated
             as important information in determining server trunking patterns.
             For two different IP addresses to return the same clientid4 
             value is a necessary, though not a sufficient, condition for
             them to be considered as connected to the same server.
             As a result, when two different IP addresses return the same 
             clientid4, the client needs to determine, using the procedure
             given below or otherwise, whether the IP addresses are connected
             to the same server.  For such clients, all internal recording
             of clientid4 values needs to include, whether explicitly 
             or implicitly, identification of the server from which the 
             clientid4 was received so that one always has a (server,
             clientid4) pair.  Two such pairs from different servers are 
             always considered distinct even when the clientid4 values 
             are the same, 
             as they may occasionally be.  
          </t>
          <t>
             In order to make this approach work, the client must have 
             accessible, for each nfs_client_id4 used (only one in the
             uniform model), a list of all server IP addresses, together
             with the associated clientid4 values.  As a part of the associated
             data structures, there should be the ability to mark a server
             IP structure as having the same server as another and to mark
             an IP address as currently unresolved.  One way to do this
             is to allow each such entry to point to another, with the
             pointer value being one of:
          <list style='symbols'>
            <t>
              A pointer to another entry for an IP address associated with
              the same server, where that IP address is the first one 
              referenced to access that server. 
            </t>
            <t>
              A pointer to the current entry if there is no earlier IP
              address associated with the same server, i.e. where 
              the current IP 
              address is the first one referenced to access that server.
              We'll refer to such an IP address as the lead IP address for
              a given server.  
            </t>
            <t>
              The value NULL if the address's server identity is currently
              unresolved.
            </t>
          </list>
          </t>
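          <t>
             As an illustration only, the entry-and-pointer scheme just
             described might be sketched in Python as follows.  The names
             AddrEntry, mark_lead, and mark_trunked are hypothetical and
             are not taken from the protocol or from any particular
             implementation; None stands in for the NULL pointer value.
          </t>
          <figure>
            <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class AddrEntry:
    """One server IP address known to the client (hypothetical)."""
    ip: str
    clientid4: int
    # Lead entry for the same server; the entry itself if it is
    # the lead; None while the server identity is unresolved.
    lead: Optional["AddrEntry"] = None

    def is_unresolved(self) -> bool:
        return self.lead is None

    def is_lead(self) -> bool:
        return self.lead is self


def mark_lead(entry: AddrEntry) -> None:
    """Record entry as the first address seen for its server."""
    entry.lead = entry


def mark_trunked(entry: AddrEntry, lead: AddrEntry) -> None:
    """Record entry as another address of lead's server."""
    assert lead.is_lead()
    entry.lead = lead


a = AddrEntry("192.0.2.1", 0x1234)
mark_lead(a)                          # first address of a server
b = AddrEntry("192.0.2.2", 0x1234)    # identity not yet resolved
c = AddrEntry("192.0.2.3", 0x1234)
mark_trunked(c, a)                    # same server as a
```
]]></artwork>
          </figure>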
          <t>
             When a SETCLIENTID is done and a clientid4 returned, 
             the data structure is searched for
             a matching clientid4 and processing depends on what is found.
             We will refer to the IP address on which this SETCLIENTID is done
             as X.  The SETCLIENTID will use the common nfs_client_id4 and 
             specify X as part of the callback parameters.  We call the 
             clientid4 and verifier returned by this operation XC and XV.
          </t>
          <t>
             Note that at this point no SETCLIENTID_CONFIRM has yet been 
             done.  This is because we have either established a new 
             clientid4 on
             a previously unknown server or changed the callback parameters 
             on a clientid4 associated with some already known server.  
             We don't
             want to confirm something that we are not sure we want to happen.
          <list style='symbols'>
            <t>
              If no matching clientid4 is found, the IP address X and clientid4
              XC are added to the list and considered as having no existing 
              known IP addresses trunked with it.  The IP address is marked as
              a lead IP address for a new server.  A SETCLIENTID_CONFIRM
              is done using XC and XV.
            </t>
            <t>
              If a matching clientid4 is found which is marked unresolved,
              processing on the new IP address is suspended.  In order to
              simplify processing, there 
              can only be one unresolved IP address 
              for any given clientid4.
            </t>
            <t>
              If one or more matching clientid4's are found, none of which is 
              marked unresolved, the new IP address is entered and marked 
              unresolved.  After applying the steps below to each of the
              lead IP addresses with a matching clientid4, the address will 
              have been
              resolved: either it will be recognized as an additional
              IP address to be added to the existing set of IP addresses for 
              a known server, or it will
              be recognized as belonging to a new server.  At the point at
              which this determination is made, the unresolved
              indication is cleared and any suspended SETCLIENTID 
              processing is restarted.
            </t>
          </list>
          </t>
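          <t>
             The three cases above amount to a simple classification of
             the clientid4 XC returned on X.  A minimal sketch, assuming
             the client's list is available as (ip, clientid4, lead)
             tuples in which lead is None for an unresolved address, is
             shown below.  The Action names and the classify function are
             hypothetical and are used here only for illustration.
          </t>
          <figure>
            <artwork><![CDATA[
```python
from enum import Enum


class Action(Enum):
    NEW_SERVER = 1   # no match: mark X as lead and confirm
    SUSPEND = 2      # a matching address is unresolved: wait
    RESOLVE = 3      # enter X as unresolved, test each lead


def classify(entries, xc):
    """entries: list of (ip, clientid4, lead) tuples, where
    lead is None for an unresolved address.  xc is the
    clientid4 just returned by SETCLIENTID on address X."""
    matches = [e for e in entries if e[1] == xc]
    if not matches:
        return Action.NEW_SERVER
    if any(lead is None for (_, _, lead) in matches):
        return Action.SUSPEND
    return Action.RESOLVE
```
]]></artwork>
          </figure>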
          <t>
             So for each lead IP address IPn with a clientid4 matching XC, the 
             following steps are done.
          <list style='symbols'>
            <t>
              If the server has an associated stateid S, S is used in a request
              issued on the address X.  Whether or not S is recognized on
              X gives definitive information about X's server identity. 
            </t>
            <t>
              If S is not recognized as valid on X, then X and IPn 
              are recognized as distinct and we go on to the next IPn,
              until we run out of them.
            </t>
            <t>
              If S is recognized as valid on X, then X and IPn 
              are recognized as connected to the same server 
              and the entry for X is marked as associated with IPn.
              The entry is now resolved and processing can be restarted for
              IP addresses whose clientid4 matched XC and whose
              resolution had been deferred.
            </t>
            <t>
              If there is no such S for IPn, a different procedure is used.
              A SETCLIENTID is done to update the callback parameters to
              reflect the possibility that X will be marked as associated
              with the server whose lead IP address is IPn.  So assume
              that we do that SETCLIENTID and get back verifier Vn.
            </t>
            <t>
              Note that we don't want this to happen if address X is not
              associated with this server.  So we do a SETCLIENTID_CONFIRM
              on address IPn using verifier Vn.
            </t>
            <t>
              If the verifier generated on X is accepted on IPn, 
              then X and IPn 
              are recognized as connected to the same server 
              and the entry for X is marked as associated with IPn.
              The entry is now resolved and processing can be restarted for
              IP addresses whose clientid4 matched XC but whose
              resolution had been deferred.
            </t>
            <t>
              If the verifier generated on X is not accepted on IPn, 
              then X and IPn are distinct and the callback update will not 
              be confirmed.  So we go on to the next IPn, until we run out
              of them.
            </t>
          </list>
          </t>
          <t>
            The procedure above has made no explicit mention of the 
            possibility that server reboot can occur at any time.
            To address this possibility, the client should periodically
            use the clientid4 XC in RENEW operations, directed to both
            the IP address X and the lead IP address that is currently
            being tested for identity.  
          <list style='symbols'>
            <t>
              When XC becomes invalid on X, the resolution process should be
              terminated, subject to being redone later.  Before redoing
              the resolution, XC should be checked on all the lead IP
              addresses on which it was valid.  Once a new clientid4 is 
              established on any servers on which XC became invalid, a new 
              clientid4 can be established on X and the resolution process for
              X can be restarted. 
            </t>
            <t>
              When XC does not become invalid on X, but becomes invalid on
              the current IPn being tested, it should be concluded that
              X and IPn do not match and that it is time to advance to the
              next IPn, if any.
            </t>
            <t>
              In the event of a reboot detected on any server's lead IP address, the
              set of IP addresses associated with the server should not 
              change and 
              state should be re-established for the lease as a whole, 
              using all available connected server IP addresses.  It is
              prudent to verify connectivity by doing a RENEW using the new 
              clientid4 on each such server address before using it, 
              however.
            </t>
          </list>
          </t>
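          <t>
             The first two cases above can be viewed as a small decision
             table driven by the results of the periodic RENEW
             operations.  The following sketch is illustrative only: the
             function name and the returned labels are hypothetical, and
             the re-establishment of state after a reboot is not modeled.
          </t>
          <figure>
            <artwork><![CDATA[
```python
def renew_outcome(valid_on_x, valid_on_ipn):
    """Decide how to proceed after RENEW using XC has been
    sent to both X and the lead address IPn under test.
    Each argument is True if XC was accepted there."""
    if not valid_on_x:
        # XC is no longer valid on X: stop, to be redone later.
        return "terminate"
    if not valid_on_ipn:
        # XC is still valid on X only: X and IPn do not match.
        return "next-ipn"
    return "continue"
```
]]></artwork>
          </figure>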
          <t>
             If we have run out of IPn's without finding a matching server,
             X is considered as having no existing 
             known IP addresses trunked with it.  The IP address is marked as
             a lead IP address for a new server.  A SETCLIENTID_CONFIRM
             is done using XC and XV.
          </t>
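          <t>
             Ignoring the reboot handling, the walk over the lead IP
             addresses and the fallback case just described might be
             sketched as follows.  This is a sketch under stated
             assumptions rather than a definitive implementation: the
             callables stateid_ok and callback_probe are hypothetical
             stand-ins for the requests a real client would issue, and
             the dictionaries at the end are a toy model of the network
             used only to exercise the function.
          </t>
          <figure>
            <artwork><![CDATA[
```python
def resolve(x, xc, leads, stateid_ok, callback_probe):
    """Return the lead IP address of the server that X belongs
    to, or X itself if X turns out to be a new server.

    leads: maps each lead IP address whose clientid4 equals xc
        to a stateid held on that server, or to None.
    stateid_ok(x, s): hypothetical; True if a request issued
        on x using stateid s is recognized as valid.
    callback_probe(x, ipn, xc): hypothetical; does the
        SETCLIENTID yielding verifier Vn, then the
        SETCLIENTID_CONFIRM on ipn; True if Vn is accepted.
    """
    for ipn, s in leads.items():
        if s is not None:
            same = stateid_ok(x, s)
        else:
            same = callback_probe(x, ipn, xc)
        if same:
            return ipn
    # Ran out of IPn's: X is the lead address of a new server.
    # The caller then does SETCLIENTID_CONFIRM with XC and XV.
    return x


# Toy model: which server each address and stateid belongs to.
server_of = {"192.0.2.1": "A", "192.0.2.2": "B",
             "192.0.2.9": "B", "192.0.2.7": "C"}
stateid_home = {"s1": "A"}


def stateid_ok(x, s):
    return server_of[x] == stateid_home[s]


def callback_probe(x, ipn, xc):
    return server_of[x] == server_of[ipn]


leads = {"192.0.2.1": "s1", "192.0.2.2": None}
trunked = resolve("192.0.2.9", 7, leads,
                  stateid_ok, callback_probe)
fresh = resolve("192.0.2.7", 7, leads,
                stateid_ok, callback_probe)
```
]]></artwork>
          </figure>
          <t>
             In this toy model, trunked is the lead address 192.0.2.2,
             since both addresses belong to server B, while fresh is
             192.0.2.7 itself, since no lead address shares its server.
          </t>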
          <t>
             The following are advantages, for implementations, of using
             the uniform client-string model:
          <list style='symbols'>
            <t>
              Clients can take advantage of server trunking (and clustering
              with single-server-equivalent semantics) to increase bandwidth 
              or reliability.
            </t>
            <t>
             There are advantages in state management so that,
             for example, we never have a delegation under one clientid
             revoked because of a reference to the same file from the 
             same client
             under a different clientid.  
            </t>
            <t>
             The uniform client-string model allows the server 
             to do any necessary
             automatic lease merger in connection with migration, without 
             requiring any client involvement.
             This consideration is of sufficient weight
             to cause us to RECOMMEND use of the
             uniform client-string model for clients supporting transparent
             state migration.
            </t>
          </list>
          </t>
          <t>
             The following implementation considerations might cause
             issues for client implementations.
          <list style='symbols'>
            <t>
              This model is considerably different from the non-uniform
              model, which most client implementations have been following.
              Until substantial implementation experience is obtained with
              this model, reluctance to embrace something so new is to
              be expected.
            </t>
            <t>
              Mapping between server
              network addresses and leases is more complicated in 
              that it is no longer a one-to-one mapping.              
            </t>
          </list>
          </t>
          <t>
            How to balance these considerations depends on implementation
            goals.
          </t>
        </section>
      </section>
      <section anchor="prop-deal"
               title="Proposed changes: merged (vs. synchronized) leases">
        <t>
           The current definitive definition of
           the NFSv4.0 protocol <xref target="RFC3530" /> and the current 
           pending draft of RFC3530bis <xref target="cur-v4.0-bis" />
           agree.  The section entitled "Migration and State" says:
        <list>
          <t>
            As part of the transfer of information between servers, 
            leases would be transferred as well.  The leases being 
            transferred to the new server will typically have a different 
            expiration time from those for the same client, previously 
            on the old server.  To maintain the property that all leases 
            on a given server for a given client expire at the same time, 
            the server should advance the expiration time to the later of 
            the leases being transferred or the leases already present.  
            This allows the client to maintain lease renewal of both
            classes without special effort.
          </t>
         </list>
        </t>
        <t>
          There are a number of problems with this and any resolution of our 
          difficulties must address them somehow.
        <list style='symbols'>
           <t>
             The current v4.0 spec recommends that the client make it
             essentially impossible to determine when two leases are from
             "the same client". 
           </t>
           <t>
             It is not appropriate to speak of "maintain[ing] the property
             that all leases on a given server for a given client expire 
             at the same time", since this is not a property that holds
             even in the absence of migration.   A server listening on
             multiple network addresses may have the same client appear as
             multiple clients with no way to recognize the client as the same.      
           </t>
           <t>
             Even if the client identity issue could be resolved, advancing
             the lease time at the point of migration would not maintain the
             desired synchronization property.  The leases would be synchronized
             until one of them was renewed, after which they would be 
             unsynchronized again.
           </t>
        </list>
        </t>
        <t>
           To avoid client complexity, we need to have no more  
           than one lease between
           a single client and a single server.  This requires merger of leases
           since there is no real help from synchronizing them at a single 
           instant.
        </t>
        <t>
           For the uniform model, the destination server would simply merge 
           leases as part of state transfer, since two leases with the
           same nfs_client_id4 values must be for the same client. 
        </t>
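        <t>
          A minimal sketch of such a merger on the destination server
          is given below.  It is illustrative only: the lease
          representation, the merge_leases name, the choice to keep the
          destination's existing clientid4, and the choice of the later
          expiration time for the merged lease are all assumptions made
          for the example and are not requirements stated here.
        </t>
        <figure>
          <artwork><![CDATA[
```python
def merge_leases(dest, incoming):
    """Merge transferred leases into the destination server's
    table.  Both arguments map an nfs_client_id4 value (bytes)
    to a lease, modeled here as a dict with keys "clientid4",
    "stateids" (a set), and "expires" (seconds)."""
    for client_string, lease in incoming.items():
        existing = dest.get(client_string)
        if existing is None:
            # No lease for this client yet: adopt it as is.
            dest[client_string] = lease
            continue
        # Same nfs_client_id4, hence same client: one lease.
        # Assumption: the existing clientid4 is retained.
        existing["stateids"] |= lease["stateids"]
        # Assumption: keep the later expiration time.
        existing["expires"] = max(existing["expires"],
                                  lease["expires"])
    return dest


dest = {b"client-1": {"clientid4": 0x10,
                      "stateids": {"s1"}, "expires": 100}}
incoming = {b"client-1": {"clientid4": 0x77,
                          "stateids": {"s2"}, "expires": 130},
            b"client-2": {"clientid4": 0x78,
                          "stateids": {"s3"}, "expires": 90}}
merge_leases(dest, incoming)
```
]]></artwork>
        </figure>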
        <t>
          We have made the following decisions regarding proposed normative
          statements about state merger.  They reflect the facts that 
          we want to support migration fully in the simplest way
          possible and that we can't say MUST, since we 
          have older clients and servers to deal with.
        <list style='symbols'>
          <t>
            Clients SHOULD use the uniform  client-string model in order 
            to get good migration support.  
          </t>
          <t>
            Servers SHOULD provide automatic lease merger during state 
            migration so that clients using the uniform client-string model
            get the support automatically.
          </t>
        </list>
        </t>
        <t>
          If the clients and the servers obey the SHOULD's, having more than a
          single lease for a given client-server pair will be a transient 
          situation, cleaned up as part of adapting to use of migrated state.
        </t>
        <t>
          Since clients and servers will be a mixture of old and new, and 
          because nothing is a MUST, we have to
          ensure that no combination will show worse behavior than is 
          exhibited by current 
          (i.e., old) clients and servers.
        </t>
      </section>
      <section anchor="prop-ms-other"
               title="Other proposed changes to migration-state sections">
        <section anchor="prop-ms-clid-migr"
               title="Proposed changes: Client ID migration">
          <t>
           The current definitive definition of
           the NFSv4.0 protocol <xref target="RFC3530" /> and the current 
           pending draft of RFC3530bis <xref target="cur-v4.0-bis" />
           agree.  The section entitled "Migration and State" says:
          <list>
            <t>
              In the case of migration, the servers involved in the migration
              of a filesystem SHOULD transfer all server state from the
              original to the new server.  This must be done in a way that is
              transparent to the client.  This state transfer will ease the
              client's transition when a filesystem migration occurs.  If the
              servers are successful in transferring all state, the client
              will continue to use stateids assigned by the original server.
              Therefore the new server must recognize these stateids as valid.
              This holds true for the client ID as well.  Since responsibility
              for an entire filesystem is transferred with a migration event,
              there is no possibility that conflicts will arise on the new
              server as a result of the transfer of locks.
           </t>
         </list>
         </t>
         <t>
           This poses some difficulties, mostly because the part about 
           "client ID" is not clear:
         <list style='symbols'>
           <t>
             It isn't clear what part of the paragraph the "this" in the
             statement "this holds true ..." is meant to signify.
           </t>
           <t>
             The phrase "the client ID" is ambiguous, possibly indicating
             the clientid4 and possibly indicating the nfs_client_id4. 
           </t>
           <t>
             If the text means to suggest that the same clientid4 must be 
             used, the logic is not clear since the issue is not the same as
             for stateids of which there might be many.  Adapting to the change
             of a single clientid, as might happen as a part of lease
             migration, is relatively easy for the client.
           </t>
         </list>
         </t>
         <t>
           We have decided to address this issue as follows, with the relevant
           changes all reflected in <xref target="prop-mig-state" />. 
         <list style='symbols'>
           <t>
             Make it clear that both clientid4 and nfs_client_id4 are to
             be transferred.
           </t>
           <t>
             Indicate that the initial transfer will normally result in the
             same clientid4 after transfer, but that this is not guaranteed,
             since there may be a conflict with an existing clientid4 on the
             destination server and because lease merger can result in a
             change of the clientid4.
           </t>
         </list>
         </t>
        </section>
        <section anchor="prop-ms-cb-est"
               title="Proposed changes: Callback re-establishment">
          <t>
           The current definitive definition of
           the NFSv4.0 protocol <xref target="RFC3530" /> and the current 
           pending draft of RFC3530bis <xref target="cur-v4.0-bis" />
           agree.  The section entitled "Migration and State" says:
         <list>
           <t>
             A client SHOULD re-establish new callback information with 
             the new server as soon as possible, according to sequences 
             described in sections "Operation 35: SETCLIENTID - 
             Negotiate Client ID" and "Operation 36: SETCLIENTID_CONFIRM - 
             Confirm Client ID".  This ensures that server operations
             are not blocked by the inability to recall delegations.
           </t>
          </list>
          </t>
          <t>
            The above will need to be fixed to reflect the possibility of 
            merging of leases and the text to do this appears as part of
            <xref target="prop-mig-state" />. 
          </t>
        </section>
        <section anchor="prop-ms-lm-migr"
               title="Proposed changes: NFS4ERR_LEASE_MOVED rework">
          <t>
           The current definitive definition of
           the NFSv4.0 protocol <xref target="RFC3530" /> and the current 
           pending draft of RFC3530bis <xref target="cur-v4.0-bis" />
           agree.  The 
           section entitled "Notification of Migrated Lease" says:
         <list>
           <t>
             Upon receiving the NFS4ERR_LEASE_MOVED error, a client 
             that supports filesystem migration MUST probe all 
             filesystems from that server on which it holds open state.  
             Once the client has successfully probed all those 
             filesystems which are migrated, the server MUST resume
             normal handling of stateful requests from that client.
           </t>
         </list>
         </t>
         <t>
           This text is ambiguous about what exactly probing is and what
           the interlock between client and server must be.  This has led
           to some worry about the scalability of the probing process.
           Although the time required does scale linearly with the number
           of fs's for which the client may have state on a given server,
           the actual process can be done efficiently.
         </t>
         <t>
           To address these issues we propose replacing the above with
           the text addressing NFS4ERR_LEASE_MOVED as given in 
           <xref target="prop-notify-migr-lease" />.
         </t>
        </section>
      </section>
      <section anchor="prop-other"
               title="Proposed changes to other sections">
        <section anchor="prop-other-callback-update"
                 title="Proposed changes: callback update">
          <t>
            Some changes are necessary to reduce confusion about the process
            of callback information update and in particular to make it
            clear that no state is freed
            as a result:
          <list style="symbols">          
            <t>
              Make it clear that after migration there are confirmed
              entries for transferred clientid4/nfs_client_id4 pairs.
            </t>
            <t>
              Be explicit in the sections headed "otherwise," in the
              descriptions of SETCLIENTID and SETCLIENTID_CONFIRM,
              that these don't apply in the cases we are concerned about.
            </t>
          </list>          
          </t>
        </section>
        <section anchor="prop-other-clientid4"
                 title="Proposed changes: clientid4 handling">
          <t>
            To address both of the clientid4-related issues mentioned in
            <xref target='poss-other' />, we propose replacing the last three
            paragraphs of the section entitled "Client ID" with the following:
          <list style='empty'>
            <t>
              Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has 
              successfully completed, the client uses the shorthand client 
              identifier, of type clientid4, instead of the longer and 
              less compact nfs_client_id4 structure.  This shorthand client 
              identifier (a client ID) is assigned by the server and should 
              be chosen so that it will not conflict with a client ID 
              previously assigned by the same server.  This applies across 
              server restarts or reboots.  
            </t>
            <t>
              Distinct servers MAY assign clientid4's independently, and
              will generally do so.  Therefore, a client has to be prepared
              to deal with multiple instances of the same clientid4 value
              received on distinct IP addresses, denoting separate entities.
              When trunking of server IP addresses is not a consideration,
              a client should keep track of (IP-address, clientid4) pairs,
              so that each pair is distinct.  For a discussion of how to
              address the issue in the face of possible trunking of server 
              IP addresses, see <xref target='prop-models' />.             
            </t>
            <t>
             When a clientid4 is presented to a server and that clientid4
             is not recognized, the server will reject the request with
             the error NFS4ERR_STALE_CLIENTID.   This can occur for a number
             of reasons:
             <list style='symbols'>
               <t>
                 A server reboot causing loss of the server's knowledge of
                 the client.
               </t>
               <t>
                 Client error sending an incorrect clientid4 or a valid 
                 clientid4 to the wrong server.
               </t>
               <t>
                 Loss of lease state due to lease expiration.
               </t>
               <t>
                 Client or server error causing the server to believe that
                 the client has rebooted (i.e., receiving a SETCLIENTID with an
                 nfs_client_id4 which has a matching id and a non-matching 
                 verifier).
               </t>
               <t>
                 Migration of all state under the associated lease causes its
                 non-existence to be recognized on the source server.
               </t>
               <t>
                 Merger of state under the associated lease with another
                 lease under a different clientid4 causes the clientid4 serving
                 as the source of the merge to cease being recognized on its 
                 server.   
               </t>
            </list>
            </t>
            <t>
               In the event of a server reboot, or loss of lease state due
               to lease expiration, the client must obtain a new clientid4 
               by use of the SETCLIENTID operation and then
               proceed to any other necessary recovery for the server reboot 
               case (See the section entitled "Server Failure and Recovery").
               In cases of server or client error resulting in this error,
               use of SETCLIENTID to establish a new lease is desirable as 
               well. 
            </t>
            <t>
               In the last two cases, different recovery procedures are 
               required.  See <xref target='prop-mig-state' /> for details.
               Note that in cases in which there is any uncertainty about
               which sort of handling is applicable, the distinguishing 
               characteristic is that in reboot-like cases, the clientid4 and
               all associated stateids cease to exist while in migration-related
               cases, the clientid4 ceases to exist while the stateids are 
               still valid. 
            </t>
            <t>
               The client must also employ the SETCLIENTID operation when it
               receives an NFS4ERR_STALE_STATEID error using a stateid derived 
               from its current clientid4, since this indicates a situation,
               such as a server reboot, which has invalidated the existing 
               clientid4 and associated stateids (see the section
               entitled "lock-owner" for details).
            </t>
            <t>
               See the detailed descriptions of SETCLIENTID and 
               SETCLIENTID_CONFIRM for a complete specification of the 
               operations.
            </t>
          </list>
          </t>

        </section>
      </section>
      <section anchor="prop-mig-state" 
               title="Migration, Replication and State (AS PROPOSED)">

        <t>
          When responsibility for handling a given filesystem is transferred
          to a new server (migration) or the client chooses to use an
          alternate server (e.g., in response to server unresponsiveness) in
          the context of filesystem replication, the appropriate handling of
          state shared between the client and server (i.e., locks, leases,
          stateids, and client IDs) is as described below.  The handling
          differs between migration and replication.  
       </t>
 
       <t>
         If a server replica or a server immigrating a filesystem agrees
         to, or is expected to, accept opaque values from the client that
         originated from another server, then it is a wise implementation
         practice for the servers to encode the "opaque" values in network
         byte order.  When doing so, servers acting as replicas or immigrating
         filesystems will be able to parse values like stateids, directory
         cookies, filehandles, etc. even if their native byte order is
         different from that of other servers cooperating in the replication 
         and migration of the filesystem.
       </t>
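       <figure>
         <preamble>
           The byte-order advice above can be illustrated with a minimal
           Python sketch.  The field layout used here for the 12-byte
           "other" portion of a stateid (a 32-bit boot instance followed
           by a 64-bit state index) and the function names are assumptions
           made for the example; the NFS version 4.0 protocol does not
           define the contents of these opaque values.
         </preamble>
         <artwork><![CDATA[
```python
import struct

# Hypothetical layout for the 12-byte "other" field of a stateid:
# a 32-bit boot instance followed by a 64-bit state index.
# "!" selects network (big-endian) byte order with no padding.
OTHER_FORMAT = "!IQ"


def encode_other(boot_instance, state_index):
    """Pack the two fields into 12 opaque bytes."""
    return struct.pack(OTHER_FORMAT, boot_instance, state_index)


def decode_other(other):
    """Unpack bytes produced by any cooperating server."""
    return struct.unpack(OTHER_FORMAT, other)


other = encode_other(0x5F3A0001, 42)
assert len(other) == 12
assert decode_other(other) == (0x5F3A0001, 42)
```
]]></artwork>
         <postamble>
           Because the format string fixes the byte order, a destination
           server with a different native byte order decodes the same
           field values.
         </postamble>
       </figure>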

     <section title="Migration and State" anchor="migr-and-state">

       <t>
         In the case of migration, the servers involved in the migration
         of a filesystem SHOULD transfer all server state from the
         original to the new server.  This must be done in a way that is
         transparent to the client.  This state transfer will ease the
         client's transition when a filesystem migration occurs.  If the
         servers are successful in transferring all state, the client
         will continue to use stateids assigned by the original server.
         Therefore the new server must recognize these stateids as valid.
       </t>
       <t>
         If transferring stateids from server to server would result in a
         conflict with a stateid that already exists on the destination
         server for the same client, transparent state migration MUST NOT
         happen for that client.  Servers participating in transparent
         state migration should coordinate their stateid assignment
         policies to make this situation unlikely or impossible.
         The means by which this
         might be done, like all of the inter-server interactions for 
         migration, are not specified by the NFS version 4.0 protocol. 
       </t>
       <t>
         Handling of clientid values is similar but not identical. 
         The clientid4 and nfs_client_id4 information (id and verifier) 
         will be transferred with the rest of the state information and 
         the destination server should use that information to determine 
         appropriate clientid4 handling. Although the destination 
         server may make state stored under an existing
         lease available under the clientid4 used on the source server,
         the client should not assume that this is always so.  In particular,
         
       <list style="symbols">          
         <t>
           If there is an existing lease with an nfs_client_id4 that matches
           a migrated lease (same id and verifier), the server SHOULD merge 
           the two, making the union of the sets of stateids available under 
           the clientid4 for the existing lease.
           As part of the lease merger, the expiration time of the lease will 
           reflect renewal done within either of the ancestor leases (and 
           so will reflect the latest of the renewals).
         </t>
         <t>
           If there is an existing lease with an nfs_client_id4 that partially
           matches a migrated lease (same id and a different verifier), 
           the server MUST eliminate one of the two, possibly invalidating 
           one of the ancestor clientid4's.  
           Since verifiers are not ordered, the later lease renewal
           time will prevail.
         </t>
       </list>          
       </t>
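       <figure>
         <preamble>
           The two rules above might be sketched as follows.  This is an
           illustration under stated assumptions, not a definitive
           implementation: the Lease record, its field names, and the
           function name are hypothetical, clientid4 conflicts are not
           modeled, and a tie in renewal times is resolved in favor of
           the existing lease purely for the sake of the example.
         </preamble>
         <artwork><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    id_string: str        # nfs_client_id4 "id"
    verifier: int         # nfs_client_id4 "verifier"
    clientid4: int
    last_renewal: float
    stateids: set = field(default_factory=set)


def place_migrated_lease(existing, migrated):
    """Return the lease that survives on the destination.

    'existing' is a lease already on the destination with the
    same id string as 'migrated', or None if there is none.
    """
    if existing is None:
        return migrated                 # transferred intact
    if existing.verifier == migrated.verifier:
        # Full match: merge under the existing clientid4 and
        # keep the latest renewal of the two ancestor leases.
        existing.stateids |= migrated.stateids
        existing.last_renewal = max(existing.last_renewal,
                                    migrated.last_renewal)
        return existing
    # Same id, different verifier: later renewal prevails.
    if migrated.last_renewal > existing.last_renewal:
        return migrated
    return existing
```
]]></artwork>
       </figure>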
       <t>
         When leases are not merged, the transfer of state should result
         in creation of a confirmed client record with empty callback 
         information but matching the {v, x, c} for the transferred
         client information.  This should enable establishment of new
         callback information using SETCLIENTID and SETCLIENTID_CONFIRM.
       </t>
       <t>
         A client may determine the disposition of migrated state by
         using a stateid associated with the migrated state in
         an operation on the new server and using the associated clientid4
         in a RENEW on the new server.
       <list style="symbols">          
         <t>
           If the stateid is not valid and an error NFS4ERR_BAD_STATEID is
           received, either transparent state migration has not occurred
           or the state was purged due to verifier mismatch.
         </t>
         <t>
           If the stateid is valid and an error NFS4ERR_STALE_CLIENTID is
           received on the RENEW, transparent state migration has occurred 
           and the lease has been merged with an existing lease on the
           destination server.
         </t>
         <t>
           If the stateid is valid and the clientid4  is valid, the
           lease has been transferred intact.
         </t>
       </list>          
       </t>
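       <figure>
         <preamble>
           The decision procedure above can be sketched in Python.  The
           function name and the use of strings for status codes are
           assumptions made for the example; combinations of results
           that the list above does not address are reported as
           undetermined rather than guessed at.
         </preamble>
         <artwork><![CDATA[
```python
def migrated_state_disposition(stateid_status, renew_status):
    """Classify the outcome from two status codes.

    stateid_status: result of an operation on the new server
                    that used a migrated stateid.
    renew_status:   result of a RENEW on the new server that
                    used the clientid4 from the source server.
    """
    if stateid_status == "NFS4ERR_BAD_STATEID":
        return "not transferred, or purged on verifier mismatch"
    if stateid_status == "NFS4_OK":
        if renew_status == "NFS4ERR_STALE_CLIENTID":
            return "transferred and merged with existing lease"
        if renew_status == "NFS4_OK":
            return "transferred intact"
    return "undetermined"
```
]]></artwork>
       </figure>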
       <t>
         Since responsibility
         for an entire filesystem is transferred with a migration event,
         there is no possibility that conflicts will arise on the new
         server as a result of the transfer of locks.
       </t>
       <t>
         The servers may choose not to transfer the state information
         upon migration.  However, this choice is discouraged, except
         where specific issues such as stateid conflicts make it necessary.
         In the case of migration without state transfer,
         when the client presents state information from the
         original server (e.g. in a RENEW op or a READ op of zero length),
         the client must be prepared to receive either
         NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new
         server.  The client should then recover its state information as
         it normally would in response to a server failure.  The new server
         must take care to allow for the recovery of state information as
         it would in the event of server restart.
       </t>

       <t>
         When a lease is transferred to a new server (as opposed to being
         merged with a lease already on the new server),
         a client SHOULD re-establish new callback information
         with the new server as soon as possible, according to
         sequences described in sections "Operation 35: SETCLIENTID - 
         Negotiate Client ID" and "Operation 36: SETCLIENTID_CONFIRM - 
         Confirm Client ID".
         This ensures that server operations are not blocked by
         the inability to recall delegations.
       </t>

     </section>
     <section title="Replication and State">

       <t>
         Since client switch-over in the case of replication is not
         under server control, the handling of state is different.
         In this case, leases, stateids and client IDs do not have validity
         across a transition from one server to another.  The client must
         re-establish its locks on the new server.  This can be compared
         to the re-establishment of locks by means of reclaim-type
         requests after a server reboot.  The difference is that the
         server has no provision to distinguish requests reclaiming locks
         from those obtaining new locks or to defer the latter.  Thus,
         a client re-establishing a lock on the new server (by means of
         a LOCK or OPEN request), may have the requests denied due to a
         conflicting lock.  Since replication is intended for read-only
         use of filesystems, such denial of locks should not pose large
         difficulties in practice.  When an attempt to re-establish a lock
         on a new server is denied, the client should treat the situation
         as if its original lock had been revoked.
       </t>

     </section>
     <section title="Notification of Migrated Lease"
              anchor="prop-notify-migr-lease">

       <t>
         In the case of lease renewal, the client may not be submitting
         requests for a filesystem that has been migrated to another server.
         This can occur because of the implicit lease renewal mechanism.
         The client renews a lease containing state of multiple filesystems 
         when submitting a request to any one filesystem at the server.
       </t>

       <t>
         In order for the client to schedule renewal of leases that may
         have been relocated to the new server, the client must find out
         about lease relocation before those leases expire.  To accomplish
         this, all operations which implicitly renew leases for a client
         (such as OPEN, CLOSE, READ, WRITE, RENEW, LOCK, and others),
         will return the error NFS4ERR_LEASE_MOVED if responsibility for
         any of the leases to be renewed has been transferred to a new
         server.  Note that when the transfer of responsibility leaves
         remaining state for that lease on the source server, the lease
         is renewed just as it would have been in the NFS4ERR_OK case,
         despite returning the error.  
         The transfer of responsibility happens when the server receives a 
         GETATTR(fs_locations) from
         the client for each filesystem for which
         a lease has been moved to a new server.  Normally it does this
         after receiving an NFS4ERR_MOVED for an access to the filesystem
         but the server is not required to verify that this happens in
         order to terminate the return of NFS4ERR_LEASE_MOVED.  
         By convention, the compounds containing
         GETATTR(fs_locations) SHOULD include an appended RENEW operation
         to permit the server to identify the client getting the
         information.
       </t>
       <t>
         Note that the NFS4ERR_LEASE_MOVED error is only required 
         when responsibility
         for at least one stateid has been transferred.  In the case of a
         null lease, where the only associated state is a clientid, no
         NFS4ERR_LEASE_MOVED error need be generated.  
       </t>

       <t>
         Upon receiving the NFS4ERR_LEASE_MOVED error, a client that 
         supports filesystem migration MUST perform the necessary GETATTR
         operation for each of the filesystems containing state that 
         have been migrated and so give the server evidence that it is 
         aware of the migration of the filesystem. Once the client has done 
         this for all migrated filesystems on which 
         the client holds state, the server MUST resume normal handling 
         of stateful requests from that client. 
       </t>
       <t>
         One way in which clients can do this efficiently in the presence
         of large numbers of filesystems is described below.  
         This approach divides
         the process into two phases, one devoted to finding the
         migrated filesystems and the second devoted to doing the necessary 
         GETATTRs. 
       </t>
       <t>
         The client can find the migrated filesystems by building and
         issuing one or more 
         COMPOUND requests, each consisting of a set of PUTFH/GETFH
         pairs, each pair using an fh in one of the filesystems in
         question.  All such COMPOUND requests can be done in parallel.  
         The successful completion of such a request indicates that none 
         of the fs's interrogated have been migrated while termination 
         with NFS4ERR_MOVED indicates that the filesystem getting the 
         error has migrated while those interrogated before it in the 
         same COMPOUND have not.  Those whose interrogation follows 
         the error remain in an uncertain state and can be interrogated 
         by restarting the requests from after the point at which 
         NFS4ERR_MOVED was returned or by issuing a new set
         of COMPOUND requests for the filesystems which remain in an 
         uncertain state.
       </t>
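       <figure>
         <preamble>
           A minimal sketch of this first phase follows.  It assumes a
           hypothetical callable, send_compound, that issues one
           COMPOUND of PUTFH/GETFH pairs and returns the index of the
           pair that failed with NFS4ERR_MOVED, or None if every pair
           succeeded.  For simplicity the sketch issues requests one
           at a time, although the requests could be done in parallel;
           the batch size is likewise an arbitrary choice.
         </preamble>
         <artwork><![CDATA[
```python
def find_migrated(filehandles, send_compound, batch_size=8):
    """Return the filehandles whose filesystems have migrated."""
    migrated = []
    pending = list(filehandles)
    while pending:
        batch = pending[:batch_size]
        pending = pending[batch_size:]
        failed = send_compound(batch)
        if failed is None:
            continue            # nothing in this batch migrated
        migrated.append(batch[failed])
        # Pairs after the failure were never evaluated, so they
        # remain uncertain and are queued for another request.
        pending = batch[failed + 1:] + pending
    return migrated


# Stand-in for a server on which fh3, fh4 and fh9 have migrated.
MOVED = {"fh3", "fh4", "fh9"}


def fake_send_compound(batch):
    for index, fh in enumerate(batch):
        if fh in MOVED:
            return index
    return None


handles = ["fh%d" % i for i in range(12)]
found = find_migrated(handles, fake_send_compound, batch_size=5)
assert found == ["fh3", "fh4", "fh9"]
```
]]></artwork>
       </figure>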
       <t>
         Once the migrated filesystems have been found, all that is needed
         is for the client to give evidence to the server that it is aware 
         of the migrated status of filesystems found by this process, by 
         interrogating the fs_locations attribute for an fh in each of the
         migrated filesystems.  The client can do this by building and
         issuing one or more 
         COMPOUND requests, each of which consists of a  
         set of PUTFH operations, each followed by a GETATTR of 
         the fs_locations attribute.  A RENEW follows to help tie
         the operations to the lease returning NFS4ERR_LEASE_MOVED.
         Once the client has done this for all migrated filesystems on which 
         the client holds state, the server will resume normal handling 
         of stateful requests from that client. 
       </t>
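       <figure>
         <preamble>
           The second phase might be sketched as below.  The
           representation of a COMPOUND as a list of (operation,
           argument) tuples, the function name, and the limit on pairs
           per request are assumptions made for the example and are
           not part of the protocol.
         </preamble>
         <artwork><![CDATA[
```python
def build_ack_compounds(migrated_fhs, clientid4, max_pairs=16):
    """Build operation lists acknowledging migrated filesystems."""
    compounds = []
    for start in range(0, len(migrated_fhs), max_pairs):
        ops = []
        for fh in migrated_fhs[start:start + max_pairs]:
            ops.append(("PUTFH", fh))
            ops.append(("GETATTR", "fs_locations"))
        # The trailing RENEW helps tie the operations to the
        # lease that returned NFS4ERR_LEASE_MOVED.
        ops.append(("RENEW", clientid4))
        compounds.append(ops)
    return compounds
```
]]></artwork>
       </figure>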

       <t>
        In order to support legacy clients that do not handle the
        NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after a
        wait of at least two lease periods, at which time it will resume normal
        handling of stateful requests from all clients. If a client attempts to
        access the migrated files, the server MUST reply NFS4ERR_MOVED.
       </t>

       <t>
         When the client receives an NFS4ERR_MOVED error,
         the client can follow the normal process to obtain the new server
         information (through the fs_locations attribute) and perform
         renewal of those leases on the new server.
         If the server has not had state transferred to it transparently,
         the client will receive either NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID
         from the new server, as described above. The client can then recover
         state information as it does in the event of server failure.
       </t>
       <t>
         Aside from recovering from a migration, there are other
         reasons a client may wish to retrieve fs_locations information
         from a server.  When a server becomes unresponsive, for
         example, a client may use cached fs_locations data to discover
         an alternate server hosting the same fs data.  A client may
         periodically request fs_locations data from a server in order
         to keep its cache of fs_locations data fresh.
       </t>

       <t>
         Since a GETATTR(fs_locations)
         operation would be used for refreshing cached fs_locations
         data, a server could mistake such a request as
         indicating recognition of an NFS4ERR_LEASE_MOVED condition.
         Therefore a compound which is not intended to signal that
         a client has recognized a migrated lease SHOULD be prefixed
         with a guard operation which fails with NFS4ERR_MOVED if the
         file handle being queried is no longer present on the server.
         The guard can be as simple as a GETFH operation.
       </t>

       <t>
         Though unlikely, it is possible that the target of such a
         compound could be migrated in the time after the guard
         operation is executed on the server but before the
         GETATTR(fs_locations) operation is encountered.  When a
         client issues a GETATTR(fs_locations) operation as part of
         a compound not intended to signal recognition of a migrated
         lease, it SHOULD be prepared to process fs_locations
         data in the reply that shows the current location of the
         fs is gone.
       </t>
     </section>
     <section title="Migration and the Lease_time Attribute">

       <t>
         In order that the client may appropriately manage its leases
         in the case of migration, the destination server must establish
         proper values for the lease_time attribute.
       </t>

       <t>
         When state is transferred transparently, that state should include
         the correct value of the lease_time attribute.  The lease_time
         attribute on the destination server must never be less than that
         on the source since this would result in premature expiration of
         leases granted by the source server.  Upon migration in which state
         is transferred transparently, the client is under no obligation
         to re-fetch the lease_time attribute and may continue to use
         the value previously fetched (on the source server).
       </t>

       <t>
         In the case in which lease merger occurs as part of state transfer,
         the lease_time attribute of the destination lease remains in 
         effect.  The client can simply renew that lease with its existing
         lease_time attribute.  State in the source lease is renewed at
         the time of transfer so that it cannot expire, as long as the
         destination lease is appropriately renewed.       
       </t>
       <t>
         If state has not been transferred transparently (i.e., the client
         sees a real or simulated server reboot), the client should fetch
         the value of lease_time on the new (i.e., destination) server,
         and use it for subsequent locking requests.  However the server
         must respect a grace period at least as long as the lease_time on
         the source server, in order to ensure that clients have ample
         time to reclaim their locks before potentially conflicting
         non-reclaimed locks are granted.  The means by which the new
         server obtains the value of lease_time on the old server is left
         to the server implementations.  It is not specified by the NFS
         version 4.0 protocol.
       </t>
     </section>
   </section>
  </section>
  <section anchor="rslt-all"
            title="Results of proposed changes">
    <t>
      The purpose of this section is to examine the troubling results
      reported in <xref target="issues-all" />.  We will look at
      the scenarios as they would be handled within the proposal. 
    </t>
    <t>
      Because the choice of uniform vs. non-uniform nfs_client_id4
      id strings is a "SHOULD" in these cases, we will designate
      clients that follow this recommendation by SHOULD-UF-CID.
    </t>
    <t>
      We will also have to take account of the various merger-related 
      "SHOULD" clauses to better understand how they have addressed the
      issues seen.  We abbreviate these (collectively known as
      "SHOULD-merges") as follows:
    <list style="symbols">
      <t>
        SHOULD-SVR-AM refers to the server obeying the SHOULD which RECOMMENDS
        that it merge leases with identical nfs_client_id4 id strings and
        verifiers.
      </t>
    </list>
    </t>
    <section  anchor="rslt-fail-free"
              title="Results: Failure to free migrated state on client reboot">

          <t>
            Let's look at the troublesome situation cited in 
            <xref target="issue-fail-free" />.  We have already seen what
            happens when SHOULD-UF-CID does not hold.  Now let's look at
            the situation in which SHOULD-UF-CID holds, whether
            SHOULD-SVR-AM is in effect or not.
          <list style='symbols'>
            <t>
              A client C establishes a clientid4 C1 with server ABC specifying
              an nfs_client_id4 with "id" value "C" and verifier 0x111. 
            </t>
            <t>
              The client begins to access files in filesystem F on server ABC,
              resulting in generating stateids S1, S2, etc. under the lease for 
              clientid4 C1.  It may also access files on other filesystems on the same
              server.
            </t>
            <t>
              The filesystem is migrated from ABC to server XYZ.  When transparent
              state migration is in effect, stateids S1 and S2 and lease
              {0x111, "C", C1} are now 
              available for use by client C at server XYZ.  So far, so good.
            </t>
            <t>
              Client C reboots and attempts to access data on server XYZ, whether in 
              filesystem F or another.  It does a SETCLIENTID with an nfs_client_id4 with 
              "id" value "C" and verifier 0x112.  The state associated with lease
              {0x111, "C", C1} is deleted as part of creating {0x112, "C", C2}.
              No problem.
            </t>
          </list>
          </t>
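          <figure>
            <preamble>
              The last two steps of the scenario above can be traced
              with a toy model of server XYZ.  This is a sketch under
              simplifying assumptions: the class and method names are
              hypothetical, SETCLIENTID_CONFIRM is folded into a single
              call, and clientid4 values are small integers (C1 is
              represented as 1).
            </preamble>
            <artwork><![CDATA[
```python
class DestinationServer:
    """Toy model of the lease table on server XYZ."""

    def __init__(self):
        # Maps id string to [verifier, clientid4, stateids].
        self.leases = {}
        self.next_clientid = 2

    def import_lease(self, id_string, verifier, clientid4, sids):
        # State arriving through transparent state migration.
        self.leases[id_string] = [verifier, clientid4, set(sids)]

    def setclientid(self, id_string, verifier):
        lease = self.leases.get(id_string)
        if lease is not None and lease[0] == verifier:
            return lease[1]     # same client instance, lease kept
        # New client, or a changed verifier indicating a client
        # reboot: any state under the old lease is discarded.
        clientid4 = self.next_clientid
        self.next_clientid += 1
        self.leases[id_string] = [verifier, clientid4, set()]
        return clientid4


xyz = DestinationServer()
xyz.import_lease("C", 0x111, 1, {"S1", "S2"})  # {0x111, "C", C1}
c2 = xyz.setclientid("C", 0x112)               # client C rebooted
assert xyz.leases["C"] == [0x112, c2, set()]   # S1 and S2 freed
```
]]></artwork>
          </figure>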
          <t>
            The correctness signature for this issue is 
          <list>
             <t>
               SHOULD-UF-CID
             </t>
          </list>
             so if you have clients and servers that obey the SHOULD clauses,
             the problem is gone regardless of the choice on the MAY.
          </t>    
      </section>
    <section anchor="rslt-svr-reboot-confusion"
              title="Results: Server reboots resulting in confused lease situation">
          <t>
            Now let's consider the scenario given in 
            <xref target="issue-svr-reboot-confusion" />.  We have already 
            seen what happens when SHOULD-UF-CID does not hold.  Now let's
            look at the situation in which SHOULD-UF-CID holds and
            SHOULD-SVR-AM holds as well.
          <list style='symbols'>

            <t>
             Client C talks to server ABC using an nfs_client_id4 id 
             like "C-ABC" and verifier v1.  As a result a lease with clientid4 c.i
             is established: {v1, "C-ABC", c.i}.
             
            </t>
            <t>
              fs_a1 migrates from server ABC to server XYZ along with its 
              state.  Now server XYZ also has a lease: {v1, "C-ABC", c.i} 
            </t>
            <t>
              Server ABC reboots.
            </t>
            <t>
              Client C talks to server ABC using an nfs_client_id4 id 
              like "C-ABC" 
              and verifier v1.  As a result a lease with clientid4 c.j
              is established: {v1, "C-ABC", c.j}.

            </t>
            <t>
             fs_a2 migrates from server ABC to server XYZ. 
             As part of migration the incoming lease is seen to denote
             the same nfs_client_id4 and so is merged with {v1, "C-ABC", c.i}.
            </t>
            <t>
              Now server XYZ has only one lease that matches {v1, "C-ABC", *}, so
              the problem is solved.
            </t>
          </list>
          </t>
          <t>
            Now let's consider the same scenario in the situation in which
            SHOULD-UF-CID holds and SHOULD-SVR-AM holds as well.
          <list style='symbols'>

            <t>
             Client C talks to server ABC using an nfs_client_id4 id like 
             "C"  and verifier v1.  As a result a lease with clientid4 c.i
             is established: {v1, "C", c.i}.          
            </t>
            <t>
              fs_a1 migrates from server ABC to server XYZ along with its 
              state.  Now XYZ also has a lease: {v1, "C", c.i} 
            </t>
            <t>
              Server ABC reboots.
            </t>
            <t>
              Client C talks to server ABC using an nfs_client_id4 id like
              "C"  and verifier v1.  As a result a lease with clientid4 c.j
              is established: {v1, "C", c.j}.

            </t>
            <t>
             fs_a2 migrates from server ABC to server XYZ. 
             As part of migration the incoming lease is seen to denote
             the same nfs_client_id4 and so is merged with {v1, "C", c.i}.
            </t>
            <t>
              Now server XYZ has only one lease that matches {v1, "C", *}, so
              the problem is solved.
            </t>
          </list>
          </t>
          <t>
            The correctness signature for this issue is 
          <list>
             <t>
               SHOULD-SVR-AM
             </t>
          </list>
             so if you have clients and servers that obey the SHOULD clauses,
             the problem is gone regardless of the choice on the MAY.
          </t>    

    </section>
    <section anchor="rslt-clnt-comp"
             title="Results: Client complexity issues">
         <t>
           Consider the following situation:
         <list style='symbols'>            
           <t>
             There is a set of clients C1 through Cn accessing servers 
             S1 through Sm.  Each server manages a significant number
             of filesystems, with the filesystem count L being significantly
             greater than m.
           </t>
           <t>
             Each client Cx will access a subset of the servers
             and so will have up to m clientids, which we will call Cxy for
             server Sy.     
           </t>
           <t>
             Now assume that, for load-balancing or other operational reasons,
             a number of filesystems are migrated among the servers.
             Depending on how this is handled, the number of clientids
             may explode, as described below.
           </t>
         </list>
         </t>
           <t>
             Now look at what happens under various scenarios:
           <list style="symbols">
             <t>
               We have previously (in <xref target="issue-clnt-comp" />)
               looked at this in the
               case of a client following the non-uniform client-string model.   
               In that case, each
               client-server pair could have up to m clientids
               and each client could have up to m**2 clientids. If we add 
               the possibility of server reboot, the only bound on a client's 
               clientid count is L.
             </t>
             <t>
               If we look at this in the 
               SHOULD-UF-CID case in which the SHOULD-SVR-AM
               condition does not hold,
               the situation is no
               different.  Although the server has the client identity 
               information that could enable same-client-same-server leases
               to be combined, it does not do so.  We still have up to L
               clientids per client.
             </t>
             <t>
               On the other hand, if we look at the 
               SHOULD-UF-CID 
               case in which 
               SHOULD-SVR-AM holds, the problem is gone.  There can be no
               more than m clientids per client, and n clientids per server.
             </t>
           </list>
           </t>
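          <t>
            The difference between these cases can be illustrated with a
            toy count.  This is a sketch under simplifying assumptions,
            not a model of any implementation: the function
            leases_at_destination and the clientid4 labels are
            hypothetical, only one source server (S1) and one
            destination server (S2) are considered, the verifier is
            taken to be constant, and S1 is assumed to reboot (and so
            issue a fresh clientid4) before each migration.
          </t>
          <figure>
            <artwork><![CDATA[
```python
def leases_at_destination(k, uniform, merges):
    """Count the leases client C ends up with at server S2.

    k       -- filesystems migrated from S1 to S2, one at a time,
               with S1 rebooting before each migration
    uniform -- True models SHOULD-UF-CID (same client string sent
               to every server)
    merges  -- True models SHOULD-SVR-AM (the destination merges
               leases that carry the same client string)
    """
    def client_string(server):
        return "C" if uniform else "C-" + server

    # S2 starts with the lease C established with it directly.
    dest = {(client_string("S2"), "s2.0")}
    for boot in range(k):
        # Each reboot of S1 yields a fresh clientid4 for C.
        incoming = (client_string("S1"), "s1.%d" % boot)
        if merges and any(cs == incoming[0] for cs, _ in dest):
            continue  # merged into a lease S2 already holds
        dest.add(incoming)
    return len(dest)
```
]]></artwork>
          </figure>
          <t>
            With k migrated filesystems, this toy model yields k + 1
            leases at S2 whenever merges is false, whether or not the
            client string is uniform, and a single lease at S2 when both
            uniform and merges are true.
          </t>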
          <t>
            The correctness signature for this issue is 
          <list>
             <t>
               (SHOULD-UF-CID &amp; SHOULD-SVR-AM)
             </t>
          </list>
             so if you have clients and servers that obey the SHOULD clauses,
             the problem is gone regardless of the choice on the MAY.
          </t>    
    </section>
    <section anchor="rslt-sum"
             title="Result summary">
          <t>
             We have seen that (SHOULD-SVR-AM &amp; 
             SHOULD-UF-CID)
             is
             sufficient to solve the problems people have experienced.    
          </t>    
    </section>
  </section>
          
   <section title="Security Considerations">
      <t>
        The current definitive definition of the NFSv4.0 protocol
        <xref target="RFC3530" /> and the current pending draft of
        RFC3530bis <xref target="cur-v4.0-bis" /> are in agreement:
        in both, the section entitled "Security Considerations" encourages
        clients to protect the integrity of the SECINFO operation,
        any GETATTR operation for the fs_locations attribute,
        and the operations SETCLIENTID/SETCLIENTID_CONFIRM.
        A migration recovery event can use any or all of these operations.
        We do not recommend any change here.
      </t>
    </section>

    <section title="IANA Considerations">
      <t>
        This document does not require actions by IANA.
      </t>
     </section>        
        
   <section title="Acknowledgements">
      <t>
        The editor and authors of this document gratefully acknowledge
        the contributions of
        Trond Myklebust of NetApp and Robert Thurlow of Oracle.
        We also thank
        Tom Haynes of NetApp and Spencer Shepler of Microsoft
        for their guidance and suggestions.
      </t>
      <t>
        Special thanks go to members of the Oracle Solaris NFS team, especially
        Rick Mesta and James Wahlig,
        for their work implementing an NFSv4.0 migration prototype and identifying
        many of the issues documented here.
      </t>
    </section>
    </middle>
    <back>
        <references title="Normative References">
      &RFC2119;
      &RFC3530;
        </references>

        <references title="Informative References">
      &RFC5661;
        <reference anchor="cur-v4.0-bis"
                   target="http://www.ietf.org/id/draft-ietf-nfsv4-rfc3530bis-16.txt">
        <front>
          <title>Network File System (NFS) Version 4 Protocol</title>

          <author role="editor" initials="T." surname="Haynes">
            <organization>NetApp</organization>
          </author>
          <author role="editor" initials="D." surname="Noveck">
            <organization>EMC</organization>
          </author>

          <date year="2011" />
        </front>
        <annotation>
          Work in progress.
        </annotation>
      </reference>
        </references>
    </back>
</rfc>

