<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
     There has to be one entity for each item to be referenced. 
     An alternate method (rfc include) is described in the references. -->

<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
<!ENTITY RFC3552 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3552.xml">
<!ENTITY RFC5661 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5661.xml">
<!ENTITY RFC7530 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7530.xml">
<!ENTITY I-D.narten-iana-considerations-rfc2434bis SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.narten-iana-considerations-rfc2434bis.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
     (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic reference tags, i.e., [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->



<rfc category="info"
     docName="draft-ietf-nfsv4-migration-issues-09"
     ipr="trust200902">
  <front>
    <title abbrev="nfsv4-migr-issues">NFSv4 migration: Implementation experience and spec issues to resolve</title>

    <author initials='D.' surname='Noveck'
            fullname = 'David Noveck' role='editor'>
     <organization abbrev="HPE">
              Hewlett Packard Enterprise
     </organization>
     <address>
       <postal>
         <street>165 Dascomb Road</street>
         <city>Andover</city> 
         <region>MA</region>
         <code>01810</code>
         <country>US</country>
       </postal>

       <phone>+1 978 474 2011</phone>
       <email>davenoveck@gmail.com</email>
     </address>
    </author> 
    <author initials='P.' surname='Shivam'
            fullname = 'Piyush Shivam'>
      <organization abbrev='ORACLE'>Oracle Corporation</organization>
      <address>
        <postal>
          <street>5300 Riata Park Ct.</street>
          <city>Austin</city>
          <region>TX</region>
          <code>78727</code>
          <country>US</country>
        </postal>

        <phone>+1 512 401 1019</phone>
        <email>piyush.shivam@oracle.com</email>
      </address>
    </author>

    <author initials='C.' surname='Lever'
            fullname = 'Charles Lever'>
      <organization abbrev='ORACLE'>Oracle Corporation</organization>
      <address>
        <postal>
          <street>1015 Granger Avenue</street>
          <city>Ann Arbor</city>
          <region>MI</region>
          <code>48104</code>
          <country>US</country>
        </postal>

        <phone>+1 248 614 5091</phone>
        <email>chuck.lever@oracle.com</email>
      </address>
    </author>

    <author initials='B.' surname='Baker'
            fullname = 'Bill Baker'>
      <organization abbrev='ORACLE'>Oracle Corporation</organization>
      <address>
        <postal>
          <street>5300 Riata Park Ct.</street>
          <city>Austin</city>
          <region>TX</region>
          <code>78727</code>
          <country>US</country>
        </postal>

        <phone>+1 512 401 1081</phone>
        <email>bill.baker@oracle.com</email>
      </address>
    </author>




   <date year="2016"/>

   <area>Transport</area>
   <workgroup>NFSv4</workgroup>

    <abstract>
      <t>
        The migration feature of NFSv4 provides for moving responsibility for 
        a single filesystem from one server to another, without disruption 
        to clients.  Recent implementation experience has shown problems 
        in the existing specification for this feature.   This document 
        discusses the issues which have arisen and explores the options
        available for curing the issues.  It also explains the choices made
        regarding updating the NFSv4.0 specification and those to be made
        with regard to the NFSv4.1 specification, in order
        to properly address migration.
      </t>
    </abstract>


  </front>

  <middle>
        
  <section title="Introduction">
      <t>
        This document is in the informational category, and while the 
        facts it reports may have normative implications, any such normative 
        significance reflects the readers' preferences. For example, we 
        may report that the reboot of a client with migrated state results
        in state not being
        promptly cleared and that this will prevent granting of conflicting 
        lock requests at least for the lease time, which is a fact.
        While it is to be expected that client and server implementers will 
        judge this to be a situation that is best avoided, deciding
        how pressing this issue is remains a matter for the reader and,
        eventually, for the nfsv4 working group.
      </t>
       <t>
        We do explore possible ways in which such issues can be avoided, 
        with minimal negative effects, given that the working group has 
        decided to address these issues, but the choice of exactly how to 
        address these is best given effect in one or more
        standards-track documents and/or errata.
      </t>
      <t>
        This document focuses on NFSv4.0, since that is where the 
        majority of implementation experience has been.  Nevertheless,
        there is discussion of the implications of the NFSv4.0
        experience for migration in NFSv4.1, as well as discussion of
        other issues with regard to the treatment of migration in NFSv4.1. 
      </t>
  </section>
        
  <section title="Conventions">
      <t>
        The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", 
        "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted 
        as described in <xref target="RFC2119" />.
      </t>
      <t>
        In the context of this informational document, these normative keywords 
        will always occur in the context of a quotation, most often direct 
        but sometimes indirect. The context will
        make it clear whether the quotation is from:
      <list style='symbols'>
        <t>
          The current definitive definition of the NFSv4.0 protocol
          <xref target="RFC7530" />.
        </t>
        <t>
          The current definitive definition of the NFSv4.1 protocol
          <xref target="RFC5661" />.
        </t>
        <t>
          A proposed or possible text to serve as a replacement for the 
          current definitive document text.  Sometimes, 
          a number of possible alternative
          texts may be listed and benefits and detriments of 
          each examined in turn.  
        </t>
      </list>
      </t>
    </section>
    <section title="NFSv4.0 Implementation Experience">
      <section title="Implementation issues"
               anchor="issues-all">
      <t>
        Note that the examples below reflect current experience which arises
        from clients implementing the recommendation to use different
        nfs_client_id4 id strings for different server addresses, i.e.
        using what is later referred to herein as the "non-uniform
        client-string approach."  
      </t>
      <t>
        This is simply because that is the experience implementers have 
        had.  The reader should not assume that in all cases, 
        this practice is the 
        source of the difficulty.   It may be so in some cases but clearly 
        it is not in all cases.
      </t>
        <section  title="Failure to free migrated state on client reboot"
                  anchor="issue-fail-free">
                  
          <t>
            The following sort of situation has proved troublesome:
          <list style='symbols'>
            <t>
              A client C establishes a clientid4 C1 with server ABC specifying
              an nfs_client_id4 with id string value "C-ABC" and boot
              verifier 0x111. 
            </t>
            <t>
              The client begins to access files in filesystem F on server ABC,
              resulting in generating stateids S1, S2, etc. under the lease for 
              clientid C1.  It may also access files on other filesystems on the same
              server.
            </t>
            <t>
              The filesystem is migrated from server ABC to server XYZ.  
              When transparent state migration is in effect, 
              stateids S1 and S2 and clientid4 C1 are now 
              available for use by client C at server XYZ.  
            </t>
            <t>
              Client C reboots and attempts to access data on server XYZ, whether in 
              filesystem F or another.  It does a SETCLIENTID with an nfs_client_id4 with 
              id string value "C-XYZ" and boot verifier 0x112.  There is thus no occasion to free
              stateids S1 and S2, since they are associated with a different client name, so
              lease expiration is the only way that they can be released. 
            </t>
          </list>
          </t>
          <t>
            Note here that while it seems clear to us in this example that 
            C-XYZ and C-ABC are from the same client, the server has no way 
            to determine the structure of the "opaque" 
            id string.  In the protocol, it really is treated as opaque.
            Only the client knows which nfs_client_id4 values
            designate the same client on a different server.
         </t>
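          <t>
            As an illustration only, the following Python sketch models the
            scenario above.  The Server and Lease classes and the setclientid
            method are hypothetical simplifications introduced here and do not
            describe any NFSv4 implementation.  In particular, the confirmation
            step is collapsed into SETCLIENTID, and state migration is modeled
            as moving a dictionary entry from one server to the other.  Under
            those assumptions, it shows that the client's reboot leaves the
            migrated stateids in place on server XYZ.
          </t>
          <figure>
            <artwork><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    """State held for one id string (hypothetical model)."""
    verifier: int
    stateids: set = field(default_factory=set)


class Server:
    """Toy server: leases are keyed by the opaque id string only."""

    def __init__(self):
        self.leases = {}

    def setclientid(self, id_string, verifier):
        # Assumption: SETCLIENTID_CONFIRM is folded into this call.
        lease = self.leases.get(id_string)
        if lease is not None and lease.verifier != verifier:
            # Same id string, new boot verifier: the client has
            # rebooted, so its previous state is discarded.
            lease = None
        if lease is None:
            lease = Lease(verifier)
            self.leases[id_string] = lease
        return lease


abc, xyz = Server(), Server()

# Client C opens files in filesystem F on server ABC.
abc.setclientid("C-ABC", 0x111).stateids.update({"S1", "S2"})

# Filesystem F migrates: its lease and stateids move to XYZ.
xyz.leases["C-ABC"] = abc.leases.pop("C-ABC")

# Client C reboots and contacts XYZ under a per-server id string.
xyz.setclientid("C-XYZ", 0x112)

# XYZ cannot relate "C-XYZ" to "C-ABC", so nothing is freed.
stale = xyz.leases["C-ABC"].stateids
```
]]></artwork>
          </figure>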
        </section>
        <section title="Server reboots resulting in a confused lease situation"
                 anchor="issue-svr-reboot-confusion">
          <t>
            Further problems arise from scenarios like the following.  
          <list style='symbols'>

            <t>
             Client C talks to server ABC using an nfs_client_id4 id string
             such as "C-ABC" and a boot verifier v1.  As a result, a lease 
             with clientid4 c.i is established: {v1, "C-ABC", c.i}.
            </t>
            <t>
              fs_a1 migrates from server ABC to server XYZ along with its 
              state.  Now server XYZ also has a lease: {v1, "C-ABC", c.i}. 
            </t>
            <t>
              Server ABC reboots.
            </t>
            <t>
              Client C talks to server ABC using an nfs_client_id4 id string 
              such as "C-ABC" and a boot verifier v1.  As a result,
              a lease with clientid4 c.j is established: {v1, "C-ABC", c.j}.

            </t>
            <t>
              fs_a2 migrates from server ABC to server XYZ. 
             Now server XYZ also has a lease: {v1, "C-ABC", c.j}. 
            </t>
            <t>

              Now server XYZ has two leases that match {v1, "C-ABC", *}, when
              the protocol clearly assumes there can be only one.
            </t>
          </list>
          </t>
          <t>
            Note that if the client used "C" (rather than "C-ABC") as the
            nfs_client_id4 id string, the exact same situation would arise. 
          </t>
          <t>
            One of the first cases in which this sort of situation has
            resulted in difficulties is in connection with doing a
            SETCLIENTID for callback update.
          </t>
          <t>	
            The SETCLIENTID for callback update only includes the 
            nfs_client_id4, assuming there can only be one such with a
            given nfs_client_id4 value.  If there were multiple, confirmed 
            client records with identical nfs_client_id4 id string values,
	    there would be no way to map the callback update request to the 
            correct client record.   Apart from the migration handling
            specified in <xref target="RFC7530" />, such a situation cannot
            arise. 
          </t>
          <t>	
            One possible accommodation for this particular issue that 
            has been used 
            is to add a RENEW operation along with SETCLIENTID (on a
	    callback update) to disambiguate the client.
          </t>
          <t>	
            When the client updates the callback info to the destination,
	    the client would, by convention, send a compound like this:
          </t>
          <figure>
            <artwork><![CDATA[
   { RENEW clientid4, SETCLIENTID nfs_client_id4, verf, cb }
]]></artwork>
          </figure>
          <t>	
   	   The presence of the clientid4 in the compound would allow the 
           server to
	   differentiate among the various leases that it knows of, all 
           with the same nfs_client_id4 value.
          </t>
          <t>	
            While this would be a reasonable patch for an isolated protocol
            weakness, interoperable clients and servers would require that
            the protocol truly be updated to allow such a situation, 
            specifically that of multiple clientid4's 
            with the same nfs_client_id4 value.
            The protocol is currently designed and implemented assuming this 
            cannot happen.  We need to either prevent the situation from
            happening, or fully adapt to the possibilities which can arise.
            See <xref target="poss-all" /> for a discussion of such issues. 
          </t>
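          <t>
            The ambiguity described in this section, and the effect of the
            RENEW convention, can be sketched as follows.  This is a
            minimal Python model, not an implementation: the Lease tuple and
            the find_leases function are hypothetical names introduced for
            illustration, and the optional clientid4 argument stands in for
            the clientid4 carried by a preceding RENEW in the same compound.
          </t>
          <figure>
            <artwork><![CDATA[
```python
from collections import namedtuple

# A lease as written in the scenario:
# {verifier, id string, clientid4}.
Lease = namedtuple("Lease", ["verifier", "id_string", "clientid4"])

# Leases held by server XYZ after both migrations.
xyz_leases = [
    Lease("v1", "C-ABC", "c.i"),  # arrived with fs_a1
    Lease("v1", "C-ABC", "c.j"),  # arrived with fs_a2
]


def find_leases(leases, verifier, id_string, clientid4=None):
    """Return the leases a callback update could refer to.

    clientid4=None models a bare SETCLIENTID; a value models the
    hint supplied by a RENEW placed before it (an assumption of
    this sketch, following the convention described above).
    """
    return [
        lease for lease in leases
        if lease.verifier == verifier
        and lease.id_string == id_string
        and (clientid4 is None or lease.clientid4 == clientid4)
    ]


bare = find_leases(xyz_leases, "v1", "C-ABC")
with_renew = find_leases(xyz_leases, "v1", "C-ABC", "c.j")
```
]]></artwork>
          </figure>
          <t>
            In this model the bare request matches both leases, while the
            request accompanied by a clientid4 matches exactly one.
          </t>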

        </section>
        <section title="Client complexity issues"
                 anchor="issue-clnt-comp">
                 
         <t>
           Consider the following situation:
         <list style='symbols'>            
           <t>
             There is a set of clients C1 through Cn accessing servers 
             S1 through Sm.  Each server manages some significant number
             of filesystems with the filesystem count L being significantly
             greater than m.
           </t>
           <t>
             Each client Cx will access a subset of the servers
             and so will have up to m clientids, which we will call Cxy for
             server Sy.     
           </t>
           <t>
             Now assume that for load-balancing or other operational reasons,
             numbers of filesystems are migrated among the servers.  As a 
             result, each client-server pair will have up to m clientids
             and each client will have up to m**2 clientids.  If we add the
             possibility of server reboot, the only bound on a client's clientid
             count is L.
           </t>
         </list>
         </t>
         <t>
           Now, instead of a clientid4 identifying a client-server pair, we
           have many more entities for the client to deal with.  In addition,
           it isn't clear how new state is to be incorporated in this
           structure.
         </t>
         <t>
           The limitations of the migrated state (inability to be freed on
           reboot) would argue against adding more such state but trying
           to avoid that would run into its own difficulties.  For example,
           a single lockowner string presented under two different clientids
           would appear as two different entities.
         </t>
         <t>
           Thus we have to choose between: 
         <list style='symbols'>
           <t>
             Indefinite prolongation of foreign clientids even after all
             transferred state is gone.
           </t>       
           <t>
             Having multiple requests for the same lockowner-string-named
             entity carried on in parallel by separate identically named
             lockowners under different clientid4's.
           </t>       
           <t>
             Adding serialization at the lockowner-string level, in addition
             to that at the lockowner level.       
           </t>       
         </list>            
         </t>
         <t>
           In any case, we have gone (in adding migration as it 
           was described) from a situation in which
         <list style='symbols'>
           <t>
             Each client has a single clientid4/lease for each server it
             talks to.   
           </t>       
           <t>
             Each client has a single nfs_client_id4 for each server it
             talks to.   
           </t>       
           <t>  
             Every state id can be mapped to an associated lease based
             on the server it was obtained from.     
           </t>       
         </list>            
         </t>
         <t>
           To one in which
         <list style='symbols'>
           <t>       
             Each client may have multiple clientid4's for a single server.
           </t>       
           <t>
             For each stateid, the client must separately record the clientid4
             that it is assigned to, or it must manage separate "state blobs" 
             for each fsid and map those to clientid4's.        
           </t>       
           <t>
             Before doing an operation that can result in a stateid, the
             client must either find a "state blob" based on fsid or create
             a new one, possibly with a new clientid4.
           </t>       
           <t>   
             There may be multiple clientid4's all connected to the same
             server and using the same nfs_client_id4.    
           </t>       
         </list>            
         </t>
         <t>
           This sort of additional client complexity is troublesome and needs 
           to be eliminated.
         </t>
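          <t>
            The m**2 bound mentioned above can be checked with a small
            counting sketch in Python.  It assumes, purely for illustration,
            that each clientid a client must track can be modeled as a pair
            consisting of the server that issued it and the server now holding
            the associated state, and that server reboots are ignored.  The
            value of m and all names are arbitrary.
          </t>
          <figure>
            <artwork><![CDATA[
```python
from itertools import permutations

m = 3  # number of servers; an arbitrary value for illustration
servers = [f"S{y}" for y in range(1, m + 1)]

# Model each clientid one client must track as a pair:
# (server that issued it, server now holding the state).
clientids = {(s, s) for s in servers}  # before any migration

# Worst case: every server has migrated a filesystem, together
# with its state, to every other server.
for source, destination in permutations(servers, 2):
    clientids.add((source, destination))

# Number of clientids this client holds at each server.
per_server = {
    s: sum(1 for _, holder in clientids if holder == s)
    for s in servers
}
```
]]></artwork>
          </figure>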
        </section>
      </section>
      <section title="Sources of Protocol difficulties">
        <section title="Issues with nfs_client_id4 generation and use">
        <t>
          In the current definitive definition of the NFSv4.0 protocol, 
          <xref target="RFC7530" />, the section  entitled "Client ID" says: 
        <list>
          <t>
            The second field, id is a variable length string that uniquely
            defines the client.
          </t>
        </list>
        </t>
        <t>
          There are two possible interpretations of the phrase "uniquely
          defines" in the above: 
        <list style="symbols">
           <t>
             The relation between strings and clients is a function from
             such strings to clients so that each string designates a single
             client.
           </t>
           <t>
             The relation between strings and clients is a bijection between
             such strings and clients so that each string designates a single
             client and each client is named by a single string.
           </t>
        </list>
        </t>
        <t>
          The first interpretation would make these client-strings like 
          phone numbers (a single person can have several) while the 
          second would make them like social security numbers.
        </t>
        <t>
          Debate about the possible meanings of "uniquely defines" in this 
          context is quite possible but not very helpful.  The following points
          should be noted though:
        <list style="symbols">
           <t>
             The second interpretation is more consistent with the way "uniquely
             defines" is used elsewhere in the spec.
           </t>
           <t>
             The spec as now written intends the first interpretation (or is 
             internally inconsistent).  In fact, it recommends, although 
             non-normatively, that a single client have at least as 
             many client-strings
             as server addresses that it interacts with.  It says, in the 
             third bullet
             point regarding construction of the string (which we shall 
             henceforth refer to as client-string-BP3):
           <list style="empty">
             <t>
               The string should be different for each server 
               network address that the client accesses, rather than 
               common to all server network addresses.  
             </t>
           </list>
           </t>
           <t>
              If internode interactions are limited to those between
              a client and its servers, there is no occasion for servers
              to be concerned with the question of whether two client-strings
              designate the same client, so that there is no occasion
              for the difference in interpretation to matter.
           </t>
           <t>
              When transparent migration of client state occurs between
              two servers, it becomes important to determine when state
              on two different servers is for the same client or not,
              and this distinction becomes very important. 
           </t>
         </list>
         </t>
         <t>
           Given the need for the server to be aware of client identity with
           regard to migrated state, either client-string construction 
           rules will have to change or there will be a need to get 
           around current issues, or perhaps a  combination
           of these two will be required.  Later sections will examine the 
           options and propose a solution. 
         </t>
         <t>
           One consideration that may indicate that the current text cannot
           remain exactly as it is today is that the explanation it gives
           for the recommended behavior is not correct.
           In the current definitive definition of the NFSv4.0 protocol 
           <xref target="RFC7530" />, the section entitled "Client ID" says:
         <list>
           <t> 
             The reason is that it may not be possible for the
             client to tell if the same server is listening on 
             multiple network addresses.  If the client issues 
             SETCLIENTID with the same id string to each network 
             address of such a server, the server will
             think it is the same client, and each successive 
             SETCLIENTID will cause the server to begin the process 
             of removing the client's previous leased state.
           </t>
         </list> 
         </t>
         <t>
           In point of fact, a "SETCLIENTID with the same id string" 
           sent to multiple network addresses will be treated as all from
           the same client but will not "cause the server 
           to begin the process of removing the client's previous 
           leased state" unless the server believes it is a different
           instance of the same client, i.e. if the id string is the same and
           there is a different boot verifier.  If the client does not
           reboot, the verifier should not change.  If it does reboot, the
           verifier will change, and it is appropriate that the server
           "begin the process 
           of removing the client's previous leased state."
         </t>
         <t>
           The situation of multiple SETCLIENTID requests received by
           a server on multiple network addresses is exactly the same, 
           from the protocol design point of view, as when multiple 
           (i.e. duplicate) SETCLIENTID requests are received by the server
           on a single network address.  The same protocol mechanisms
           that prevent erroneous state deletion in the latter case prevent
           it in the former case.  There is no reason for special handling
           of the multiple-network-appearance case, in this regard. 
         </t>
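         <t>
           The point made in the last two paragraphs can be sketched in
           Python as follows.  The Server class is a hypothetical toy model
           introduced here, with confirmation handling omitted, and the
           addresses are documentation examples.  The receiving address
           is passed in only to show that, in this model, it plays no part
           in deciding whether state is discarded; only the id string and
           boot verifier do.
         </t>
         <figure>
           <artwork><![CDATA[
```python
class Server:
    """Toy server reachable at several network addresses."""

    def __init__(self):
        # id string -> (boot verifier, set of stateids)
        self.clients = {}

    def setclientid(self, address, id_string, verifier):
        # The receiving address is deliberately ignored.
        known = self.clients.get(id_string)
        if known is None or known[0] != verifier:
            # New client, or a new instance of a known client:
            # start with no state.
            self.clients[id_string] = (verifier, set())
        return self.clients[id_string][1]


server = Server()
state = server.setclientid("192.0.2.1", "C", 0x111)
state.add("S1")

# Same id string and verifier, sent to a second address.
after_second_address = set(
    server.setclientid("192.0.2.2", "C", 0x111))

# After a client reboot the verifier changes; state is dropped.
after_reboot = set(server.setclientid("192.0.2.1", "C", 0x112))
```
]]></artwork>
         </figure>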
        </section>
        <section title="Issues with lease proliferation">
        <t>
          It is often felt that lease proliferation is a consequence of
          the client-string construction issues, and it is certainly the
          case that the two
          are closely connected in that non-uniform client-strings 
          make it impossible 
          for the server to appropriately combine leases from the same client.
        </t>
        <t>
          However, even where the server could combine leases from the same 
          client, it needs to be clear how and when it will do so, so that
          the client will be prepared.  These issues will have to be addressed
          at various places in the spec.
        </t>
        <t>
          Doing so could be enough only if we are prepared to do away with the
          "should" recommending non-uniform client-strings and 
          replace it with a "should not" or even a "SHOULD NOT". 
          Current client implementation patterns
          make this an unpalatable choice for use as a general solution, but
          it is reasonable to "RECOMMEND" this choice for a well-defined subset
          of clients.
          One alternative would be to create a way 
          for the server to infer from client behavior which leases 
          are held by the same 
          client and use this information to do appropriate lease mergers. 
          Prototyping and detailed specification work has shown that this could
          be done but the resulting complexity is such that a better choice 
          is to
          "RECOMMEND" use of the uniform client-string approach for 
          clients supporting the migration feature. 
        </t>
        <t>
          Because of the discussion of client-string construction in
          <xref target="RFC7530"/>,
          most existing clients implement
          the non-uniform client-string approach.  As a result, existing
          servers may not have been tested with clients implementing uniform
          client-strings.  As a consequence, care must be taken to preserve 
          interoperability between UCS-capable clients (those able to use
          uniform client-strings) and servers that do not tolerate
          uniform client-strings for one reason or another.
        </t>
        </section>
      </section>
    </section>
        
  <section title="Issues to be resolved in NFSv4.0"
           anchor="poss-all">
             
    <section title="Possible changes to nfs_client_id4 client-string">
        <t>
          The fact that the reason given in client-string-BP3 is not valid
          makes the existing "should" insupportable.  We cannot do either
          of the following:
        <list style="symbols">
          <t>
            Keep a reason we know is invalid.
          </t>
          <t>
            Keep saying "should" without giving a reason.
          </t>
        </list>
        </t>
        <t>
          What are often presented as reasons that motivate use of the
          non-uniform approach always turn out to be cases in which, if
          the uniform approach were used, the server would treat a client which 
          accesses that server via two different IP addresses as part of 
          a single client, as it in fact is.  This may be disconcerting to 
          a client unaware that the two IP addresses connect to the same 
          server.
          This is not a reason to use the non-uniform approach but is better
          thought of as an illustration of the fact that those using the 
          uniform approach need to be aware of the possibility of
          server trunking and its potential effect on server behavior.  
        </t>
        <t>
          If it is possible to reliably infer the existence of
          trunking of server IP addresses from observed server behavior, 
          use of the uniform approach would be more desirable, although
          compatibility issues would have to be dealt with. 
          <vspace blankLines='1' />
          An alternative to having the client infer the existence of
          trunking of server IP addresses is to make this information
          available to the client directly.  See <xref target="poss-newop" />
          for details.  
        </t>
        <t>
          It is always possible that a valid new reason will be found,
          but so far none has been proposed.  Given the history, the burden 
          of proof should be on those asserting the validity of a 
          proposed new reason.  
        </t>
        <t>
          So we will assume for now that the "should" will have
          to go.  The question is what to replace it with.  
        <list style="symbols">
          <t>
            We can't say "MUST NOT", despite the problems this raises for 
            migration since this is pretty late in the day for such a change.  
            Many currently operating clients obey the existing "should".  
            Similar considerations would apply for "SHOULD NOT" or 
            "should not".
          </t>
          <t>
            Dropping client-string-BP3 entirely is a possibility but, given the 
            context and history, it would just be a confusing version of 
            "SHOULD NOT".
          </t>
          <t>
            Using "MAY" would clearly specify that both ways of doing this
            are valid choices for clients and that servers will have to deal
            with clients that make either choice.
          </t>
          <t>
            This might be modified by a "SHOULD" (or even a "MUST") for
            particular groups of clients.
          </t>
          <t>

            There will have to be some text explaining why a client might make
            either choice 
            but, except for the particular cases referred to above,
            we will have to make sure that it is truly
            descriptive, and not slanted in either direction.
          </t>
        </list>
        </t>
    </section>
    <section title="Possible changes to handle differing nfs_client_id4 string values"
             anchor="poss-deal">
        <t>
          Given the difficulties caused by having different nfs_client_id4 
          client-string values for the same client, we have two choices:
        <list style="symbols">
          <t>
            Deprecate the existing treatment and basically say the client is
            on its own doing migration, if it follows that treatment.
          </t>
          <t>
            Introduce a way of having the client provide client identity 
            information to the server, if it can be done compatibly while 
            staying within the bounds of v4.0.
          </t>
        </list> 
      </t>
    </section>
    <section title="Possible changes to add a new operation"
             anchor="poss-newop">
      <t>
        It might be possible to return server-identity information to 
        the client, just as is done in NFSv4.1 by the response to the 
        EXCHANGE_ID operation.  This could be done by a SETCLIENTID_PLUS
        optional operation, which acts like SETCLIENTID, except that it 
        returns server identity information.  Such information could be 
        used by clients, making it possible for them to be aware of 
        server trunking relationships, rather than having to infer them
        from server behavior. 
      </t>
      <t>
        It has been generally thought that protocol extensions such as this
        are not appropriate
        in bis documents and other documents updating
        NFSv4 protocol definition RFCs.  However, 
        <xref target="NFSv4-vers" /> discusses means by 
        which protocol extensions, similar to
        those allowed between minor versions, can be used 
        to correct protocol mistakes.
      </t>
      <t>
        A decision to adopt this approach would require waiting for 
        <xref target="NFSv4-vers" /> to become a Proposed Standard. 
        In view of the time necessary for that to happen, this approach is
        not expected to be adopted in an RFC updating 
        <xref target="RFC7530" />, such as 
        <xref target="migr-v4.0-update" />.  Still, it is worth keeping
        in mind, if implementers have difficulties inferring trunking
        relationships using the techniques discussed there. 
      </t>
    </section>
    <section title="Other issues within migration-state sections"
             anchor="poss-ms-other">
               
        <t>
          There are a number of issues where the existing text is unclear
          and/or wrong and needs to be fixed in some way.
        <list style="symbols">
          <t>
            Lack of clarity in the discussion of moving clientids (as well as
            stateids) as part of moving state for migration.
          </t>
          <t>
            The discussion of synchronized leases is wrong in that there
            is no way to determine (in the current spec) when leases
            are for the same client and also wrong in suggesting a benefit
            from leases synchronized at the point of transfer.  What is
            needed is merger of leases, which is necessary to keep
            client complexity requirements from getting out of hand.
          </t>
          <t>
            Lack of clarity in the discussion of LEASE_MOVED handling,
            including failure to fully address situations in which 
            transparent state migration did not occur.  
          </t>
        </list>
        </t>
    </section>
    <section title="Issues within other sections"
             anchor="poss-other">
               
        <t>
          There are a number of cases in which certain sections, not
          specifically related to migration, require additional clarification.
          This is generally because text that is clear in a context in 
          which  leases and clientids are created in one place and live 
          there forever may need further refinement in the more dynamic
          environment that arises as part of migration.
        </t>
        <t>
          Some examples:
        <list style='symbols'>
          <t>
            Some people are under the impression that updating callback 
            endpoint information for an existing client, as used during
            migration, may cause the destination 
            server to free existing state. There need to be additions 
            to clarify the situation.
          </t>
          <t>
            The handling of the sets of clientid4's maintained by each server
            needs to be clarified.  In particular, the issue of how the
            client adapts to the presumably independent and uncoordinated 
            clientid4 sets needs to be clearly addressed.
          </t>
          <t>
            Statements regarding handling of invalid clientid4's need to be
            clarified and/or refined in light of the possibilities that
            arise due to lease motion and merger.
          </t>
          <t>
            Confusion and lack of clarity about NFS4ERR_CLID_INUSE. 
          </t>
        </list>
        </t>
    </section>
  </section>

  <section title="Proposed resolution of NFSv4.0 protocol difficulties"
           anchor="prop-res">
    <t>
      This section lists the changes that we believe are necessary to 
      resolve the difficulties mentioned above.  Such changes, along with
      other clarifications found to be desirable during drafting and review,
      are contained in <xref target="migr-v4.0-update" />. 
    </t>
    <section title="Proposed changes: nfs_client_id4 client-string"
             anchor="prop-string">
               
        <t>
          We propose replacing client-string-BP3 with the following text:
        <list style="none">
          <t>
             The string MAY be different for each server 
             network address that the client accesses, rather than 
             common to all server network addresses.  
          </t>
        </list>
        </t>
        <t>
          In addition, given the importance of the issue of client
          identity and the fact that both client-string approaches are
          to be considered valid, a greatly expanded treatment of client 
          identity is desirable.  It should have the following 
          major elements.
        <list style="symbols">
          <t>
            It should fully describe the consequences of making the
            string different for each network address (the non-uniform
            client-string approach) and of making it the same for
            all network addresses (the uniform client-string approach). 
          </t>
          <t>
            It should give helpful guidance about the factors that
            might affect client implementation choice between these 
            approaches.
          </t>
          <t>
            It should describe the compatibility issues that might cause
            servers to be incompatible with the uniform approach and
            give guidance about dealing with these. 
          </t>
          <t>
            It should describe how a client using the uniform approach
            might use server behavior to determine server address trunking
            patterns.
          </t>
          <t>
            It should present a clearer and more complete set of 
            recommendations to guide client string construction.
          </t>
        </list>
        </t>
      </section>
    <section title="Proposed changes: merged (vs. synchronized) leases"
             anchor="prop-deal">
               
        <t>
            In the current definitive definition of the NFSv4.0 protocol, 
            <xref target="RFC7530" />, 
            the section entitled "Migration and State" says:
        <list>
          <t>
            As part of the transfer of information between servers, 
            leases would be transferred as well.  The leases being 
            transferred to the new server will typically have a different 
            expiration time from those for the same client, previously 
            on the old server.  To maintain the property that all leases 
            on a given server for a given client expire at the same time, 
            the server should advance the expiration time to the later of 
            the leases being transferred or the leases already present.  
            This allows the client to maintain lease renewal of both
            classes without special effort.
          </t>
         </list>
        </t>
        <t>
          There are a number of problems with this and any resolution of our 
          difficulties must address them somehow.
        <list style='symbols'>
           <t>
             The current v4.0 spec recommends that the client make it
             essentially impossible to determine when two leases are from
             "the same client". 
           </t>
           <t>
             It is not appropriate to speak of "maintain[ing] the property
             that all leases on a given server for a given client expire 
             at the same time", since this is not a property that holds
             even in the absence of migration.   A server listening on
             multiple network addresses may have the same client appear as
             multiple clients with no way to recognize the client as the same.      
           </t>
           <t>
             Even if the client identity issue could be resolved, advancing
             the lease time at the point of migration would not maintain the
             desired synchronization property.  The leases would be synchronized
             until one of them was renewed, after which they would be 
             unsynchronized again.
           </t>
        </list>
        </t>
        <t>
           To avoid client complexity, we need to have no more  
           than one lease between
           a single client and a single server.  This requires merger of leases
           since there is no real help from synchronizing them at a single 
           instant.
        </t>
        <t>
           For the uniform approach, the destination server would simply merge 
           leases as part of state transfer, since two leases with the
           same nfs_client_id4 values must be for the same client. 
        </t>
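        <t>
           As an illustration only, the merge step for the uniform
           approach might be sketched in Python as shown below.  The Lease
           record, its field names, and the function merge_migrated_lease
           are hypothetical and are not taken from 
           <xref target="RFC7530" />.  Keeping the later of the two
           expiration times for the merged lease is likewise an assumption
           of the sketch, not a proposed requirement.
        </t>
        <figure>
          <artwork><![CDATA[
from dataclasses import dataclass, field


@dataclass
class Lease:
    # All names here are hypothetical, chosen for illustration.
    id_string: bytes      # nfs_client_id4 id string
    verifier: bytes       # nfs_client_id4 boot verifier
    clientid: int         # clientid4 assigned by a server
    expiry: float         # lease expiration time, in seconds
    stateids: set = field(default_factory=set)


def merge_migrated_lease(existing, incoming):
    """Fold a transferred lease into the destination's lease list.

    Leases whose nfs_client_id4 values match are taken to be for
    the same client and are merged into one lease.  Otherwise the
    transferred lease is simply added.
    """
    for lease in existing:
        same_client = (lease.id_string == incoming.id_string
                       and lease.verifier == incoming.verifier)
        if same_client:
            lease.stateids |= incoming.stateids
            # Assumption: keep the later of the two expirations.
            lease.expiry = max(lease.expiry, incoming.expiry)
            return lease  # the incoming clientid4 is not kept
    existing.append(incoming)
    return incoming
]]></artwork>
        </figure>
        <t>
           In this sketch, a merge leaves a single lease, so that one
           renewal covers all of the client's state on the destination
           server, and the clientid4 under which the transferred state was
           known is no longer used.
        </t>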
        <t>
          We have made the following decisions regarding proposed normative
          statements for state merger.  They reflect the facts that 
          we want to allow full migration support in the simplest way
          possible and that we can't say MUST since we 
          have older clients and servers to deal with.
        <list style='symbols'>
          <t>
            Clients MAY use the uniform client-string approach and are 
            well-advised to do so if they are concerned about getting good 
            migration support.  
          </t>
          <t>
            Servers SHOULD provide automatic lease merger during state 
            migration so that clients using the uniform id approach get the 
            support automatically.
          </t>
        </list>
        </t>
        <t>
          If servers obey the SHOULD and clients choose to 
          adopt the uniform id approach, having more than a
          single lease for a given client-server pair will be a transient 
          situation, cleaned up as part of adapting to use of migrated state.
        </t>
        <t>
          Since clients and servers will be a mixture of old and new, and 
          because nothing is a MUST, we have to
          ensure that no combination will show worse behavior than is 
          exhibited by current 
          (i.e., old) clients and servers.
        </t>
    </section>
    <section title="Other proposed changes to migration-state sections"
             anchor="prop-ms-other">
               
      <section title="Proposed changes: Client ID migration"
               anchor="prop-ms-clid-migr">
               
          <t>
            In the current definitive definition of the NFSv4.0 protocol 
            <xref target="RFC7530" />,
            the section entitled "Migration and State" says:
          <list style='empty'>
            <t>
              In the case of migration, the servers involved in the migration
              of a filesystem SHOULD transfer all server state from the
              original to the new server.  This must be done in a way that is
              transparent to the client.  This state transfer will ease the
              client's transition when a filesystem migration occurs.  If the
              servers are successful in transferring all state, the client
              will continue to use stateids assigned by the original server.
              Therefore the new server must recognize these stateids as valid.
              This holds true for the client ID as well.  Since responsibility
              for an entire filesystem is transferred with a migration event,
              there is no possibility that conflicts will arise on the new
              server as a result of the transfer of locks.
           </t>
         </list>
         </t>
         <t>
           This poses some difficulties, mostly because the part about 
           "client ID" is not clear:
         <list style='symbols'>
           <t>
             It isn't clear what part of the paragraph the "this" in the
             statement "this holds true ..." is meant to signify.
           </t>
           <t>
             The phrase "the client ID" is ambiguous, possibly indicating
             the clientid4 and possibly indicating the nfs_client_id4. 
           </t>
           <t>
             If the text means to suggest that the same clientid4 must be 
             used, the logic is not clear since the issue is not the same as
             for stateids of which there might be many.  Adapting to the change
             of a single clientid, as might happen as a part of lease
             migration, is relatively easy for the client.
           </t>
         </list>
         </t>
         <t>
           We have decided that it is best to address this issue as follows: 
         <list style='symbols'>
           <t>
             Make it clear that both clientid4 and nfs_client_id4 
             (including both id string and boot verifier) are to
             be transferred.
           </t>
           <t>
             Indicate that the initial transfer will normally result in the
             same clientid4 after transfer but that this is not guaranteed,
             since there may be a conflict with an existing clientid4 on the
             destination server and because lease merger can result in a
             change of the clientid4.
           </t>
         </list>
         </t>
      </section>
      <section title="Proposed changes: Callback re-establishment"
               anchor="prop-ms-cb-est">
               
          <t>
           In the current definitive definition of the NFSv4.0 protocol 
           <xref target="RFC7530" />,
           the section entitled  "Migration and State" says:
         <list>
           <t>
             A client SHOULD re-establish new callback information with 
             the new server as soon as possible, according to sequences 
             described in sections "Operation 35: SETCLIENTID - 
             Negotiate Client ID" and "Operation 36: SETCLIENTID_CONFIRM - 
             Confirm Client ID".  This ensures that server operations
             are not blocked by the inability to recall delegations.
           </t>
          </list>
          </t>
          <t>
            The above will need to be fixed to reflect the possibility of 
            merging of leases.
          </t>
      </section>
      <section title="Proposed changes: NFS4ERR_LEASE_MOVED rework"
               anchor="prop-ms-lm-migr">
               
          <t>
           In the current definitive definition of the NFSv4.0 protocol 
           <xref target="RFC7530" />,
           the 
           section entitled  "Notification of Migrated Lease" says:
         <list>
           <t>
             Upon receiving the NFS4ERR_LEASE_MOVED error, a client 
             that supports filesystem migration MUST probe all 
             filesystems from that server on which it holds open state.  
             Once the client has successfully probed all those 
             filesystems which are migrated, the server MUST resume
             normal handling of stateful requests from that client.
           </t>
         </list>
         </t>
         <t>
           There is a lack of clarity that is prompted by ambiguity 
           about what exactly probing is and what the interlock between
           client and server must be.  This has led to some worry about
           the scalability of the probing process, and although
           the time required does scale linearly with the number of 
           filesystems that the client may have state for with respect 
           to a given server, the actual process can be done
           efficiently.
         </t>
         <t>
           To address these issues, we propose rewriting the above to
           be clearer and to give suggestions about how to do the 
           required scanning efficiently.
         </t>
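         <t>
           One possible reading of the probing requirement is sketched
           below in Python, purely as an illustration.  It assumes that a
           probe is any request directed at a filesystem whose reply is
           NFS4ERR_MOVED when that filesystem has migrated.  The function
           find_migrated and the probe callable are hypothetical.  The
           scan is a single pass, which is where the linear scaling
           mentioned above comes from.
         </t>
         <figure>
           <artwork><![CDATA[
NFS4ERR_MOVED = 10019  # error code value from RFC 7530


def find_migrated(fsids_with_state, probe):
    """Return the filesystems that a probe reports as moved.

    probe is a hypothetical callable that issues some request
    directed at one filesystem and returns the resulting status.
    """
    moved = []
    for fsid in fsids_with_state:   # one probe per filesystem
        if probe(fsid) == NFS4ERR_MOVED:
            moved.append(fsid)
    return moved
]]></artwork>
         </figure>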
      </section>
    </section>
    <section title="Proposed changes to other sections"
             anchor="prop-other">
               
      <section title="Proposed changes: callback update"
               anchor="prop-other-callback-update">
                 
          <t>
            Some changes are necessary to reduce confusion about the process
            of callback information update and in particular to make it
            clear that no state is freed
            as a result:
          <list style="symbols">          
            <t>
              Make it clear that after migration there are confirmed
              entries for transferred clientid4/nfs_client_id4 pairs.
            </t>
            <t>
              Be explicit in the sections headed "otherwise," in the
              descriptions of SETCLIENTID and SETCLIENTID_CONFIRM,
              that these don't apply in the cases we are concerned about.
            </t>
          </list>          
          </t>
      </section>
      <section title="Proposed changes: clientid4 handling"
               anchor="prop-other-clientid4">
                 
          <t>
            To address both of the clientid4-related issues mentioned in
            <xref target='poss-other' />, we propose replacing the last three
            paragraphs of the section entitled "Client ID" with the following:
          <list style='empty'>
            <t>
              Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has 
              successfully completed, the client uses the shorthand client 
              identifier, of type clientid4, instead of the longer and 
              less compact nfs_client_id4 structure.  This shorthand client 
              identifier (a client ID) is assigned by the server and should 
              be chosen so that it will not conflict with a client ID 
              previously assigned by the same server.  This applies across 
              server restarts or reboots.  
            </t>
            <t>
              Distinct servers MAY assign clientid4's independently, and
              will generally do so.  Therefore, a client has to be prepared
              to deal with multiple instances of the same clientid4 value
              received on distinct IP addresses, denoting separate entities.
              When trunking of server IP addresses is not a consideration,
              a client should keep track of (IP-address, clientid4) pairs,
              so that each pair is distinct.  In the face of possible 
              trunking of server IP addresses, the client will use the
              receipt of the same clientid4 from multiple IP addresses
              as an indication that those IP addresses may be trunked and
              proceed to determine, from the observed server behavior, whether
              the addresses are in fact trunked.     
            </t>
            <t>
             When a clientid4 is presented to a server and that clientid4
             is not recognized, the server will reject the request with
             the error NFS4ERR_STALE_CLIENTID.   This can occur for a number
             of reasons:
             <list style='symbols'>
               <t>
                 A server reboot causing loss of the server's knowledge of
                 the client.
               </t>
               <t>
                 Client error sending an incorrect clientid4 or a valid 
                 clientid4 to the wrong server.
               </t>
               <t>
                 Loss of lease state due to lease expiration.
               </t>
               <t>
                 Client or server error causing the server to believe that
                 the client has rebooted (i.e. receiving a SETCLIENTID with an
                 nfs_client_id4 which has a matching id string and a 
                 non-matching boot verifier).
               </t>
               <t>
                 Migration of all state under the associated lease causes its
                 non-existence to be recognized on the source server.
               </t>
               <t>
                 Merger of state under the associated lease with another
                 lease under a different clientid causes the clientid4 serving
                 as the source of the merge to cease being recognized on its 
                 server.   
               </t>
            </list>
            </t>
            <t>
               In the event of a server reboot, or loss of lease state due
               to lease expiration, the client must obtain a new clientid4 
               by use of the SETCLIENTID operation and then
               proceed to any other necessary recovery for the server reboot 
               case (see the section entitled "Server Failure and Recovery").
               In cases of server or client error resulting in this error,
               use of SETCLIENTID to establish a new lease is desirable as 
               well. 
            </t>
            <t>
               In the last two cases, different recovery procedures are 
               required.  
               Note that in cases in which there is any uncertainty about
               which sort of handling is applicable, the distinguishing 
               characteristic is that in reboot-like cases, the clientid4 and
               all associated stateids cease to exist while in 
               migration-related
               cases, the clientid4 ceases to exist while the stateids are 
               still valid. 
            </t>
            <t>
               The client must also employ the SETCLIENTID operation when it
               receives an NFS4ERR_STALE_STATEID error using a stateid derived 
               from its current clientid4, since this indicates a situation,
               such as server reboot, which has invalidated the existing 
               clientid4 and associated stateids (see the section
               entitled "lock-owner" for details).
            </t>
            <t>
               See the detailed descriptions of SETCLIENTID and 
               SETCLIENTID_CONFIRM for a complete specification of the 
               operations.
            </t>
          </list>
          </t>
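          <t>
            The client-side bookkeeping described in the proposed text
            might be sketched in Python as follows.  This is an
            illustration only.  The class ClientIdTable and its method
            are hypothetical, and the sketch merely reports addresses
            that might be trunked.  Deciding whether they actually are
            trunked still requires observing server behavior.
          </t>
          <figure>
            <artwork><![CDATA[
from collections import defaultdict


class ClientIdTable:
    """Track clientid4 values per server address (hypothetical)."""

    def __init__(self):
        self._by_addr = {}                    # address -> clientid4
        self._addrs_by_id = defaultdict(set)  # clientid4 -> addresses

    def record(self, server_addr, clientid):
        """Store the clientid4 confirmed on server_addr.

        Returns the other addresses that currently hold the same
        clientid4 value.  These are only candidates for trunking,
        since distinct servers may assign the same value.
        """
        old = self._by_addr.get(server_addr)
        if old is not None:
            self._addrs_by_id[old].discard(server_addr)
        self._by_addr[server_addr] = clientid
        candidates = sorted(self._addrs_by_id[clientid])
        self._addrs_by_id[clientid].add(server_addr)
        return candidates
]]></artwork>
          </figure>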

      </section>
      <section title="Proposed changes: NFS4ERR_CLID_INUSE"
               anchor="prop-ms-clid-inuse">
        <t>
          It appears to be the intention that only a single principal 
          be used for client establishment between any client-server 
          pair.  However:
        <list style='symbols'>
          <t>
            There is no explicit statement to this effect.
          </t>
          <t>
            The error that indicates a principal conflict has
            a name which does not clarify this issue: NFS4ERR_CLID_INUSE.
          </t>
          <t>
            The definition of the error is also not very helpful: "The 
            SETCLIENTID operation has found that a client id is already 
            in use by another client".
          </t>
        </list>
        </t>
        <t>
          As a result, servers exist which reject a SETCLIENTID simply
          because there already exists a clientid for the same client,
          established using a different IP address.  Although this is
          generally understood to be erroneous, such servers still
          exist and the spec should make the correct behavior clear. 
        </t>
        <t>
          Although the error name cannot be changed, the following changes
          should be made to avoid confusion:
        <list style='symbols'>
          <t>
            The definition of the error should be changed to read as
            follows:
          <list style='empty'>
            <t>
              The SETCLIENTID operation has found that the specified
              nfs_client_id4 was previously presented with a different 
              principal and that client instance 
              currently holds an active lease.  A server MAY return this
              error if the same principal is used but a change in 
              authentication flavor gives good reason to reject the 
              new SETCLIENTID operation as not bona fide. 
            </t>
          </list>
          </t>
          <t>
            In the description of SETCLIENTID, the phrase "then the 
            server returns a NFS4ERR_CLID_INUSE error" should be expanded 
            to read "then the server returns a NFS4ERR_CLID_INUSE error,
            since use of a single client with multiple principals is not 
            allowed."
          </t>
        </list>
        </t>
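        <t>
          The intended server-side check might be sketched in Python as
          shown below.  The function setclientid_status and the layout
          of its records argument are hypothetical.  The optional
          rejection based on a change of authentication flavor is
          omitted for brevity.  The point of the sketch is that the
          network address from which the SETCLIENTID arrives plays no
          part in the decision.
        </t>
        <figure>
          <artwork><![CDATA[
def setclientid_status(records, id_string, principal, now):
    """Decide whether a SETCLIENTID conflicts with an active lease.

    records maps an nfs_client_id4 id string to a dict holding the
    principal that established it and its lease expiration time.
    The layout is hypothetical.  The client's network address is
    deliberately not an input to the decision.
    """
    rec = records.get(id_string)
    if rec is None:
        return "NFS4_OK"
    active = rec["lease_expiry"] > now
    if active and rec["principal"] != principal:
        return "NFS4ERR_CLID_INUSE"
    return "NFS4_OK"
]]></artwork>
        </figure>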
      </section>
    </section>
  </section>
  <section title="Issues for NFSv4.1"
           anchor="issues-41">
          <t>
            Because NFSv4.1 embraces the uniform client-string approach,
            as advised by section 2.4 of 
            <xref target="RFC5661" />, addressing migration issues is simpler. 
          </t>
          <t>
            Nevertheless, there are some issues that will have to be
            addressed.  Some examples:
          <list style='symbols'>
            <t>
              The other necessary part of addressing
              migration issues, providing for the server's merger of leases that
              relate to the same client,
              is not currently addressed by NFSv4.1 and changes need
              to be made to make it clear that state needs to be 
              appropriately merged as part of migration, to avoid
              multiple clientids between a client-server pair. 
            </t>
            <t>
              There needs to be some clarification of how migration, and 
              particularly transparent state migration, should interact 
              with pNFS layouts.
            </t>
            <t>
              The current discussion (in <xref target="RFC5661" />), 
              of the possibility of server_owner changes is incomplete 
              and confusing.
            </t>
          </list>
          </t>
          <t>
            Discussion of how to resolve these issues will appear in the
            sections below.  
          </t>

    <section title="Addressing state merger in NFSv4.1"
             anchor="state-merger-41">
          <t>
            The existing treatment of state transfer in 
            <xref target="RFC5661" /> has problems similar to those in
            <xref target="RFC7530" /> 
            in that it assumes that the
            state for multiple filesystems on different servers will not be
            merged so that it appears under a single common clientid.
            We've already seen the reasons that this is a problem, with
            regard to NFSv4.0.
          </t>
          <t>
            Although we don't have the problems stemming from the
            non-uniform client-string approach, there are a number of
            complexities in the existing treatment of state management
            in the section entitled "Lock State and File System Transitions"
            in <xref target="RFC5661" /> that make this non-trivial to address:
          <list style='symbols'>
            <t>
              Migration is currently treated together with other
              sorts of filesystem transitions including transitioning between
              replicas without any NFS4ERR_MOVED errors.
            </t>
            <t>
              There is separate handling and discussion of the cases of
              matching and non-matching server scopes.
            </t>
            <t>
              In the case of matching server scopes, the text calls for
              an impossible degree of transparency.
            </t>
            <t>
              In the case of non-matching server scopes, the text  
              does not mention transparent state migration at all, 
              resulting in a functional regression from NFSv4.0.
            </t>
          </list>
          </t>
    </section>
    <section title="Addressing pNFS relationship with migration"
             anchor="migration-pnfs-41">
          <t>
            Addressing this relationship is difficult because, within the
            pNFS framework, migration might mean any of several things:
          <list style="symbols">
            <t>
              Transfer of the MDS, leaving DS's alone.
              <vspace blankLines='1' />  
              This would be minimally disruptive to those using layouts
              but would require the pNFS control protocol to support
              the DS being directed to a new MDS. 
            </t>
            <t>
              Transfer of a DS, leaving everything else in place.
            <vspace blankLines='1' />  
              Such a transfer can be handled without using migration at
              all.  The server can recall/revoke layouts, as appropriate.
            </t>
            <t>
              Transfer of the filesystem to a new filesystem with both
              MDS and DS's moving.
            <vspace blankLines='1' />
              In such a transfer, an entirely different set of DS's will
              be at the target location.  There may even be no pNFS support
              on the destination filesystem at all.  
            </t>
          </list> 
          </t>
          <t>
            Migration needs to support both the first and last of these 
            models. 
          </t>
    </section>
    <section title="Addressing server owner changes in NFSv4.1"
             anchor="server-owner-41">
          <t>
            Section 2.10.5 of <xref target="RFC5661" /> states the
            following.
          <list>
            <t>
              The client should be prepared for the possibility that
              eir_server_owner values may be different on subsequent 
              EXCHANGE_ID requests made to the same network address, 
              as a result of various sorts of reconfiguration events.  
              When this happens and the changes result in the invalidation 
              of previously valid forms of trunking, the client should 
              cease to use those forms, either by dropping
              connections or by adding sessions.  For a discussion of 
              lock reclaim as it relates to such reconfiguration events, 
              see Section 8.4.2.1.
            </t>
          </list>
          </t>
          <t>
            While this paragraph is literally true in that such
            reconfiguration events can happen and clients have to
            deal with them, it is confusing in that it can be read as
            suggesting that clients have to deal with them
            without disruption, which in general is impossible.
          </t>
          <t>
            A clearer alternative would be:
          <list>
            <t>
              It is always possible that, as a result of various sorts 
              of reconfiguration events, eir_server_scope and 
              eir_server_owner values may be different on subsequent 
              EXCHANGE_ID requests made to the same network address. 
            </t>
            <t>
              In most cases such reconfiguration events will be 
              disruptive and indicate that an IP address formerly connected
              to one server is now connected to an entirely different one. 
            </t>
            <t>
              Some guidelines on client handling of such situations follow:
            <list style ='symbols'>
              <t>
                When eir_server_scope changes, the client has no assurance
                that any IDs it obtained previously (e.g., filehandles) can
                be validly used on the new server, and, even if the new
                server accepts them, there is no assurance that this is not
                due to accident.  Thus it is best to treat all such state
                as lost or stale, although a client may assume that the
                probability of inadvertent acceptance is low and handle
                this situation as it would the next case.
              </t>
              <t>
                When eir_server_scope remains the same and
                eir_server_owner.so_major_id changes, the client can use
                the filehandles it has and attempt reclaims.  It may find
                that these are now stale, but if NFS4ERR_STALE is not
                received, it can proceed to reclaim its opens.
              </t>
              <t>
                When eir_server_scope and
                eir_server_owner.so_major_id remain the same,
                the client has to use the now-current values
                of eir_server_owner.so_minor_id in deciding on appropriate
                forms of trunking.
              </t>
            </list>
            </t>
          </list>
          </t>
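          <t>
            The guidelines above amount to a three-way decision on the
            identity values returned by successive EXCHANGE_ID requests.
            The following Python fragment is a minimal, non-normative
            sketch of that decision.  The names ServerIdentity, Action,
            and classify_change are hypothetical and chosen only for
            illustration; they do not correspond to any existing client
            implementation.  The sketch takes the conservative reading of
            the first guideline and does not model the option of treating
            a scope change like a change of so_major_id.
          </t>
          <figure>
            <artwork><![CDATA[
```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Possible client responses (hypothetical names)."""
    TREAT_STATE_AS_LOST = "treat filehandles and locks as lost or stale"
    ATTEMPT_RECLAIM = "keep filehandles and attempt reclaims"
    REEVALUATE_TRUNKING = "re-evaluate trunking using so_minor_id"


@dataclass(frozen=True)
class ServerIdentity:
    """Identity fields taken from an EXCHANGE_ID reply."""
    server_scope: bytes   # eir_server_scope
    so_major_id: bytes    # eir_server_owner.so_major_id
    so_minor_id: int      # eir_server_owner.so_minor_id


def classify_change(old: ServerIdentity, new: ServerIdentity) -> Action:
    """Pick a response by comparing old and new identity values."""
    if new.server_scope != old.server_scope:
        # No assurance that previously obtained IDs are still valid.
        return Action.TREAT_STATE_AS_LOST
    if new.so_major_id != old.so_major_id:
        # Same scope: filehandles may be used and reclaims attempted,
        # unless the server answers with NFS4ERR_STALE.
        return Action.ATTEMPT_RECLAIM
    # Scope and major ID unchanged: only the now-current so_minor_id
    # matters, and it affects which forms of trunking are appropriate.
    return Action.REEVALUATE_TRUNKING
```
]]></artwork>
          </figure>
          <t>
            A client following this sketch would call classify_change
            with the values saved from its earlier EXCHANGE_ID and those
            from the most recent one, then carry out the returned action.
          </t>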
    </section>
  </section>

          
  <section title="Security Considerations">
      <t>
        With regard to NFSv4.0, the Security Considerations section of
        <xref target="RFC7530" />
        encourages
        clients to protect the integrity of the SECINFO operation
        and of any GETATTR operation for the fs_locations attribute.
        A needed change is to include the operations
        SETCLIENTID/SETCLIENTID_CONFIRM among those for which
        integrity protection is recommended.
        A migration recovery event can use any or all of these operations.
      </t>
      <t>
        With regard to NFSv4.1, the Security Considerations section of
        <xref target="RFC5661" /> takes proper care of migration-related
        issues.  No change is needed.
      </t>
  </section>

  <section title="IANA Considerations">
      <t>
        This document does not require actions by IANA.
      </t>
   </section>        
        
   <section title="Acknowledgements">
      <t>
        The editor and authors of this document gratefully acknowledge
        the contributions of
        Trond Myklebust of NetApp and Robert Thurlow of Oracle.
        We also thank
        Tom Haynes of NetApp and Spencer Shepler of Microsoft
        for their guidance and suggestions.
      </t>
      <t>
        Special thanks go to members of the Oracle Solaris NFS team, especially
        Rick Mesta and James Wahlig,
        for their work implementing an NFSv4.0 migration prototype and identifying
        many of the issues documented here.
      </t>
  </section>
  </middle>
  <back>
        <references title="Normative References">
      &RFC2119;
      &RFC5661;
      &RFC7530;
      </references>

      <references title="Informative References">
        <reference anchor="NFSv4-vers"
                   target="http://www.ietf.org/id/draft-ietf-nfsv4-versioning-03.txt">
        <front>
          <title>NFSv4 Version Management</title>

          <author initials="D." surname="Noveck">
            <organization>HP</organization>
          </author>

          <date year="2016" />
        </front>
        <annotation>
          Work in progress.
        </annotation>
        </reference>
        <reference anchor="migr-v4.0-update"
                   target="http://www.ietf.org/id/draft-ietf-nfsv4-rfc3530-migration-update-08.txt">
        <front>
          <title>NFSv4.0 migration: Specification Update</title>

          <author role="editor" initials="D." surname="Noveck">
            <organization>EMC</organization>
          </author>
          <author  initials="P." surname="Shivam">
            <organization>Oracle</organization>
          </author>
          <author  initials="C." surname="Lever">
            <organization>Oracle</organization>
          </author>
          <author  initials="B." surname="Baker">
            <organization>Oracle</organization>
          </author>

          <date year="2016" />
        </front>
        <annotation>
          Work in progress.
        </annotation>
      </reference>
        </references>
    </back>
</rfc>

