idnits 2.17.1

draft-ietf-nfsv4-minorversion1-10.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update
     this to the boilerplate described in the IETF Trust License Policy
     document (see https://trustee.ietf.org/license-info), which is
     required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by
     RFC 4748 on line 21340.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on
     line 21351.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on
     line 21358.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on
     line 21364.

  Checking nits according to
  https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in
     the document.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match
     the current year.

  == Line 882 has weird spacing: '...privacy no ...'
  == Line 901 has weird spacing: '...privacy no ...'
  == Line 912 has weird spacing: '...privacy no ...'
  == Line 3056 has weird spacing: '...str_cis ser...'
  == Line 3358 has weird spacing: '...str_cis nii_...'
  == (24 more instances...)

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL',
     'SHOULD', or 'RECOMMENDED' is not an accepted usage according to
     RFC 2119.  Please use uppercase 'NOT' together with RFC 2119
     keywords (if that is what you mean).
     Found 'MUST not' in this paragraph:

     If the server determines that the client holds no associated state
     for its client ID (including sessions, opens, locks, delegations,
     layouts, and wants), the server may choose to unilaterally release
     the client ID.  The server may make this choice for an inactive
     client so that resources are not consumed by those intermittently
     active clients.  If the client contacts the server after this
     release, the server must ensure the client receives the appropriate
     error so that it will use the EXCHANGE_ID/CREATE_SESSION sequence
     to establish a new identity.  It should be clear that the server
     must be very hesitant to release a client ID since the resulting
     work on the client to recover from such an event will be the same
     burden as if the server had failed and restarted.  Typically a
     server would not release a client ID unless there had been no
     activity from that client for many minutes.  As long as there are
     sessions, opens, locks, delegations, layouts, or wants, the server
     MUST not release the client ID.  See Section 2.10.8.1.4 for
     discussion on releasing inactive sessions.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL',
     'SHOULD', or 'RECOMMENDED' is not an accepted usage according to
     RFC 2119.  Please use uppercase 'NOT' together with RFC 2119
     keywords (if that is what you mean).

     Found 'MUST not' in this paragraph:

     When the server gets a EXCHANGE_ID for a client owner that
     currently has state, or an unexpired lease, and the principal that
     issues the EXCHANGE_ID is different than principal the previously
     established the client owner, the server MUST not destroy the any
     state that currently exists for client owner.  Regardless, the
     server has two choices.  First, it can return NFS4ERR_CLID_INUSE.
     Second, it can allow the EXCHANGE_ID, and simply treat the client
     owner as consisting of both the co_ownerid and the principal that
     issued the EXCHANGE_ID.
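The client ID release rule quoted in the first paragraph above can be sketched as a server-side check. This is a minimal illustration in Python, not part of the draft; `ClientRecord`, `may_release_client_id`, and the 600-second idle threshold ("many minutes") are all hypothetical names and values chosen for the example:

```python
from dataclasses import dataclass, field

@dataclass
class ClientRecord:
    # Hypothetical per-client-ID bookkeeping; the fields mirror the
    # kinds of state the draft lists: sessions, opens, locks,
    # delegations, layouts, and wants.
    sessions: list = field(default_factory=list)
    opens: list = field(default_factory=list)
    locks: list = field(default_factory=list)
    delegations: list = field(default_factory=list)
    layouts: list = field(default_factory=list)
    wants: list = field(default_factory=list)
    idle_seconds: float = 0.0

IDLE_THRESHOLD = 600.0  # "many minutes" -- an assumed value, not from the draft

def may_release_client_id(c: ClientRecord) -> bool:
    # The MUST NOT: while any associated state exists, never release.
    if any([c.sessions, c.opens, c.locks,
            c.delegations, c.layouts, c.wants]):
        return False
    # Even a stateless client ID is released only after long inactivity,
    # since recovery costs the client as much as a server restart.
    return c.idle_seconds >= IDLE_THRESHOLD
```

A client released this way would, on its next contact, receive an error directing it back through EXCHANGE_ID/CREATE_SESSION.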
  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL',
     'SHOULD', or 'RECOMMENDED' is not an accepted usage according to
     RFC 2119.  Please use uppercase 'NOT' together with RFC 2119
     keywords (if that is what you mean).

     Found 'MUST not' in this paragraph:

     The NFSv4.1 server MUST not return NFS4ERR_WRONGSEC to any
     operation other than a put filehandle operation, LOOKUP, LOOKUPP,
     and OPEN (by component name).

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL',
     'SHOULD', or 'RECOMMENDED' is not an accepted usage according to
     RFC 2119.  Please use uppercase 'NOT' together with RFC 2119
     keywords (if that is what you mean).

     Found 'SHOULD not' in this paragraph:

     Note that for two such file systems, any information within the
     fs_locations_info attribute that indicates the need for special
     transition activity, i.e. the appearance of the two file system
     instances with different _handle_, _fileid_, _verifier_, _change_
     classes, MUST be ignored by the client.  The server SHOULD not
     indicate that these instances belong to different _handle_,
     _fileid_, _verifier_, _change_ classes, whether the two instances
     are shown belonging to the same _simultaneous-use_ class or not.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL',
     'SHOULD', or 'RECOMMENDED' is not an accepted usage according to
     RFC 2119.  Please use uppercase 'NOT' together with RFC 2119
     keywords (if that is what you mean).

     Found 'MUST not' in this paragraph:

     The set of fls_info data is subject to expansion in a future minor
     version, or in a standard-track RFC, within the context of a
     single minor version.  The server SHOULD NOT send and the client
     MUST not use indices within the fls_info array that are not
     defined in standards-track RFC's.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL',
     'SHOULD', or 'RECOMMENDED' is not an accepted usage according to
     RFC 2119.  Please use uppercase 'NOT' together with RFC 2119
     keywords (if that is what you mean).
     Found 'MUST not' in this paragraph:

     o  OPEN4_RESULT_CONFIRM is deprecated and MUST not be returned by
        an NFSv4.1 server.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL',
     'SHOULD', or 'RECOMMENDED' is not an accepted usage according to
     RFC 2119.  Please use uppercase 'NOT' together with RFC 2119
     keywords (if that is what you mean).

     Found 'SHOULD not' in this paragraph:

     If the clora_changed field is TRUE, then the client SHOULD not
     write and commit its modified data to the storage devices
     specified by the layout being recalled.  Instead, it is preferable
     for the client to write and commit the modified data through the
     metadata server.  Alternatively, the client may attempt to obtain
     a new layout.  Note: in order to obtain a new layout the client
     must first return the old layout.  Since obtaining a new layout is
     not guaranteed to succeed, the client must be ready to write and
     commit its modified data through the metadata server.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but
     may have content which was first submitted before 10 November
     2008.  If you have contacted all the original authors and they are
     all willing to grant the BCP78 rights to the IETF Trust, then this
     is fine, and you can ignore this comment.  If not, you may need to
     add the pre-RFC5378 disclaimer.  (See the Legal Provisions
     document at https://trustee.ietf.org/license-info for more
     information.)

  -- The document date (March 4, 2007) is 6253 days in the past.  Is
     this intentional?

  -- Found something which looks like a code comment -- if you have
     code sections in the document, please surround them with
     '<CODE BEGINS>' and '<CODE ENDS>' lines.
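The clora_changed rule quoted in the last 'SHOULD not' paragraph above can be illustrated with a minimal client-side sketch. This is Python pseudocode written for this report, not code from the draft; `RecalledLayoutClient`, `on_layoutrecall`, and the destination strings are hypothetical names:

```python
class RecalledLayoutClient:
    # Hypothetical client state for a single layout being recalled:
    # a set of dirty blocks plus a record of where each flush went.
    def __init__(self, dirty_blocks):
        self.dirty_blocks = list(dirty_blocks)
        self.flushed = []  # (destination, block) pairs, in flush order

    def _flush(self, destination):
        for block in self.dirty_blocks:
            self.flushed.append((destination, block))
        self.dirty_blocks = []

    def on_layoutrecall(self, clora_changed):
        if clora_changed:
            # Layout contents have changed: the client SHOULD NOT write
            # through the recalled storage devices, so modified data
            # goes through the metadata server instead.
            self._flush("metadata-server")
        else:
            # Layout is still usable for I/O: data may be written and
            # committed to the storage devices before the layout is
            # returned.
            self._flush("storage-devices")
```

The draft's alternative, obtaining a new layout after returning the old one, is not shown; since that can fail, a real client still needs the metadata-server path above as a fallback.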
  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'RPCRDMA' on line 2404
  -- Looks like a reference, but probably isn't: 'NFSDDP' on line 2348
  -- Looks like a reference, but probably isn't: 'RDDP' on line 2453
  -- Looks like a reference, but probably isn't: 'XNFS' on line 6002
  -- Looks like a reference, but probably isn't: 'Floyd' on line 6641

  == Missing Reference: '0' is mentioned on line 12009, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 3530 (ref. '2') (Obsoleted by
     RFC 7530)
  ** Obsolete normative reference: RFC 1831 (ref. '4') (Obsoleted by
     RFC 5531)
  ** Obsolete normative reference: RFC 1884 (ref. '9') (Obsoleted by
     RFC 2373)

  -- Possible downref: Non-RFC (?) normative reference: ref. '10'

  ** Obsolete normative reference: RFC 3454 (ref. '12') (Obsoleted by
     RFC 7564)
  ** Obsolete normative reference: RFC 3491 (ref. '13') (Obsoleted by
     RFC 5891)
  ** Downref: Normative reference to an Informational RFC: RFC 2104
     (ref. '14')
  ** Obsolete normative reference: RFC 2434 (ref. '16') (Obsoleted by
     RFC 5226)

  == Outdated reference: A later version (-02) exists of
     draft-zelenka-pnfs-obj-01

  -- Obsolete informational reference (is this intentional?): RFC 3720
     (ref. '29') (Obsoleted by RFC 7143)

     Summary: 8 errors (**), 0 flaws (~~), 17 warnings (==),
     16 comments (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

--------------------------------------------------------------------------------

NFSv4                                                         S. Shepler
Internet-Draft                                                 M. Eisler
Intended status: Standards Track                               D. Noveck
Expires: September 5, 2007                                       Editors
                                                           March 4, 2007

                         NFSv4 Minor Version 1
                 draft-ietf-nfsv4-minorversion1-10.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 5, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This Internet-Draft describes NFSv4 minor version one, including
   features retained from the base protocol and protocol extensions
   made subsequently.  The current draft includes descriptions of the
   major extensions: Sessions, Directory Delegations, and parallel NFS
   (pNFS).  This Internet-Draft is an active work item of the NFSv4
   working group.  Active and resolved issues may be found in the issue
   tracker at: http://www.nfsv4-editor.org/cgi-bin/roundup/nfsv4.  New
   issues related to this document should be raised with the NFSv4
   Working Group at nfsv4@ietf.org and logged in the issue tracker.
Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Table of Contents

   1.  Introduction
       1.1.  The NFSv4.1 Protocol
       1.2.  NFS Version 4 Goals
       1.3.  Minor Version 1 Goals
       1.4.  Overview of NFS version 4.1 Features
             1.4.1.  RPC and Security
             1.4.2.  Protocol Structure
             1.4.3.  File System Model
             1.4.4.  Locking Facilities
       1.5.  General Definitions
       1.6.  Differences from NFSv4.0
   2.  Core Infrastructure
       2.1.  Introduction
       2.2.  RPC and XDR
             2.2.1.  RPC-based Security
       2.3.  COMPOUND and CB_COMPOUND
       2.4.  Client Identifiers and Client Owners
             2.4.1.  Server Release of Client ID
             2.4.2.  Handling Client Owner Conflicts
       2.5.  Server Owners
       2.6.  Security Service Negotiation
             2.6.1.  NFSv4 Security Tuples
             2.6.2.  SECINFO and SECINFO_NO_NAME
             2.6.3.  Security Error
       2.7.  Minor Versioning
       2.8.  Non-RPC-based Security Services
             2.8.1.  Authorization
             2.8.2.  Auditing
             2.8.3.  Intrusion Detection
       2.9.  Transport Layers
             2.9.1.  Required and Recommended Properties of Transports
             2.9.2.  Client and Server Transport Behavior
             2.9.3.  Ports
       2.10. Session
             2.10.1. Motivation and Overview
             2.10.2. NFSv4 Integration
             2.10.3. Channels
             2.10.4. Exactly Once Semantics
             2.10.5. RDMA Considerations
             2.10.6. Sessions Security
             2.10.7. Session Mechanics - Steady State
             2.10.8. Session Mechanics - Recovery
             2.10.9. Parallel NFS and Sessions
   3.  Protocol Data Types
       3.1.  Basic Data Types
       3.2.  Structured Data Types
   4.  Filehandles
       4.1.  Obtaining the First Filehandle
             4.1.1.  Root Filehandle
             4.1.2.  Public Filehandle
       4.2.  Filehandle Types
             4.2.1.  General Properties of a Filehandle
             4.2.2.  Persistent Filehandle
             4.2.3.  Volatile Filehandle
       4.3.  One Method of Constructing a Volatile Filehandle
       4.4.  Client Recovery from Filehandle Expiration
   5.  File Attributes
       5.1.  Mandatory Attributes
       5.2.  Recommended Attributes
       5.3.  Named Attributes
       5.4.  Classification of Attributes
       5.5.  Mandatory Attributes - Definitions
       5.6.  Recommended Attributes - Definitions
       5.7.  Time Access
       5.8.  Interpreting owner and owner_group
       5.9.  Character Case Attributes
       5.10. Quota Attributes
       5.11. mounted_on_fileid
       5.12. Directory Notification Attributes
             5.12.1. dir_notif_delay
             5.12.2. dirent_notif_delay
       5.13. PNFS Attributes
             5.13.1. fs_layout_type
             5.13.2. layout_alignment
             5.13.3. layout_blksize
             5.13.4. layout_hint
             5.13.5. layout_type
             5.13.6. mdsthreshold
       5.14. Retention Attributes
   6.  Access Control Lists
       6.1.  Goals
       6.2.  File Attributes Discussion
             6.2.1.  ACL Attribute
             6.2.2.  dacl and sacl Attributes
             6.2.3.  mode Attribute
             6.2.4.  mode_set_masked Attribute
       6.3.  Common Methods
             6.3.1.  Interpreting an ACL
             6.3.2.  Computing a Mode Attribute from an ACL
       6.4.  Requirements
             6.4.1.  Setting the mode and/or ACL Attributes
             6.4.2.  Retrieving the mode and/or ACL Attributes
             6.4.3.  Creating New Objects
   7.  Single-server Name Space
       7.1.  Server Exports
       7.2.  Browsing Exports
       7.3.  Server Pseudo File System
       7.4.  Multiple Roots
       7.5.  Filehandle Volatility
       7.6.  Exported Root
       7.7.  Mount Point Crossing
       7.8.  Security Policy and Name Space Presentation
   8.  File Locking and Share Reservations
       8.1.  Locking
             8.1.1.  Client and Session ID
             8.1.2.  State-owner Definition
             8.1.3.  Stateid Definition
             8.1.4.  Use of the Stateid and Locking
       8.2.  Lock Ranges
       8.3.  Upgrading and Downgrading Locks
       8.4.  Blocking Locks
       8.5.  Lease Renewal
       8.6.  Crash Recovery
             8.6.1.  Client Failure and Recovery
             8.6.2.  Server Failure and Recovery
             8.6.3.  Network Partitions and Recovery
       8.7.  Server Revocation of Locks
       8.8.  Share Reservations
       8.9.  OPEN/CLOSE Operations
       8.10. Open Upgrade and Downgrade
       8.11. Short and Long Leases
       8.12. Clocks, Propagation Delay, and Calculating Lease Expiration
       8.13. Vestigial Locking Infrastructure From V4.0
   9.  Client-Side Caching
       9.1.  Performance Challenges for Client-Side Caching
       9.2.  Delegation and Callbacks
             9.2.1.  Delegation Recovery
       9.3.  Data Caching
             9.3.1.  Data Caching and OPENs
             9.3.2.  Data Caching and File Locking
             9.3.3.  Data Caching and Mandatory File Locking
             9.3.4.  Data Caching and File Identity
       9.4.  Open Delegation
             9.4.1.  Open Delegation and Data Caching
             9.4.2.  Open Delegation and File Locks
             9.4.3.  Handling of CB_GETATTR
             9.4.4.  Recall of Open Delegation
             9.4.5.  Clients that Fail to Honor Delegation Recalls
             9.4.6.  Delegation Revocation
       9.5.  Data Caching and Revocation
             9.5.1.  Revocation Recovery for Write Open Delegation
       9.6.  Attribute Caching
       9.7.  Data and Metadata Caching and Memory Mapped Files
       9.8.  Name Caching
       9.9.  Directory Caching
   10. Multi-Server Name Space
       10.1. Location attributes
       10.2. File System Presence or Absence
       10.3. Getting Attributes for an Absent File System
             10.3.1. GETATTR Within an Absent File System
             10.3.2. READDIR and Absent File Systems
       10.4. Uses of Location Information
             10.4.1. File System Replication
             10.4.2. File System Migration
             10.4.3. Referrals
       10.5. Additional Client-side Considerations
       10.6. Effecting File System Transitions
             10.6.1. File System Transitions and Simultaneous Access
             10.6.2. Simultaneous Use and Transparent Transitions
             10.6.3. Filehandles and File System Transitions
             10.6.4. Fileid's and File System Transitions
             10.6.5. Fsids and File System Transitions
             10.6.6. The Change Attribute and File System Transitions
             10.6.7. Lock State and File System Transitions
             10.6.8. Write Verifiers and File System Transitions
       10.7. Effecting File System Referrals
             10.7.1. Referral Example (LOOKUP)
             10.7.2. Referral Example (READDIR)
       10.8. The Attribute fs_absent
       10.9. The Attribute fs_locations
       10.10. The Attribute fs_locations_info
             10.10.1. The fs_locations_server4 Structure
             10.10.2. The fs_locations_info4 Structure
             10.10.3. The fs_locations_item4 Structure
       10.11. The Attribute fs_status
   11. Directory Delegations
       11.1. Introduction to Directory Delegations
       11.2. Directory Delegation Design
       11.3. Attributes in Support of Directory Notifications
       11.4. Delegation Recall
       11.5. Directory Delegation Recovery
   12. Parallel NFS (pNFS)
       12.1. Introduction
       12.2. PNFS Definitions
             12.2.1. Metadata
             12.2.2. Metadata Server
             12.2.3. Client
             12.2.4. Storage Device
             12.2.5. Data Server
             12.2.6. Storage Protocol or Data Protocol
             12.2.7. Control Protocol
             12.2.8. Layout
             12.2.9. Layout Types
             12.2.10. Layout Iomode
             12.2.11. Layout Segment
             12.2.12. Device IDs
       12.3. PNFS Operations
       12.4. PNFS Attributes
       12.5. Layout Semantics
             12.5.1. Guarantees Provided by Layouts
             12.5.2. Getting a Layout
             12.5.3. Committing a Layout
             12.5.4. Recalling a Layout
             12.5.5. Metadata Server Write Propagation
       12.6. PNFS Mechanics
       12.7. Recovery
             12.7.1. Client Recovery
             12.7.2. Dealing with Lease Expiration on the Client
             12.7.3. Dealing with Loss of Layout State on the Metadata
                     Server
             12.7.4. Recovery from Metadata Server Restart
             12.7.5. Operations During Metadata Server Grace Period
             12.7.6. Storage Device Recovery
       12.8. Metadata and Storage Device Roles
       12.9. Security Considerations
   13. PNFS: NFSv4.1 File Layout Type
       13.1. Session Considerations
       13.2. File Layout Definitions
       13.3. File Layout Data Types
       13.4. Interpreting the File Layout
       13.5. Sparse and Dense Stripe Unit Packing
       13.6. Data Server Multipathing
       13.7. Operations Issued to NFSv4.1 Data Servers
       13.8. COMMIT Through Metadata Server
       13.9. Global Stateid Requirements
       13.10. The Layout Iomode
       13.11. Data Server State Propagation
             13.11.1. Lock State Propagation
             13.11.2. Open-mode Validation
             13.11.3. File Attributes
       13.12. Data Server Component File Size
       13.13. Recovery Considerations
       13.14. Security Considerations for the File Layout Type
   14. Internationalization
       14.1. Stringprep profile for the utf8str_cs type
       14.2. Stringprep profile for the utf8str_cis type
       14.3. Stringprep profile for the utf8str_mixed type
       14.4. UTF-8 Related Errors
   15. Error Values
       15.1. Error Definitions
       15.2. Operations and their valid errors
       15.3. Callback operations and their valid errors
       15.4. Errors and the operations that use them
   16. NFS version 4.1 Procedures
       16.1. Procedure 0: NULL - No Operation
       16.2. Procedure 1: COMPOUND - Compound Operations
   17. NFS version 4.1 Operations
       17.1.  Operation 3: ACCESS - Check Access Rights
       17.2.  Operation 4: CLOSE - Close File
       17.3.  Operation 5: COMMIT - Commit Cached Data
       17.4.  Operation 6: CREATE - Create a Non-Regular File Object
       17.5.  Operation 7: DELEGPURGE - Purge Delegations Awaiting
              Recovery
       17.6.  Operation 8: DELEGRETURN - Return Delegation
       17.7.  Operation 9: GETATTR - Get Attributes
       17.8.  Operation 10: GETFH - Get Current Filehandle
       17.9.  Operation 11: LINK - Create Link to a File
       17.10. Operation 12: LOCK - Create Lock
       17.11. Operation 13: LOCKT - Test For Lock
       17.12. Operation 14: LOCKU - Unlock File
       17.13. Operation 15: LOOKUP - Lookup Filename
       17.14. Operation 16: LOOKUPP - Lookup Parent Directory
       17.15. Operation 17: NVERIFY - Verify Difference in Attributes
       17.16. Operation 18: OPEN - Open a Regular File
       17.17. Operation 19: OPENATTR - Open Named Attribute Directory
       17.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access
       17.19. Operation 22: PUTFH - Set Current Filehandle
       17.20. Operation 23: PUTPUBFH - Set Public Filehandle
       17.21. Operation 24: PUTROOTFH - Set Root Filehandle
       17.22. Operation 25: READ - Read from File
       17.23. Operation 26: READDIR - Read Directory
       17.24. Operation 27: READLINK - Read Symbolic Link
       17.25. Operation 28: REMOVE - Remove File System Object
       17.26. Operation 29: RENAME - Rename Directory Entry
       17.27. Operation 31: RESTOREFH - Restore Saved Filehandle
       17.28. Operation 32: SAVEFH - Save Current Filehandle
       17.29. Operation 33: SECINFO - Obtain Available Security
       17.30. Operation 34: SETATTR - Set Attributes
       17.31. Operation 37: VERIFY - Verify Same Attributes
       17.32. Operation 38: WRITE - Write to File
       17.33. Operation 40: BACKCHANNEL_CTL - Backchannel control
       17.34. Operation 41: BIND_CONN_TO_SESSION
       17.35. Operation 42: EXCHANGE_ID - Instantiate Client ID
       17.36. Operation 43: CREATE_SESSION - Create New Session and
              Confirm Client ID
       17.37. Operation 44: DESTROY_SESSION - Destroy existing session
       17.38. Operation 45: FREE_STATEID - Free stateid with no locks
       17.39. Operation 46: GET_DIR_DELEGATION - Get a directory
              delegation
       17.40. Operation 47: GETDEVICEINFO - Get Device Information
       17.41. Operation 48: GETDEVICELIST
       17.42. Operation 49: LAYOUTCOMMIT - Commit writes made using a
              layout
       17.43. Operation 50: LAYOUTGET - Get Layout Information
       17.44. Operation 51: LAYOUTRETURN - Release Layout Information
       17.45. Operation 52: SECINFO_NO_NAME - Get Security on Unnamed
              Object
       17.46. Operation 53: SEQUENCE - Supply per-procedure sequencing
              and control
       17.47. Operation 54: SET_SSV
       17.48. Operation 55: TEST_STATEID - Test stateids for validity
       17.49. Operation 56: WANT_DELEGATION
       17.50. Operation 57: DESTROY_CLIENTID - Destroy existing client
              ID
       17.51. Operation 10044: ILLEGAL - Illegal operation
   18. NFS version 4.1 Callback Procedures
       18.1. Procedure 0: CB_NULL - No Operation
       18.2. Procedure 1: CB_COMPOUND - Compound Operations
   19. NFS version 4.1 Callback Operations
       19.1.  Operation 3: CB_GETATTR - Get Attributes
       19.2.  Operation 4: CB_RECALL - Recall an Open Delegation
       19.3.  Operation 5: CB_LAYOUTRECALL
       19.4.  Operation 6: CB_NOTIFY - Notify directory changes
       19.5.  Operation 7: CB_PUSH_DELEG
       19.6.  Operation 8: CB_RECALL_ANY - Keep any N delegations
       19.7.  Operation 9: CB_RECALLABLE_OBJ_AVAIL
       19.8.  Operation 10: CB_RECALL_SLOT - change flow control limits
       19.9.  Operation 11: CB_SEQUENCE - Supply callback channel
              sequencing and control
       19.10. Operation 12: CB_WANTS_CANCELLED
       19.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible lock
              availability
       19.12. Operation 10044: CB_ILLEGAL - Illegal Callback Operation
   20. Security Considerations
   21. IANA Considerations
       21.1. Defining new layout types
   22. References
       22.1. Normative References
       22.2. Informative References
   Appendix A.  Acknowledgments
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

1.1.  The NFSv4.1 Protocol

   The NFSv4.1 protocol is a minor version of the NFSv4 protocol
   described in [2].
It generally follows the guidelines for the
   minor versioning model laid out in Section 10 of RFC 3530.
   However, it diverges from guideline 11 ("a client and server that
   supports minor version X must support minor versions 0 through
   X-1") and guideline 12 ("no features may be introduced as mandatory
   in a minor version").  These divergences are due to the
   introduction of the sessions model for managing non-idempotent
   operations and the RECLAIM_COMPLETE operation.  These two new
   features are infrastructural in nature and simplify implementation
   of existing and other new features.  Making them optional would add
   undue complexity to protocol definition and implementation.
   NFSv4.1 accordingly updates the Minor Versioning guidelines
   (Section 2.7).

   NFSv4.1, as a minor version, is consistent with the overall goals
   for NFS Version 4, but extends the protocol so as to better meet
   those goals, based on experiences with NFSv4.0.  In addition,
   NFSv4.1 has adopted some additional goals, which motivate some of
   the major extensions in minor version 1.

1.2.  NFS Version 4 Goals

   The NFS version 4 protocol is a further revision of the NFS
   protocol already defined by versions 2 [17] and 3 [18].  It retains
   the essential characteristics of previous versions: design for easy
   recovery; independence of transport protocols, operating systems,
   and file systems; simplicity; and good performance.  The NFS
   version 4 revision has the following goals:

   o  Improved access and good performance on the Internet.

      The protocol is designed to transit firewalls easily, perform
      well where latency is high and bandwidth is low, and scale to
      very large numbers of clients per server.

   o  Strong security with negotiation built into the protocol.

      The protocol builds on the work of the ONCRPC working group in
      supporting the RPCSEC_GSS protocol.
Additionally, the NFS
      version 4 protocol provides a mechanism that allows clients and
      servers to negotiate security, and requires clients and servers
      to support a minimal set of security schemes.

   o  Good cross-platform interoperability.

      The protocol features a file system model that provides a
      useful, common set of features that does not unduly favor one
      file system or operating system over another.

   o  Designed for protocol extensions.

      The protocol is designed to accept standard extensions within a
      framework that enables and encourages backward compatibility.

1.3.  Minor Version 1 Goals

   Minor version one has the following goals, within the framework
   established by the overall version 4 goals.

   o  To correct significant structural weaknesses and oversights
      discovered in the base protocol.

   o  To add clarity and specificity to areas left unaddressed or not
      addressed in sufficient detail in the base protocol.

   o  To add specific features based on experience with the existing
      protocol and recent industry developments.

   o  To provide protocol support to take advantage of clustered
      server deployments, including the ability to provide scalable
      parallel access to files distributed among multiple servers.

1.4.  Overview of NFS version 4.1 Features

   To provide a reasonable context for the reader, the major features
   of the NFS version 4.1 protocol will be reviewed in brief.  This
   will be done to provide an appropriate context for both the reader
   who is familiar with the previous versions of the NFS protocol and
   the reader who is new to the NFS protocols.  For the reader new to
   the NFS protocols, there is still a set of fundamental knowledge
   that is expected.  The reader should be familiar with the XDR and
   RPC protocols as described in [3] and [4].  A basic knowledge of
   file systems and distributed file systems is expected as well.
This description of version 4.1 features will not distinguish those
   added in minor version one from those present in the base protocol
   but will treat minor version 1 as a unified whole.  See Section 1.6
   for a description of the differences between the two minor
   versions.

1.4.1.  RPC and Security

   As with previous versions of NFS, the External Data Representation
   (XDR) and Remote Procedure Call (RPC) mechanisms used for the NFS
   version 4.1 protocol are those defined in [3] and [4].  To meet
   end-to-end security requirements, the RPCSEC_GSS framework [5] will
   be used to extend the basic RPC security.  With the use of
   RPCSEC_GSS, various mechanisms can be provided to offer
   authentication, integrity, and privacy to the NFS version 4
   protocol.  Kerberos V5 will be used as described in [6] to provide
   one security framework.  The LIPKEY and SPKM-3 GSS-API mechanisms
   described in [7] will be used to provide for the use of user
   passwords and client/server public key certificates by the NFS
   version 4 protocol.  With the use of RPCSEC_GSS, other mechanisms
   may also be specified and used for NFS version 4.1 security.

   To enable in-band security negotiation, the NFS version 4.1
   protocol has operations that provide the client a method of
   querying the server about its policies regarding which security
   mechanisms must be used for access to the server's file system
   resources.  With this, the client can securely match the security
   mechanism that meets the policies specified at both the client and
   server.

1.4.2.  Protocol Structure

1.4.2.1.  Core Protocol

   Unlike NFS Versions 2 and 3, which used a series of ancillary
   protocols (e.g., NLM, NSM, MOUNT), within all minor versions of NFS
   version 4 only a single RPC protocol is used to make requests of
   the server.  Facilities that had been separate protocols, such as
   locking, are now integrated within a single unified protocol.
1.4.2.2.  Parallel Access

   Minor version one supports high-performance data access to a
   clustered server implementation by enabling a separation of
   metadata access and data access, with the latter done to multiple
   servers in parallel.

   Such parallel data access is controlled by recallable objects known
   as "layouts", which are integrated into the protocol locking model.
   Clients direct requests for data access to a set of data servers
   specified by the layout via a data storage protocol which may be
   NFSv4.1 or may be another protocol.

1.4.3.  File System Model

   The general file system model used for the NFS version 4.1 protocol
   is the same as that of previous versions.  The server file system
   is hierarchical, with the regular files contained within being
   treated as opaque octet streams.  In a slight departure, file and
   directory names are encoded with UTF-8 to deal with the basics of
   internationalization.

   The NFS version 4.1 protocol does not require a separate protocol
   to provide for the initial mapping between path name and
   filehandle.  All file systems exported by a server are presented as
   a tree so that all file systems are reachable from a special
   per-server global root filehandle.  This allows LOOKUP operations
   to be used to perform functions previously provided by the MOUNT
   protocol.  The server provides any necessary pseudo file systems to
   bridge gaps that arise from unexported portions of the namespace
   lying between exported file systems.

1.4.3.1.  Filehandles

   As in previous versions of the NFS protocol, opaque filehandles are
   used to identify individual files and directories.  Lookup-type and
   create operations are used to go from file and directory names to
   the filehandle, which is then used to identify the object in
   subsequent operations.
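   The name-to-filehandle lookup model just described can be
   illustrated with a short, non-normative sketch.  The toy namespace,
   the filehandle strings, and the function names below are all
   invented for illustration; real filehandles are opaque octet
   strings generated by the server:

```python
# Non-normative sketch: resolving a path to a filehandle by chaining
# lookup-type operations, as a client might do with PUTROOTFH followed
# by one LOOKUP per path component.

# A toy server namespace: each directory filehandle maps component
# names to child filehandles.  All values here are invented.
NAMESPACE = {
    "fh-root":   {"export": "fh-export"},
    "fh-export": {"home": "fh-home"},
    "fh-home":   {"data.txt": "fh-data"},
}

def lookup(dir_fh, name):
    """One LOOKUP: map (directory filehandle, component name) to the
    child's filehandle, as the server would."""
    return NAMESPACE[dir_fh][name]

def resolve(path):
    """Walk from the root filehandle, one LOOKUP per path component."""
    fh = "fh-root"                      # PUTROOTFH sets the current filehandle
    for component in path.split("/"):
        fh = lookup(fh, component)      # LOOKUP replaces the current filehandle
    return fh

print(resolve("export/home/data.txt"))  # prints fh-data
```

   In the real protocol, the per-component LOOKUPs would typically be
   sent together in a single COMPOUND request rather than as separate
   round trips.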
The NFS version 4.1 protocol provides support for persistent
   filehandles, guaranteed to be valid for the lifetime of the file
   system object designated.  In addition, it provides support for
   servers to provide filehandles with more limited validity
   guarantees, called volatile filehandles.

1.4.3.2.  File Attributes

   The NFS version 4.1 protocol has a rich and extensible attribute
   structure.  Only a small set of the defined attributes are
   mandatory and must be provided by all server implementations.  The
   other attributes are known as "recommended" attributes.

   One significant recommended file attribute is the Access Control
   List (ACL) attribute.  This attribute provides for directory and
   file access control beyond the model used in NFS Versions 2 and 3.
   The ACL definition allows for specification of specific sets of
   permissions for individual users and groups.  In addition, ACL
   inheritance allows propagation of access permissions and
   restrictions down a directory tree as file system objects are
   created.

   One other type of attribute is the named attribute.  A named
   attribute is an opaque octet stream that is associated with a
   directory or file and referred to by a string name.  Named
   attributes are meant to be used by client applications as a method
   to associate application-specific data with a regular file or
   directory.

1.4.3.3.  Multi-server Namespace

   NFS Version 4.1 contains a number of features to allow
   implementation of namespaces that cross server boundaries and that
   allow and facilitate a non-disruptive transfer of support for
   individual file systems between servers.  They are all based upon
   attributes that allow one file system to specify alternate or new
   locations for that file system.

   These attributes may be used together with the concept of absent
   file systems, which provide specifications for additional locations
   but no actual file system content.
This allows a number of
   important facilities:

   o  Location attributes may be used with absent file systems to
      implement referrals whereby one server may direct the client to
      a file system provided by another server.  This allows extensive
      multi-server namespaces to be constructed.

   o  Location attributes may be provided for present file systems to
      provide the locations of alternate file system instances or
      replicas to be used in the event that the current file system
      instance becomes unavailable.

   o  Location attributes may be provided when a previously present
      file system becomes absent.  This allows non-disruptive
      migration of file systems to alternate servers.

1.4.4.  Locking Facilities

   As mentioned previously, NFSv4.1 is a single protocol that includes
   locking facilities.  These locking facilities include support for
   many types of locks, including several kinds of recallable locks.
   Recallable locks such as delegations allow the client to be assured
   that certain events will not occur so long as that lock is held.
   When circumstances change, the lock is recalled via a callback
   request.  The assurances provided by delegations allow more
   extensive caching to be done safely when circumstances allow it.
   The types of locks are:

   o  Share reservations as established by OPEN operations.

   o  Byte-range locks.

   o  File delegations, which are recallable locks that assure the
      holder that inconsistent opens and file changes cannot occur so
      long as the delegation is held.

   o  Directory delegations, which are recallable delegations that
      assure the holder that inconsistent directory modifications
      cannot occur so long as the delegation is held.
o  Layouts, which are recallable objects that assure the holder that
      direct access to the file data may be performed directly by the
      client and that no change to the data's location inconsistent
      with that access may be made so long as the layout is held.

   All locks for a given client are tied together under a single
   client-wide lease.  All requests made on sessions associated with
   the client renew that lease.  When leases are not promptly renewed,
   locks are subject to revocation.  In the event of server
   reinitialization, clients have the opportunity to safely reclaim
   their locks within a special grace period.

1.5.  General Definitions

   The following definitions are provided for the purpose of providing
   an appropriate context for the reader.

   Client  The "client" is the entity that accesses the NFS server's
      resources.  The client may be an application which contains the
      logic to access the NFS server directly.  The client may also be
      the traditional operating system client that provides remote
      file system services for a set of applications.

      A client is uniquely identified by a Client Owner.

      In the case of file locking, the client is the entity that
      maintains a set of locks on behalf of one or more applications.
      This client is responsible for crash or failure recovery for
      those locks it manages.

      Note that multiple clients may share the same transport and
      connection, and multiple clients may exist on the same network
      node.

   Client ID  A 64-bit quantity used as a unique, short-hand reference
      to a client-supplied verifier and client owner.  The server is
      responsible for supplying the client ID.

   Client Owner  The client owner is a unique string, opaque to the
      server, which identifies a client.  Multiple network connections
      and source network addresses originating those connections may
      share a client owner.
The server is expected to treat
      requests from connections with the same client owner as coming
      from the same client.

   Lease  An interval of time defined by the server for which the
      client is irrevocably granted a lock.  At the end of a lease
      period the lock may be revoked if the lease has not been
      extended.  The lock must be revoked if a conflicting lock has
      been granted after the lease interval.

      All leases granted by a server have the same fixed interval.
      Note that the fixed interval was chosen to alleviate the expense
      a server would have in maintaining state about variable length
      leases across server failures.

   Lock  The term "lock" is used to refer to any of record
      (octet-range) locks, share reservations, delegations, or layouts
      unless specifically stated otherwise.

   Server  The "Server" is the entity responsible for coordinating
      client access to a set of file systems.  A server can span
      multiple network addresses.  In NFSv4.1, a server is a
      two-tiered entity; this allows servers consisting of multiple
      components the flexibility to tightly or loosely couple their
      components without requiring tight synchronization among the
      components.  Every server has a "Server Owner", which reflects
      the two tiers of a server entity.

   Server Owner  The "Server Owner" identifies the server to the
      client.  The server owner consists of a major and minor
      identifier.  When the client has two connections, each to a peer
      with the same major and minor identifier, the client assumes
      both peers are the same server (the server namespace is the same
      via each connection), and further assumes session and lock state
      is sharable across both connections.  When each peer has the
      same major identifier but a different minor identifier, the
      client assumes both peers can serve the same namespace, but
      session and lock state is not sharable across both connections.
Stable Storage  NFS version 4 servers must be able to recover
      without data loss from multiple power failures (including
      cascading power failures, that is, several power failures in
      quick succession), operating system failures, and hardware
      failure of components other than the storage medium itself (for
      example, disk, nonvolatile RAM).

      Some examples of stable storage that are allowable for an NFS
      server include:

      1.  Media commit of data, that is, the modified data has been
          successfully written to the disk media, for example, the
          disk platter.

      2.  An immediate reply disk drive with battery-backed on-drive
          intermediate storage or uninterruptible power system (UPS).

      3.  Server commit of data with battery-backed intermediate
          storage and recovery software.

      4.  Cache commit with uninterruptible power system (UPS) and
          recovery software.

   Stateid  A 128-bit quantity returned by a server that uniquely
      defines the open and locking state provided by the server for a
      specific open or lock owner for a specific file and type of
      lock.

   Verifier  A 64-bit quantity generated by the client that the server
      can use to determine if the client has restarted and lost all
      previous lock state.

1.6.  Differences from NFSv4.0

   The following summarizes the differences between minor version one
   and the base protocol:

   o  Implementation of the sessions model.

   o  Support for parallel access to data.

   o  Addition of the RECLAIM_COMPLETE operation to better structure
      the lock reclamation process.

   o  Support for delegations on directories and other file types in
      addition to regular files.

   o  Operations to re-obtain a delegation.

   o  Support for client and server implementation IDs.

2.  Core Infrastructure

2.1.  Introduction

   NFS version 4.1 (NFSv4.1) relies on core infrastructure common to
   nearly every operation.
This core infrastructure is described in
   the remainder of this section.

2.2.  RPC and XDR

   The NFS version 4.1 (NFSv4.1) protocol is a Remote Procedure Call
   (RPC) application that uses RPC version 2 and the corresponding
   eXternal Data Representation (XDR) as defined in RFC1831 [4] and
   RFC4506 [3].

2.2.1.  RPC-based Security

   Previous NFS versions have been thought of as having a host-based
   authentication model, where the NFS server authenticates the NFS
   client and trusts the client to authenticate all users.  Actually,
   NFS has always depended on RPC for authentication.  The first form
   of RPC authentication required a host-based authentication
   approach.  NFSv4 also depends on RPC for basic security services,
   and mandates RPC support for a user-based authentication model.
   The user-based authentication model has user principals
   authenticated by a server, and, in turn, the server authenticated
   by user principals.  RPC provides some basic security services
   which are used by NFSv4.

2.2.1.1.  RPC Security Flavors

   As described in section 7.2 "Authentication" of [4], RPC security
   is encapsulated in the RPC header, via a security or authentication
   flavor, and information specific to the specification of the
   security flavor.  Every RPC header conveys information used to
   identify and authenticate a client and server.  As discussed in
   Section 2.2.1.1.1, some security flavors provide additional
   security services.

   NFSv4 clients and servers MUST implement RPCSEC_GSS.  (This
   requirement to implement is not a requirement to use.)  Other
   flavors, such as AUTH_NONE and AUTH_SYS, MAY be implemented as
   well.

2.2.1.1.1.  RPCSEC_GSS and Security Services

   RPCSEC_GSS ([5]) uses the functionality of GSS-API RFC2743 [8].
   This allows for the use of various security mechanisms by the RPC
   layer without the additional implementation overhead of adding RPC
   security flavors.
2.2.1.1.1.1.  Identification, Authentication, Integrity, Privacy

   Via the GSS-API, RPCSEC_GSS can be used to identify and
   authenticate users on clients to servers, and servers to users.  It
   can also perform integrity checking on the entire RPC message,
   including the RPC header, and the arguments or results.  Finally,
   privacy, usually via encryption, is a service available with
   RPCSEC_GSS.  Privacy is performed on the arguments and results.
   Note that if privacy is selected, integrity, authentication, and
   identification are enabled.  If privacy is not selected, but
   integrity is selected, authentication and identification are
   enabled.  If integrity and privacy are not selected, but
   authentication is enabled, identification is enabled.  RPCSEC_GSS
   does not provide identification as a separate service.

   Although GSS-API has an authentication service distinct from its
   privacy and integrity services, GSS-API's authentication service is
   not used for RPCSEC_GSS's authentication service.  Instead, each
   RPC request and response header is integrity protected with the
   GSS-API integrity service, and this allows RPCSEC_GSS to offer
   per-RPC authentication and identity.  See [5] for more information.

   NFSv4 clients and servers MUST support RPCSEC_GSS's integrity and
   authentication services.  NFSv4.1 servers MUST support RPCSEC_GSS's
   privacy service.

2.2.1.1.1.2.  Security mechanisms for NFS version 4

   RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that
   provide security services.  Therefore, NFSv4 clients and servers
   MUST support three security mechanisms: Kerberos V5, SPKM-3, and
   LIPKEY.

   The use of RPCSEC_GSS requires selection of: mechanism, quality of
   protection (QOP), and service (authentication, integrity, privacy).
For the mandated security mechanisms, NFSv4 specifies that a QOP of
   zero (0) is used, leaving it up to the mechanism or the mechanism's
   configuration to use an appropriate level of protection that QOP
   zero maps to.  Each mandated mechanism specifies a minimum set of
   cryptographic algorithms for implementing integrity and privacy.
   NFSv4 clients and servers MUST be implemented on operating
   environments that comply with the mandatory cryptographic
   algorithms of each mandated mechanism.

2.2.1.1.1.2.1.  Kerberos V5

   The Kerberos V5 GSS-API mechanism as described in RFC1964 [6]
   ([[Comment.1: need new Kerberos RFC]]) MUST be implemented with the
   RPCSEC_GSS services as specified in the following table:

   column descriptions:
   1 == number of pseudo flavor
   2 == name of pseudo flavor
   3 == mechanism's OID
   4 == RPCSEC_GSS service
   5 == NFSv4.1 clients MUST support
   6 == NFSv4.1 servers MUST support

   1      2      3                    4                      5    6
   ------------------------------------------------------------------
   390003 krb5   1.2.840.113554.1.2.2 rpc_gss_svc_none       yes  yes
   390004 krb5i  1.2.840.113554.1.2.2 rpc_gss_svc_integrity  yes  yes
   390005 krb5p  1.2.840.113554.1.2.2 rpc_gss_svc_privacy    no   yes

   Note that the number and name of the pseudo flavor are presented
   here as a mapping aid to the implementor.  Because the NFSv4
   protocol includes a method to negotiate security and it understands
   the GSS-API mechanism, the pseudo flavor is not needed.  The pseudo
   flavor is needed for NFS version 3 since the security negotiation
   is done via the MOUNT protocol as described in [19].

2.2.1.1.1.2.2.  LIPKEY

   The LIPKEY GSS-API mechanism as described in [7] MUST be
   implemented with the RPCSEC_GSS services as specified in the
   following table:

   1      2         3              4                      5    6
   ------------------------------------------------------------------
   390006 lipkey    1.3.6.1.5.5.9  rpc_gss_svc_none       yes  yes
   390007 lipkey-i  1.3.6.1.5.5.9  rpc_gss_svc_integrity  yes  yes
   390008 lipkey-p  1.3.6.1.5.5.9  rpc_gss_svc_privacy    no   yes

2.2.1.1.1.2.3.  SPKM-3 as a security triple

   The SPKM-3 GSS-API mechanism as described in [7] MUST be
   implemented with the RPCSEC_GSS services as specified in the
   following table:

   1      2       3                4                      5    6
   ------------------------------------------------------------------
   390009 spkm3   1.3.6.1.5.5.1.3  rpc_gss_svc_none       yes  yes
   390010 spkm3i  1.3.6.1.5.5.1.3  rpc_gss_svc_integrity  yes  yes
   390011 spkm3p  1.3.6.1.5.5.1.3  rpc_gss_svc_privacy    no   yes

2.2.1.1.1.3.  GSS Server Principal

   Regardless of what security mechanism under RPCSEC_GSS is being
   used, the NFS server MUST identify itself in GSS-API via a
   GSS_C_NT_HOSTBASED_SERVICE name type.  GSS_C_NT_HOSTBASED_SERVICE
   names are of the form:

      service@hostname

   For NFS, the "service" element is

      nfs

   Implementations of security mechanisms will convert nfs@hostname to
   various forms.  For Kerberos V5, LIPKEY, and SPKM-3, the following
   form is RECOMMENDED:

      nfs/hostname

2.3.  COMPOUND and CB_COMPOUND

   A significant departure from the versions of the NFS protocol
   before version 4 is the introduction of the COMPOUND procedure.
   For the NFSv4 protocol, in all minor versions, there are exactly
   two RPC procedures, NULL and COMPOUND.  The COMPOUND procedure is
   defined as a series of individual operations, and these operations
   perform the sorts of functions performed by traditional NFS
   procedures.

   The operations combined within a COMPOUND request are evaluated in
   order by the server, without any atomicity guarantees.
A limited
   set of facilities exists to pass results from one operation to
   another.  Once an operation returns a failing result, the
   evaluation ends and the results of all evaluated operations are
   returned to the client.

   With the use of the COMPOUND procedure, the client is able to build
   simple or complex requests.  These COMPOUND requests allow for a
   reduction in the number of RPCs needed for logical file system
   operations.  For example, multi-component lookup requests can be
   constructed by combining multiple LOOKUP operations.  Those can be
   further combined with operations such as GETATTR, READDIR, or OPEN
   plus READ to do more complicated sets of operations without
   incurring additional latency.

   NFSv4 also contains a considerable set of callback operations in
   which the server makes an RPC directed at the client.  Callback
   RPCs have a similar structure to that of the normal server
   requests.  For the NFS version 4 protocol callbacks, in all minor
   versions, there are two RPC procedures, NULL and CB_COMPOUND.  The
   CB_COMPOUND procedure is defined in an analogous fashion to that of
   COMPOUND with its own set of callback operations.

   Addition of new server and callback operations within the COMPOUND
   and CB_COMPOUND request framework provides a means of extending the
   protocol in subsequent minor versions.

   Except for a small number of operations needed for session
   creation, server requests and callback requests are performed
   within the context of a session.  Sessions provide a client context
   for every request and support robust replay protection for
   non-idempotent requests.

2.4.  Client Identifiers and Client Owners

   For each operation that obtains or depends on locking state, the
   specific client must be determinable by the server.
In NFSv4,
   each distinct client instance is represented by a client ID, which
   is a 64-bit identifier that identifies a specific client at a given
   time and which is changed whenever the client or the server
   re-initializes.  Client IDs are used to support lock identification
   and crash recovery.

   In NFSv4.1, during steady state operation, the client ID associated
   with each operation is derived from the session (see Section 2.10)
   on which the operation is issued.  Each session is associated with
   a specific client ID at session creation, and that client ID then
   becomes the client ID associated with all requests issued using it.
   Therefore, unlike NFSv4.0, the only NFSv4.1 operations possible
   before a client ID is established are those directly connected with
   establishing the client ID.

   A sequence of an EXCHANGE_ID operation followed by a CREATE_SESSION
   operation using that client ID (eir_clientid as returned from
   EXCHANGE_ID) is required to establish the identification on the
   server.  Establishment of identification by a new incarnation of
   the client also has the effect of immediately releasing any locking
   state that a previous incarnation of that same client might have
   had on the server.  Such released state would include all lock,
   share reservation, and, where the server is not supporting the
   CLAIM_DELEGATE_PREV claim type, all delegation state associated
   with the same client with the same identity.  For discussion of
   delegation state recovery, see Section 9.2.1.

   Releasing such state requires that the server be able to determine
   that one client instance is the successor of another.  Where this
   cannot be done, for any of a number of reasons, the locking state
   will remain for a time subject to lease expiration (see
   Section 8.5), and the new client will need to wait for such state
   to be removed if it makes conflicting lock requests.
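   The EXCHANGE_ID/CREATE_SESSION sequence can be sketched as follows.
   This is a non-normative, heavily simplified illustration: the
   Server class, its bookkeeping, and the session ID format are
   invented, and the release of a previous incarnation's state is
   shown at EXCHANGE_ID time, ignoring the confirmation details of the
   real protocol:

```python
# Non-normative sketch of client ID establishment: an EXCHANGE_ID
# followed by a CREATE_SESSION using the returned client ID
# (eir_clientid).  All data structures here are invented; only the
# two-step sequence mirrors the protocol.
import itertools

class Server:
    def __init__(self):
        self._next_id = itertools.count(1)
        self.clients = {}   # co_ownerid -> (co_verifier, client ID)
        self.state = {}     # client ID -> leased locking state
        self.sessions = {}  # session ID -> client ID

    def exchange_id(self, co_verifier, co_ownerid):
        prev = self.clients.get(co_ownerid)
        if prev is not None and prev[0] != co_verifier:
            # Same co_ownerid, new verifier: a new incarnation of the
            # same client, so the old incarnation's leased state goes.
            self.state.pop(prev[1], None)
        clientid = next(self._next_id)
        self.clients[co_ownerid] = (co_verifier, clientid)
        return clientid                       # eir_clientid

    def create_session(self, clientid):
        sessionid = f"sess-{clientid}"        # invented format
        self.sessions[sessionid] = clientid   # every request on this
        return sessionid                      # session implies clientid

server = Server()
eir_clientid = server.exchange_id(co_verifier=0x1234, co_ownerid=b"client-A")
session = server.create_session(eir_clientid)
```

   Once the session exists, requests issued on it need not carry the
   client ID explicitly; the server derives it from the session, as
   described above.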
Client identification is encapsulated in the following Client Owner
   structure:

   struct client_owner4 {
           verifier4       co_verifier;
           opaque          co_ownerid<NFS4_OPAQUE_LIMIT>;
   };

   The first field, co_verifier, is a client incarnation verifier that
   is used to detect client reboots.  Only if the co_verifier is
   different from that which the server had previously recorded for
   the client (as identified by the second field of the structure,
   co_ownerid) does the server start the process of canceling the
   client's leased state.

   The second field, co_ownerid, is a variable-length string that
   uniquely defines the client so that subsequent instances of the
   same client bear the same co_ownerid with a different verifier.

   There are several considerations for how the client generates the
   co_ownerid string:

   o  The string should be unique so that multiple clients do not
      present the same string.  The consequences of two clients
      presenting the same string range from one client getting an
      error to one client having its leased state abruptly and
      unexpectedly canceled.

   o  The string should be selected so that subsequent incarnations
      (e.g., reboots) of the same client cause the client to present
      the same string.  The implementor is cautioned against an
      approach that requires the string to be recorded in a local file
      because this precludes the use of the implementation in an
      environment where there is no local disk and all file access is
      from an NFS version 4 server.

   o  The string should be the same for each server network address
      that the client accesses, rather than common to all server
      network addresses (note: the precise opposite was advised in
      RFC3530).  This way, if a server has multiple interfaces, the
      client can trunk traffic over multiple network paths as
      described in Section 2.10.3.4.1.
o  The algorithm for generating the string should not assume that the
   client's network address will not change, unless the client
   implementation knows it is using statically assigned network
   addresses.  This includes changes between client incarnations and
   even changes while the client is still running in its current
   incarnation.  This means that if the client includes just the
   client's network address in the co_ownerid string, there is a real
   risk that, with dynamic address assignment, after the client gives
   up the network address, another client using a similar algorithm
   for generating the co_ownerid string would generate a conflicting
   co_ownerid string.

Given the above considerations, an example of a well-generated
co_ownerid string is one that includes:

o  If applicable, the client's statically assigned network address.

o  Additional information that tends to be unique, such as one or
   more of:

   *  The client machine's serial number (for privacy reasons, it is
      best to perform some one-way function on the serial number).

   *  A MAC address (again, a one-way function should be performed).

   *  The timestamp of when the NFS version 4 software was first
      installed on the client (though this is subject to the
      previously mentioned caution about using information that is
      stored in a file, because the file might only be accessible
      over NFS version 4).

   *  A true random number.  However, since this number ought to be
      the same between client incarnations, this shares the same
      problem as using the timestamp of the software installation.

o  For a user-level NFS version 4 client, it should contain
   additional information to distinguish the client from other user-
   level clients running on the same host, such as a process
   identifier or other unique sequence.
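As a sketch of these considerations (illustrative only; the function
name and choice of inputs are this example's, not the
specification's), a client might derive its co_ownerid by applying a
one-way hash to stable, host-unique inputs:

```python
import hashlib

def make_co_ownerid(mac_address, install_timestamp, process_id=None):
    """Build a co_ownerid string from stable inputs.

    A one-way hash hides the raw MAC address (the privacy
    consideration noted above).  The same inputs yield the same
    string across client incarnations; process_id distinguishes
    user-level clients sharing one host.
    """
    h = hashlib.sha256()
    h.update(mac_address.encode("ascii"))
    h.update(str(install_timestamp).encode("ascii"))
    if process_id is not None:
        h.update(str(process_id).encode("ascii"))
    return h.hexdigest()
```

Note that this sketch inherits the caution above: if
install_timestamp lives only in a file reachable over NFS version 4,
it cannot be used to identify the client to that same server.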
As a security measure, the server MUST NOT cancel a client's leased
state if the principal that established the state for a given
co_ownerid string is not the same as the principal issuing the
EXCHANGE_ID.

A server may compare a client_owner4 in an EXCHANGE_ID with an
nfs_client_id4 established using SETCLIENTID using NFSv4 minor
version 0, so that an NFSv4.1 client is not forced to delay until
lease expiration for locking state established by the earlier client
using minor version 0.  This requires that the client_owner4 be
constructed the same way as the nfs_client_id4.  If the latter's
contents included the server's network address, and the NFSv4.1
client does not wish to use a client ID that prevents trunking, it
should issue two EXCHANGE_ID operations.  The first EXCHANGE_ID will
have a client_owner4 equal to the nfs_client_id4; this will clear the
state created by the NFSv4.0 client.  The second EXCHANGE_ID will not
have the server's network address.  The state created for the second
EXCHANGE_ID will not have to wait for lease expiration, because there
will be no state to expire.

Once an EXCHANGE_ID has been done, and the resulting client ID
established as associated with a session, all requests made on that
session implicitly identify that client ID, which in turn designates
the client specified using the long-form client_owner4 structure.
The shorthand client identifier (a client ID) is assigned by the
server (the eir_clientid result from EXCHANGE_ID) and should be
chosen so that it will not conflict with a client ID previously
assigned by the server.  This applies across server restarts or
reboots.

In the event of a server restart, a client may find out that its
current client ID is no longer valid when it receives an
NFS4ERR_STALE_CLIENTID error.
The precise circumstances depend on the characteristics of the
sessions involved, specifically whether the session is persistent
(see Section 2.10.4.5).

When a session is not persistent, the client will need to create a
new session.  When the existing client ID is presented to a server as
part of creating a session and that client ID is not recognized, as
would happen after a server reboot, the server will reject the
request with the error NFS4ERR_STALE_CLIENTID.  When this happens,
the client must obtain a new client ID by use of the EXCHANGE_ID
operation, then use that client ID as the basis of a new session, and
then proceed to any other necessary recovery for the server reboot
case (see Section 8.6.2).

In the case of the session being persistent, the client will re-
establish communication using the existing session after the reboot.
This session will be associated with a client ID that has had state
revoked (but the persistent session is never associated with a stale
client ID, because if the session is persistent, the client ID MUST
persist), and the client will receive an indication of that fact in
the sr_status_flags field returned by the SEQUENCE operation (see
Section 17.46.4).  The client can then use the existing session to do
whatever operations are necessary to determine the status of requests
outstanding at the time of reboot, while avoiding issuing new
requests, particularly any involving locking on that session.  Such
requests would fail with an NFS4ERR_STALE_STATEID error, if
attempted.

See the detailed descriptions of EXCHANGE_ID (Section 17.35) and
CREATE_SESSION (Section 17.36) for a complete specification of these
operations.

2.4.1.
Server Release of Client ID

NFSv4.1 introduces a new operation called DESTROY_CLIENTID
(Section 17.50), which the client SHOULD use to destroy a client ID
it no longer needs.  This permits graceful, bilateral release of a
client ID.

If the server determines that the client holds no associated state
for its client ID (including sessions, opens, locks, delegations,
layouts, and wants), the server may choose to unilaterally release
the client ID.  The server may make this choice for an inactive
client so that resources are not consumed by intermittently active
clients.  If the client contacts the server after this release, the
server must ensure that the client receives the appropriate error so
that it will use the EXCHANGE_ID/CREATE_SESSION sequence to establish
a new identity.  It should be clear that the server must be very
hesitant to release a client ID, since the resulting work on the
client to recover from such an event will be the same burden as if
the server had failed and restarted.  Typically, a server would not
release a client ID unless there had been no activity from that
client for many minutes.  As long as there are sessions, opens,
locks, delegations, layouts, or wants, the server MUST NOT release
the client ID.  See Section 2.10.8.1.4 for discussion on releasing
inactive sessions.

2.4.2.  Handling Client Owner Conflicts

If the co_ownerid string in an EXCHANGE_ID request is properly
constructed, and if the client takes care to use the same principal
for each successive use of EXCHANGE_ID, then, barring an active
denial of service attack, conflicts are not possible.
However, client bugs, server bugs, or perhaps a deliberate change of
the principal owner of the co_ownerid string (such as the case of a
client that changes security flavors, and under the new flavor, there
is no mapping to the previous owner) will in rare cases result in a
conflict.

When the server gets an EXCHANGE_ID for a client owner that currently
has no state, or that has state but an expired lease, the server MUST
allow the EXCHANGE_ID, and confirm the new client ID if it is
followed by the appropriate CREATE_SESSION.

When the server gets an EXCHANGE_ID for a client owner that currently
has state, or an unexpired lease, and the principal that issues the
EXCHANGE_ID is different from the principal that previously
established the client owner, the server MUST NOT destroy any state
that currently exists for the client owner.  Regardless, the server
has two choices.  First, it can return NFS4ERR_CLID_INUSE.  Second,
it can allow the EXCHANGE_ID, and simply treat the client owner as
consisting of both the co_ownerid and the principal that issued the
EXCHANGE_ID.

2.5.  Server Owners

The Server Owner is somewhat similar to a Client Owner (Section 2.4),
but unlike the Client Owner, there is no shorthand serverid.  The
Server Owner is defined in the following structure:

   struct server_owner4 {
           uint64_t        so_minor_id;
           opaque          so_major_id;
   };

The Server Owner is returned in the results of EXCHANGE_ID.  When the
so_major_id fields are the same in two EXCHANGE_ID results, the
connections over which each EXCHANGE_ID was sent can be assumed to
address the same server (as defined in Section 1.5).  If the
so_minor_id fields are also the same, then not only do both
connections connect to the same server, but the session and other
state can be shared across both connections.
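The comparison just described can be sketched as follows
(illustrative Python; the function name and return strings are this
sketch's, not the protocol's):

```python
def compare_server_owners(owner_a, owner_b):
    """Classify two EXCHANGE_ID results, each modeled as a
    (so_major_id, so_minor_id) pair.

    Same so_major_id: the two connections address the same server.
    Same so_minor_id as well: sessions and other state may be
    shared across both connections.
    """
    major_a, minor_a = owner_a
    major_b, minor_b = owner_b
    if major_a != major_b:
        return "different servers"
    if minor_a != minor_b:
        return "same server, state not shareable"
    return "same server, state shareable"
```

As the next paragraph cautions, a client should not act on this
classification without verifying the servers' claims.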
The reader is cautioned that multiple servers may deliberately or
accidentally claim to have the same so_major_id or so_major_id/
so_minor_id; the reader should examine Section 2.10.3.4.1 and
Section 17.35.

The considerations for generating an so_major_id are similar to those
for generating a co_ownerid string (see Section 2.4).  The
consequences of two servers generating conflicting so_major_id values
are less dire than they are for co_ownerid conflicts, because the
client can use RPCSEC_GSS to compare the authenticity of each server
(see Section 2.10.3.4.1).

2.6.  Security Service Negotiation

With the NFS version 4 server potentially offering multiple security
mechanisms, the client needs a method to determine or negotiate which
mechanism is to be used for its communication with the server.  The
NFS server may have multiple points within its file system namespace
that are available for use by NFS clients.  These points can be
considered security policy boundaries, and in some NFS
implementations are tied to NFS export points.  In turn, the NFS
server may be configured such that each of these security policy
boundaries may have different or multiple security mechanisms in use.

The security negotiation between client and server must be done with
a secure channel to eliminate the possibility of a third party
intercepting the negotiation sequence and forcing the client and
server to choose a lower level of security than required or desired.
See Section 20 for further discussion.

2.6.1.  NFSv4 Security Tuples

An NFS server can assign one or more "security tuples" to each
security policy boundary in its namespace.  Each security tuple
consists of a security flavor (see Section 2.2.1.1) and, if the
flavor is RPCSEC_GSS, a GSS-API mechanism OID, a GSS-API quality of
protection, and an RPCSEC_GSS service.

2.6.2.
SECINFO and SECINFO_NO_NAME

The SECINFO and SECINFO_NO_NAME operations allow the client to
determine, on a per-filehandle basis, what security tuple is to be
used for server access.  In general, the client will not have to use
either operation except during initial communication with the server
or when the client crosses security policy boundaries at the server.
However, it is possible that the server's policies change during the
client's interaction, thereby forcing the client to negotiate a new
security tuple.

Where the use of different security tuples would affect the type of
access that would be allowed if a request was issued over the same
connection used for the SECINFO or SECINFO_NO_NAME operation (e.g.
read-only vs. read-write access), security tuples that allow greater
access should be presented first.  Where the general level of access
is the same and different security flavors limit the range of
principals whose privileges are recognized (e.g. allowing or
disallowing root access), flavors supporting the greatest range of
principals should be listed first.

2.6.3.  Security Error

Based on the assumption that each NFS version 4 client and server
must support a minimum set of security (i.e., LIPKEY, SPKM-3, and
Kerberos-V5, all under RPCSEC_GSS), the NFS client will initiate file
access to the server with one of the minimal security tuples.  During
communication with the server, the client may receive an NFS error of
NFS4ERR_WRONGSEC.  This error allows the server to notify the client
that the security tuple currently being used contravenes the server's
security policy.  The client is then responsible for determining (see
Section 2.6.3.1) what security tuples are available at the server and
choosing one that is appropriate for the client.

2.6.3.1.
Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME

This section explains the mechanics of NFSv4.1 security negotiation.
The term "put filehandle operation" refers to PUTROOTFH, PUTPUBFH,
PUTFH, and RESTOREFH.

2.6.3.1.1.  Put Filehandle Operation + SAVEFH

The client is saving a filehandle for a future RESTOREFH.  The server
MUST NOT return NFS4ERR_WRONGSEC to either the put filehandle
operation or SAVEFH.

2.6.3.1.2.  Two or More Put Filehandle Operations

For a series of N put filehandle operations, the server MUST NOT
return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations.
The Nth put filehandle operation is handled as if it were the first
in a series of operations in which the second operation is not a put
filehandle operation.  For example, if the server receives PUTFH,
PUTROOTFH, LOOKUP, then the PUTFH is ignored for NFS4ERR_WRONGSEC
purposes, and the PUTROOTFH, LOOKUP subseries is processed according
to Section 2.6.3.1.3.

2.6.3.1.3.  Put Filehandle Operation + LOOKUP (or OPEN by Name)

This situation also applies to a put filehandle operation followed by
a LOOKUP or an OPEN operation that specifies a component name.

In this situation, the client is potentially crossing a security
policy boundary, and the set of security tuples the parent directory
supports may differ from those of the child.  The server
implementation may decide whether to impose any restrictions on
security policy administration.  There are at least three approaches
(sec_policy_child is the tuple set of the child export,
sec_policy_parent is that of the parent):

a)  sec_policy_child <= sec_policy_parent (<= for subset).  This
    means that the set of security tuples specified in the security
    policy of a child directory is always a subset of that of its
    parent directory.
b)  sec_policy_child ^ sec_policy_parent != {} (^ for intersection,
    {} for the empty set).  This means that the set of security
    tuples specified in the security policy of a child directory
    always has a non-empty intersection with that of the parent.

c)  sec_policy_child ^ sec_policy_parent == {}.  This means that the
    set of tuples specified in the security policy of a child
    directory may not intersect with that of the parent.  In other
    words, there are no restrictions on how the system administrator
    may set up these tuples.

For a server to support approach (b) (when the client chooses a
flavor that is not a member of sec_policy_parent) and approach (c),
the put filehandle operation MUST NOT return NFS4ERR_WRONGSEC in case
of a security mismatch.  Instead, the error should be returned from
the LOOKUP (or OPEN by component name) that follows.

Since the above guideline does not contradict approach (a), it should
be followed in general.  Even if approach (a) is implemented, it is
possible for the security tuple used to be acceptable for the target
of LOOKUP but not for the filehandle used in the put filehandle
operation.  The put filehandle operation could be a PUTROOTFH or
PUTPUBFH, where the client cannot know the security tuples for the
root or public filehandle.  Or the security policy for the filehandle
used by the put filehandle operation could have changed since the
time the filehandle was obtained.

Therefore, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in
response to the put filehandle operation if the operation is
immediately followed by a LOOKUP or an OPEN by component name.

2.6.3.1.4.  Put Filehandle Operation + LOOKUPP

Since SECINFO only works its way down, there is no way LOOKUPP can
return NFS4ERR_WRONGSEC without SECINFO_NO_NAME.
SECINFO_NO_NAME solves this issue because, via the style
SECINFO_STYLE4_PARENT, it works in the opposite direction from
SECINFO.  As with Section 2.6.3.1.3, the put filehandle operation
MUST NOT return NFS4ERR_WRONGSEC whenever it is followed by LOOKUPP.
If the server does not support SECINFO_NO_NAME, the client's only
recourse is to issue the put filehandle operation, LOOKUPP, GETFH
sequence of operations with every security tuple it supports.

Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server
MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle
operation if the operation is immediately followed by a LOOKUPP.

2.6.3.1.5.  Put Filehandle Operation + SECINFO/SECINFO_NO_NAME

A security-sensitive client is allowed to choose a strong security
tuple when querying a server to determine a file object's permitted
security tuples.  The security tuple chosen by the client does not
have to be included in the tuple list of the security policy of
either the parent directory indicated in the put filehandle
operation, or the child file object indicated in SECINFO (or any
parent directory indicated in SECINFO_NO_NAME).  Of course, the
server has to be configured for whatever security tuple the client
selects; otherwise the request will fail at the RPC layer with an
appropriate authentication error.

In theory, there is no connection between the security flavor used by
SECINFO or SECINFO_NO_NAME and those supported by the security
policy.  But in practice, the client may start looking for strong
flavors from those supported by the security policy, followed by
those in the mandatory set.

The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to a put
filehandle operation whenever it is immediately followed by SECINFO
or SECINFO_NO_NAME.  The NFSv4.1 server MUST NOT return
NFS4ERR_WRONGSEC from SECINFO or SECINFO_NO_NAME.
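The per-case rules of Sections 2.6.3.1.1 through 2.6.3.1.5, together
with the two cases that follow, reduce to one question: what
operation immediately follows the put filehandle operation?  This can
be sketched as a toy decision function (illustrative Python;
operation names are strings of this model, and OPEN by name vs. OPEN
by filehandle are distinguished explicitly, which the wire protocol
does not do by operation code):

```python
PUT_FH_OPS = {"PUTROOTFH", "PUTPUBFH", "PUTFH", "RESTOREFH"}

# Followers for which the put filehandle operation MUST NOT return
# NFS4ERR_WRONGSEC; the error, if any, comes later (or not at all).
_DEFERRING_FOLLOWERS = {"SAVEFH", "LOOKUP", "LOOKUPP", "OPEN_BY_NAME",
                        "SECINFO", "SECINFO_NO_NAME"} | PUT_FH_OPS

def putfh_may_return_wrongsec(compound_ops, i):
    """May the put filehandle operation at index i of this toy
    COMPOUND return NFS4ERR_WRONGSEC on a security tuple mismatch?"""
    assert compound_ops[i] in PUT_FH_OPS
    follower = compound_ops[i + 1] if i + 1 < len(compound_ops) else None
    if follower is None or follower in _DEFERRING_FOLLOWERS:
        return False  # includes "put filehandle operation + nothing"
    return True       # "anything else", e.g. OPEN by filehandle, READ
```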
2.6.3.1.6.  Put Filehandle Operation + Nothing

The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC.

2.6.3.1.7.  Put Filehandle Operation + Anything Else

"Anything Else" includes OPEN by filehandle.

The security policy enforcement applies to the filehandle specified
in the put filehandle operation.  Therefore, PUTFH MUST return
NFS4ERR_WRONGSEC in the case of a security tuple mismatch.  This
avoids the complexity of adding NFS4ERR_WRONGSEC as an allowable
error to every other operation.

A COMPOUND containing the series put filehandle operation +
SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way
for the client to recover from NFS4ERR_WRONGSEC.

The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation
other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by
component name).

2.7.  Minor Versioning

To address the requirement of an NFS protocol that can evolve as the
need arises, the NFS version 4 protocol contains the rules and
framework to allow for future minor changes or versioning.

The base assumption with respect to minor versioning is that any
future accepted minor version must follow the IETF process and be
documented in a standards track RFC.  Therefore, each minor version
number will correspond to an RFC.  Minor version zero of the NFS
version 4 protocol is represented by [2], and minor version one is
represented by this document [[Comment.2: change "document" to "RFC"
when we publish]].  The COMPOUND and CB_COMPOUND procedures support
the encoding of the minor version being requested by the client.

The following items represent the basic rules for the development of
minor versions.  Note that a future minor version may decide to
modify or add to the following rules as part of the minor version
definition.

1.
Procedures are not added or deleted.

    To maintain the general RPC model, NFS version 4 minor versions
    will not add to or delete procedures from the NFS program.

2.  Minor versions may add operations to the COMPOUND and CB_COMPOUND
    procedures.

    The addition of operations to the COMPOUND and CB_COMPOUND
    procedures does not affect the RPC model.

    *  Minor versions may append attributes to GETATTR4args, bitmap4,
       and GETATTR4res.

       This allows for the expansion of the attribute model to allow
       for future growth or adaptation.

    *  Minor version X must append any new attributes after the last
       documented attribute.

       Since attribute results are specified as an opaque array of
       per-attribute XDR-encoded results, the complexity of adding
       new attributes in the midst of the current definitions would
       be too burdensome.

3.  Minor versions must not modify the structure of an existing
    operation's arguments or results.

    Again, the complexity of handling multiple structure definitions
    for a single operation is too burdensome.  New operations should
    be added instead of modifying existing structures for a minor
    version.

    This rule does not preclude the following adaptations in a minor
    version:

    *  adding bits to flag fields, such as new attributes to
       GETATTR's bitmap4 data type

    *  adding bits to existing attributes like ACLs that have flag
       words

    *  extending enumerated types (including NFS4ERR_*) with new
       values

4.  Minor versions may not modify the structure of existing
    attributes.

5.  Minor versions may not delete operations.

    This prevents the potential reuse of a particular operation
    "slot" in a future minor version.

6.  Minor versions may not delete attributes.

7.  Minor versions may not delete flag bits or enumeration values.

8.  Minor versions may declare an operation as mandatory to NOT
    implement.
    Specifying an operation as "mandatory to not implement" is
    equivalent to obsoleting an operation.  For the client, it means
    that the operation should not be sent to the server.  For the
    server, an NFS error can be returned as opposed to "dropping" the
    request as an XDR decode error.  This approach allows for the
    obsolescence of an operation while maintaining its structure so
    that a future minor version can reintroduce the operation.

    *  Minor versions may declare attributes mandatory to NOT
       implement.

    *  Minor versions may declare flag bits or enumeration values as
       mandatory to NOT implement.

9.  Minor versions may downgrade features from mandatory to
    recommended, or recommended to optional.

10. Minor versions may upgrade features from optional to recommended,
    or recommended to mandatory.

11. A client and server that support minor version X should support
    minor versions 0 (zero) through X-1 as well.

12. Except for infrastructural changes, no new features may be
    introduced as mandatory in a minor version.

    This rule allows for the introduction of new functionality and
    forces the use of implementation experience before designating a
    feature as mandatory.  On the other hand, some classes of
    features are infrastructural and have broad effects.  Allowing
    such features to not be mandatory complicates implementation of
    the minor version.

13. A client MUST NOT attempt to use a stateid, filehandle, or
    similar returned object from the COMPOUND procedure with minor
    version X for another COMPOUND procedure with minor version Y,
    where X != Y.

2.8.  Non-RPC-based Security Services

As described in Section 2.2.1.1.1.1, NFSv4 relies on RPC for
identification, authentication, integrity, and privacy.  NFSv4 itself
provides additional security services as described in the next
several subsections.

2.8.1.
Authorization

Authorization to access a file object via an NFSv4 operation is
ultimately determined by the NFSv4 server.  A client can predetermine
its access to a file object via the OPEN (Section 17.16) and the
ACCESS (Section 17.1) operations.

Principals with appropriate access rights can modify the
authorization on a file object via the SETATTR (Section 17.30)
operation.  Four attributes that affect access rights are: mode,
owner, owner_group, and acl.  See Section 5.

2.8.2.  Auditing

NFSv4 provides auditing on a per-file-object basis, via the ACL
attribute as described in Section 6.  It is outside the scope of this
specification to specify audit log formats or management policies.

2.8.3.  Intrusion Detection

NFSv4 provides alarm control on a per-file-object basis, via the ACL
attribute as described in Section 6.  Alarms may serve as the basis
for intrusion detection.  It is outside the scope of this
specification to specify heuristics for detecting intrusion via
alarms.

2.9.  Transport Layers

2.9.1.  Required and Recommended Properties of Transports

NFSv4 works over RDMA and non-RDMA-based transports with the
following attributes:

o  The transport supports reliable delivery of data, which NFSv4
   requires but neither NFSv4 nor RPC has facilities for ensuring
   [20].

o  The transport delivers data in the order it was sent.  Ordered
   delivery simplifies detection of transmit errors, and simplifies
   the sending of arbitrary sized requests and responses via the
   record marking protocol [4].

Where an NFS version 4 implementation supports operation over the IP
network protocol, any transport used between NFS and IP MUST be among
the IETF-approved congestion control transport protocols.  At the
time this document was written, the only two transports that had the
above attributes were TCP and SCTP.
To enhance the possibilities for interoperability, an NFS version 4
implementation MUST support operation over the TCP transport
protocol.

Even if NFS version 4 is used over a non-IP network protocol, it is
RECOMMENDED that the transport support congestion control.

It is permissible for a connectionless transport to be used under
NFSv4.1; however, reliable and in-order delivery of data by the
connectionless transport is still required.  NFSv4.1 assumes that a
client transport address and server transport address used to send
data over a transport together constitute a connection, even if the
underlying transport eschews the concept of a connection.

2.9.2.  Client and Server Transport Behavior

If a connection-oriented transport (e.g. TCP) is used, the client and
server SHOULD use long-lived connections for at least three reasons:

1.  This will prevent the weakening of the transport's congestion
    control mechanisms via short-lived connections.

2.  This will improve performance for the WAN environment by
    eliminating the need for connection setup handshakes.

3.  The NFSv4.1 callback model differs from NFSv4.0, and requires the
    client and server to maintain a client-created channel (see
    Section 2.10.3.4) for the server to use.

In order to reduce congestion, if a connection-oriented transport is
used, and the request is not the NULL procedure:

o  A requester MUST NOT retry a request unless the connection the
   request was issued over was disconnected before the reply was
   received.

o  A replier MUST NOT silently drop a request, even if the request is
   a retry.  (The silent drop behavior of RPCSEC_GSS [5] does not
   apply because this behavior happens at the RPCSEC_GSS layer, a
   lower layer in the request processing.)  Instead, the replier
   SHOULD return an appropriate error (see Section 2.10.4.1) or it
   MAY disconnect the connection.
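The requester-side rule above can be condensed into a small predicate
(illustrative Python; the function and parameter names are this
sketch's inventions):

```python
def requester_may_retry(is_null_proc, reply_received, connection_lost):
    """Over a connection-oriented transport, a requester may retry a
    request only if it is the NULL procedure (to which the rules
    above do not apply), or if the connection the request was issued
    over was lost before the reply arrived."""
    if is_null_proc:
        return True
    return connection_lost and not reply_received
```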
When using RDMA transports, there are other reasons for not
tolerating retries over the same connection:

o  RDMA transports use "credits" to enforce flow control, where a
   credit is a right to a peer to transmit a message.  If one peer
   were to retransmit a request (or reply), it would consume an
   additional credit.  If the replier retransmitted a reply, it would
   certainly result in an RDMA connection loss, since the requester
   would typically only post a single receive buffer for each
   request.  If the requester retransmitted a request, the additional
   credit consumed on the server might lead to RDMA connection
   failure unless the client accounted for it and decreased its
   available credit, leading to wasted resources.

o  RDMA credits present a new issue to the reply cache in NFSv4.1.
   The reply cache may be used when a connection within a session is
   lost, such as after the client reconnects.  Credit information is
   a dynamic property of the RDMA connection, and stale values must
   not be replayed from the cache.  This implies that the reply cache
   contents must not be blindly used when replies are issued from it,
   and credit information appropriate to the channel must be
   refreshed by the RPC layer.

In addition, the NFSv4.1 requester is not allowed to stop waiting for
a reply, as described in Section 2.10.4.2.

2.9.3.  Ports

Historically, NFS version 2 and version 3 servers have resided on
port 2049.  The registered port 2049 (RFC 3232 [21]) for the NFS
protocol should be the default configuration.  NFSv4 clients SHOULD
NOT use the RPC binding protocols described in RFC 1833 [22].

2.10.  Session

2.10.1.  Motivation and Overview

Previous versions and minor versions of NFS have suffered from the
following:

o  Lack of support for exactly once semantics (EOS).  This includes
   lack of support for EOS through server failure and recovery.
o  Limited callback support, including no support for sending
   callbacks through firewalls, and races between responses from
   normal requests and callbacks.

o  Limited trunking over multiple network paths.

o  Requiring machine credentials for fully secure operation.

Through the introduction of a session, NFSv4.1 addresses the above
shortfalls with practical solutions:

o  EOS is enabled by a reply cache with a bounded size, making it
   feasible to keep on persistent storage and enable EOS through
   server failure and recovery.  One reason that previous revisions
   of NFS did not support EOS was that some EOS approaches often
   limited parallelism.  As will be explained in Section 2.10.4,
   NFSv4.1 supports both EOS and unlimited parallelism.

o  The NFSv4.1 client creates transport connections and gives them to
   the server for sending callbacks, thus solving the firewall issue
   (Section 17.34).  Races between responses from client requests and
   callbacks caused by the requests are detected via the session's
   sequencing properties, which are a byproduct of EOS
   (Section 2.10.4.3).

o  The NFSv4.1 client can add an arbitrary number of connections to
   the session, and thus provide trunking (Section 2.10.3.4.1).

o  The NFSv4.1 session produces a session key, independent of client
   and server machine credentials, which can be used to compute a
   digest for protecting key session management operations
   (Section 2.10.6.3).

o  The NFSv4.1 client can also create secure RPCSEC_GSS contexts for
   use by the session's callback channel that do not require the
   server to authenticate to a client machine principal
   (Section 2.10.6.2).

A session is a dynamically created, long-lived server object created
by a client, used over time from one or more transport connections.
1732 Its function is to maintain the server's state relative to the 1733 connection(s) belonging to a client instance. This state is entirely 1734 independent of the connection itself, and indeed the state exists 1735 whether the connection exists or not (though locks, delegations, etc., 1736 generally expire in the extended absence of an open connection). 1737 The session in effect becomes the object representing an active 1738 client on a set of zero or more connections. 1740 2.10.2. NFSv4 Integration 1742 Sessions are part of NFSv4.1 and not NFSv4.0. Normally, a major 1743 infrastructure change like sessions would require a new major version 1744 number for an RPC program like NFS. However, because NFSv4 1745 encapsulates its functionality in a single procedure, COMPOUND, and 1746 because COMPOUND can support an arbitrary number of operations, 1747 sessions are almost trivially added. COMPOUND includes a minor 1748 version number field, and for NFSv4.1 this minor version is set to 1. 1749 When the NFSv4 server processes a COMPOUND with the minor version set 1750 to 1, it expects a different set of operations than it does for 1751 NFSv4.0. One operation it expects is the SEQUENCE operation, which 1752 is required for every COMPOUND that operates over an established 1753 session. 1755 2.10.2.1. SEQUENCE and CB_SEQUENCE 1757 In NFSv4.1, when the SEQUENCE operation is present, it is always the 1758 first operation in the COMPOUND procedure. The primary purpose of 1759 SEQUENCE is to carry the session identifier. The session identifier 1760 associates all other operations in the COMPOUND procedure with a 1761 particular session. SEQUENCE also contains required information for 1762 maintaining EOS (see Section 2.10.4). Session-enabled NFSv4.1 1763 COMPOUND requests thus have the form: 1765 +-----+--------------+-----------+------------+-----------+---- 1766 | tag | minorversion | numops |SEQUENCE op | op + args | ...
| | (== 1) | (limited) | + args | | 1768 +-----+--------------+-----------+------------+-----------+---- 1770 and the reply's structure is: 1772 +------------+-----+--------+-------------------------------+--// 1773 |last status | tag | numres |status + SEQUENCE op + results | // 1774 +------------+-----+--------+-------------------------------+--// 1775 //-----------------------+---- 1776 // status + op + results | ... 1777 //-----------------------+---- 1779 A CB_COMPOUND procedure request and reply has a similar form, but 1780 instead of a SEQUENCE operation, there is a CB_SEQUENCE operation, 1781 and there is an additional field called "callback_ident", which is 1782 superfluous in NFSv4.1. CB_SEQUENCE has the same information as 1783 SEQUENCE, but includes other information needed to solve callback 1784 races (Section 2.10.4.3). 1786 2.10.2.2. Client ID and Session Association 1788 Sessions are subordinate to the client ID (Section 2.4). Each client 1789 ID can have zero or more active sessions. A client ID and a session 1790 bound to it are required to do anything useful in NFSv4.1. Each time 1791 a session is used, the state leased to its associated client ID is 1792 automatically renewed. 1794 State such as share reservations, locks, delegations, and layouts 1795 (Section 1.4.4) is tied to the client ID, not the sessions of the 1796 client ID. Successive state changing operations from a given state 1797 owner can go over different sessions, as long as each session is 1798 associated with the same client ID. Callbacks can arrive over a 1799 different session than the session that sent the operation that 1800 acquired the state that the callback is for. For example, if session 1801 A is used to acquire a delegation, a request to recall the delegation 1802 can arrive over session B. 1804 2.10.3.
Channels 1806 Each session has one or two channels: the "operation" or "fore" 1807 channel used for ordinary requests from client to server, and the 1808 "back" channel, used for callback requests from server to client. 1809 The session allocates resources for each channel, including separate 1810 reply caches (see Section 2.10.4.1). These resources are for the 1811 most part specified at the time the session is created. 1813 2.10.3.1. Operation Channel 1815 The operation channel carries COMPOUND requests and responses. A 1816 session always has an operation channel. 1818 2.10.3.2. Backchannel 1820 The backchannel carries CB_COMPOUND requests and responses. Whether 1821 there is a backchannel or not is a decision of the client; NFSv4.1 1822 servers MUST support backchannels. 1824 2.10.3.3. Session and Channel Association 1826 Because there are at most two channels per session, and because each 1827 channel has a distinct purpose, channels are not assigned 1828 identifiers. The operation and backchannel are implicitly created 1829 and associated when the session is created. 1831 2.10.3.4. Connection and Channel Association 1833 Each channel is associated with zero or more transport connections. 1834 A connection can be bound to one channel or both channels of a 1835 session; the client and server negotiate whether a connection will 1836 carry traffic for one channel or both channels via the CREATE_SESSION 1837 (Section 17.36) and the BIND_CONN_TO_SESSION (Section 17.34) 1838 operations. When a session is created via CREATE_SESSION, it is 1839 automatically bound to the operation channel, and optionally the 1840 backchannel. If the client does not specify connection binding 1841 enforcement when the session is created, then additional connections 1842 are automatically bound to the operation channel when they are used 1843 with a SEQUENCE operation that has the session's sessionid. 1845 A connection MAY be bound to the channels of other sessions.
The 1846 client decides, and the NFSv4.1 server MUST allow it. A connection 1847 MAY be bound to the channels of other sessions of other clientids. 1848 Again, the client decides, and the server MUST allow it. 1850 It is permissible for connections of multiple types to be bound to 1851 the same channel. For example, a TCP connection and an RDMA connection can be bound 1852 to the operation channel. In the event an RDMA and non-RDMA 1853 connection are bound to the same channel, the maximum number of slots 1854 must be at least one more than the total number of credits. This way, 1855 if all RDMA credits are in use, the non-RDMA connection can have at 1856 least one outstanding request. 1858 It is permissible for a connection of one type to be bound to the 1859 operation channel, and another type bound to the backchannel. 1861 2.10.3.4.1. Trunking 1863 A client is allowed to issue EXCHANGE_ID multiple times to the same 1864 server. The client may be unaware that two different server network 1865 addresses refer to the same server. The use of EXCHANGE_ID allows a 1866 client to become aware that an additional network address refers to a 1867 server the client already has an established client ID and session 1868 for. The eir_server_owner and eir_server_scope results from 1869 EXCHANGE_ID give a client a hint that the server it is connected to 1870 may be the same as the server it is connected to via another 1871 connection. When EXCHANGE_ID is issued over two different 1872 connections, and each returns the same eir_server_owner.so_major_id 1873 and eir_server_scope, the client treats the connections as connected 1874 to the same server (subject to verification, as described later in 1875 this section (Paragraph 2), even if the destination network addresses 1876 are different).
As long as two unrelated servers have not selected and 1877 returned a conflicting pair of eir_server_owner.so_major_id and eir_server_scope, the 1878 client has not used different co_ownerid values in each 1879 EXCHANGE_ID request, and the server has not lost client ID state (e.g. the 1880 server has rebooted), the server MUST return the same eir_clientid 1881 result. The client and server then use the common eir_clientid 1882 to identify the client. The eir_server_owner.so_minor_id field 1883 allows the server to control binding of connections to sessions. 1884 When two connections have a matching eir_server_scope, so_major_id 1885 and so_minor_id, the client may bind both connections to a common 1886 session; this is session trunking. When two connections have a 1887 matching so_major_id and eir_server_scope, but different so_minor_id, 1888 the client will need to create a new session for the client ID in 1889 order to use the connection; this is client ID trunking. In either 1890 session or client ID trunking, the bandwidth capacity can scale with 1891 the number of connections. 1893 When two servers over two connections claim matching or partially 1894 matching eir_server_owner, eir_server_scope, and eir_clientid values, 1895 the client does not have to trust the servers' claims. The client 1896 may verify these claims before trunking traffic in the following 1897 ways: 1899 o For session trunking, clients and servers can reliably verify if 1900 connections between different network paths are in fact bound to 1901 the same NFSv4.1 server and usable on the same session. The 1902 SET_SSV (Section 17.47) operation allows a client and server to 1903 establish a unique, shared key value (the SSV). When a new 1904 connection is bound to the session (via the BIND_CONN_TO_SESSION 1905 operation, see Section 17.34), the client offers a digest that is 1906 based on the SSV.
If the client mistakenly tries to bind a 1907 connection to a session of a wrong server, the server will either 1908 reject the attempt because it is not aware of the session 1909 identifier of the BIND_CONN_TO_SESSION arguments, or it will 1910 reject the attempt because the digest for the SSV does not match 1911 what the server expects. Even if the server mistakenly or 1912 maliciously accepts the connection bind attempt, the digest it 1913 computes in the response will not be verified by the client, so the 1914 client will know it cannot use the connection for trunking the 1915 specified channel. 1917 o In the case of client ID trunking, the client can use RPCSEC_GSS 1918 to verify that each connection is aimed at the same server. When 1919 the client invokes EXCHANGE_ID, it should use RPCSEC_GSS. If each 1920 RPCSEC_GSS context over each connection has the same server 1921 principal, then -- barring a compromise of the server's GSS 1922 credentials -- the servers at the end of each connection are the 1923 same. 1925 2.10.4. Exactly Once Semantics 1927 Via the session, NFSv4.1 offers exactly once semantics (EOS) for 1928 requests sent over a channel. EOS is supported on both the operation 1929 and back channels. 1931 Each COMPOUND or CB_COMPOUND request that is issued with a leading 1932 SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver 1933 exactly once. This requirement holds regardless of whether the request is 1934 issued with reply caching specified (see Section 2.10.4.1.2). The 1935 requirement holds even if the requester is issuing the request over a 1936 session created between a pNFS data client and pNFS data server. The 1937 rationale for this requirement is understood by categorizing requests 1938 into three classifications: 1940 o Nonidempotent requests. 1942 o Idempotent modifying requests. 1944 o Idempotent non-modifying requests. 1946 An example of a non-idempotent request is RENAME.
It is obvious that 1947 if a replier executes the same RENAME request twice, and the first 1948 execution succeeds, the re-execution will fail. If the replier 1949 returns the result from the re-execution, this result is incorrect. 1950 Therefore, EOS is required for nonidempotent requests. 1952 An example of an idempotent modifying request is a COMPOUND request 1953 containing a WRITE operation. Repeated execution of the same WRITE 1954 has the same effect as a single execution of that WRITE. Nevertheless, 1955 enforcing EOS for WRITEs and other idempotent modifying 1956 requests is necessary to avoid data corruption. 1958 Suppose a client issues WRITEs A, B, C to a noncompliant server that 1959 does not enforce EOS, and receives no response, perhaps due to a 1960 network partition. The client reconnects to the server and re-issues 1961 all three WRITEs. Now, the server has two outstanding instances of 1962 each of A, B, and C. The server can be in a situation in which it 1963 executes and replies to the retries of A, B, and C while the first A, 1964 B, and C are still waiting around in the server's I/O system for some 1965 resource. Upon receiving the replies to the second attempts of 1966 WRITEs A, B, and C, the client believes its writes are done so it is 1967 free to issue WRITE D which overlaps the range of one or more of 1968 A, B, C. If any of A, B, or C is subsequently executed a 1969 second time, then what has been written by D can be overwritten and 1970 thus corrupted. 1972 Note that it is not required that the server cache the reply to the 1973 modifying operation to avoid data corruption (but if the client 1974 specified the reply to be cached, the server must cache it). 1976 An example of an idempotent non-modifying request is a COMPOUND 1977 containing SEQUENCE, PUTFH, READLINK and nothing else. The re- 1978 execution of such a request will not cause data corruption, or 1979 produce an incorrect result.
Nonetheless, for simplicity, the 1980 replier MUST enforce EOS for such requests. 1982 2.10.4.1. Slot Identifiers and Reply Cache 1984 The RPC layer provides a transaction ID (xid), which, while required 1985 to be unique, is not especially convenient for tracking requests. 1986 The xid is only meaningful to the requester; it cannot be interpreted 1987 by the replier except to test for equality with previously issued 1988 requests. Because RPC operations may be completed by the replier in 1989 any order, many transaction IDs may be outstanding at any time. The 1990 requester may therefore perform a computationally expensive lookup 1991 operation in the process of demultiplexing each reply. 1993 In NFSv4.1, there is a limit to the number of active requests. 1994 This immediately enables a computationally efficient index for each 1995 request which is designated as a Slot Identifier, or slotid. 1997 When the requester issues a new request, it selects a slotid in the 1998 range 0..N-1, where N is the replier's current "outstanding requests" 1999 limit granted to the requester on the session over which the request 2000 is to be issued. The value of N outstanding requests starts out as 2001 the value of ca_maxrequests (Section 17.36), but can be adjusted by 2002 the response to SEQUENCE or CB_SEQUENCE as described later in this 2003 section. The slotid must be unused by any of the requests that the 2004 requester already has active on the session. "Unused" here means the 2005 requester has no outstanding request for that slotid. Because the 2006 slotid is always an integer in the range 0..N-1, requester 2007 implementations can use the slotid from a replier response to 2008 efficiently match responses with outstanding requests, such as, for 2009 example, by using the slotid to index into an outstanding request 2010 array. This can be used to avoid expensive hashing and lookup 2011 functions in the performance-critical receive path.
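The requester-side slot handling described above can be sketched as follows. This is a minimal illustration, not part of the protocol definition; the class and method names are hypothetical.

```python
# Illustrative sketch of requester-side slot management: pick the lowest
# unused slotid in 0..N-1, and match a reply to its request by indexing
# the outstanding-request array with the slotid from the response.

class SlotTable:
    def __init__(self, max_requests):
        # slotid -> outstanding request, or None if the slot is unused
        self.outstanding = [None] * max_requests

    def acquire(self, request):
        # Use the lowest available slot, so the replier can retire
        # higher-numbered slot entries sooner.
        for slotid, entry in enumerate(self.outstanding):
            if entry is None:
                self.outstanding[slotid] = request
                return slotid
        return None        # all N slots busy: the requester must wait

    def complete(self, slotid):
        # The slotid in the reply indexes straight into the array,
        # avoiding hashing or lookup in the receive path.
        request = self.outstanding[slotid]
        self.outstanding[slotid] = None
        return request
```

A requester holding a table of N slots can thus never have more than N requests in flight on the channel, which is exactly what bounds the replier's reply cache.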
2013 The sequenceid, which accompanies the slotid in each request, is for 2014 an important check at the server: the replier must be able to determine 2015 efficiently whether a request using a certain slotid is a retransmit 2016 or a new, never-before-seen request. It is not feasible to implement this 2017 by having the client assert that it is retransmitting, because 2018 for any given request the client cannot know whether the server has seen it 2019 unless the server actually replies. Of course, if the client has 2020 seen the server's reply, the client would not retransmit. 2022 The sequenceid MUST increase monotonically for each new transmit of a 2023 given slotid, and MUST remain unchanged for any retransmission. The 2024 server must in turn compare each newly received request's sequenceid 2025 with the last one previously received for that slotid, to see if the 2026 new request is: 2028 o A new request, in which the sequenceid is one greater than that 2029 previously seen in the slot (accounting for sequence wraparound). 2030 The replier proceeds to execute the new request. 2032 o A retransmitted request, in which the sequenceid is equal to that 2033 last seen in the slot. Note that this request may be either 2034 complete, or in progress. The replier performs replay processing 2035 in these cases. 2037 o A misordered replay, in which the sequenceid is less 2038 (accounting for sequence wraparound) than that previously seen in 2039 the slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the 2040 result from SEQUENCE or CB_SEQUENCE). 2042 o A misordered new request, in which the sequenceid is greater by two or more 2043 (accounting for sequence wraparound) than that previously 2044 seen in the slot. Note that because the sequenceid must 2045 wrap around once it reaches 0xFFFFFFFF, a misordered new request and 2046 a misordered replay cannot be distinguished. Thus, the replier 2047 MUST return NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or 2048 CB_SEQUENCE).
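The four cases above reduce to a single modular comparison. The following sketch (illustrative only; the function name is not from the specification) shows the replier's check for a slot, with 32-bit wraparound:

```python
# Sketch of the replier's sequenceid check for one slot. Sequence IDs
# are 32-bit unsigned values that wrap around after 0xFFFFFFFF.

def classify(received, last_seen):
    delta = (received - last_seen) % (1 << 32)
    if delta == 1:
        return "new request"      # execute it
    if delta == 0:
        return "retransmit"       # perform replay processing
    # A misordered replay (less than last seen) and a misordered new
    # request (two or more ahead) cannot be distinguished under
    # wraparound, so both produce the same error.
    return "NFS4ERR_SEQ_MISORDERED"
```

Note how the wraparound case falls out naturally: a request with sequenceid 0 following last-seen 0xFFFFFFFF is a valid new request.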
2050 Unlike the XID, the slotid is always within a specific range; this 2051 has two implications. The first implication is that for a given 2052 session, the replier need only cache the results of a limited number 2053 of COMPOUND requests. The second implication derives from the first: 2054 unlike XID-indexed reply caches (also known as duplicate 2055 request caches, or DRCs), the slotid-based reply cache cannot be 2056 overflowed. Through use of the sequenceid to identify retransmitted 2057 requests, the replier does not need to actually cache the request 2058 itself, reducing the storage requirements of the reply cache further. 2059 These new facilities make it practical to maintain all the required 2060 entries for an effective reply cache. 2062 The slotid and sequenceid therefore take over the traditional role of 2063 the XID and port number in the replier reply cache implementation, 2064 and the session replaces the IP address. This approach is 2065 considerably more portable and completely robust; it is not subject 2066 to the frequent reassignment of ports as clients reconnect over IP 2067 networks. In addition, the RPC XID is not used in the reply cache, 2068 enhancing robustness of the cache in the face of any rapid reuse of 2069 XIDs by the client. [[Comment.3: We need to discuss the requirements 2070 of the client for changing the XID.]] 2072 The slotid information is included in each request, without violating 2073 the minor versioning rules of the NFSv4.0 specification, by encoding 2074 it in the SEQUENCE operation within each NFSv4.1 COMPOUND and 2075 CB_COMPOUND procedure. The operation easily piggybacks within 2076 existing messages. [[Comment.4: Need a better term than piggyback]] 2078 The receipt of a new sequenced request arriving on any valid slot is 2079 an indication that the previous reply cache contents of that slot may 2080 be discarded.
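A bounded, slot-indexed reply cache along these lines can be sketched as below. This is an illustrative simplification (it omits the misordered-sequenceid checks, which a real replier must perform first); all names are hypothetical.

```python
# Sketch of a bounded slot-based reply cache: exactly one (sequenceid,
# reply) entry per slotid, so the cache can never overflow, and a new
# sequenceid on a slot discards the slot's previous contents.

class ReplyCache:
    def __init__(self, nslots):
        # Sequence IDs start at 0 here, so the first request on a slot
        # (sequenceid 1) is treated as new.
        self.slots = [(0, None)] * nslots

    def process(self, slotid, sequenceid, execute):
        last_seq, cached = self.slots[slotid]
        if sequenceid == last_seq:
            return cached          # retransmit: replay from the cache
        # New request: previous contents of the slot are discarded.
        reply = execute()
        self.slots[slotid] = (sequenceid, reply)
        return reply
```

Because retransmissions are identified by (slotid, sequenceid) alone, the cache never needs to store or compare the request bodies themselves.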
2082 The SEQUENCE (and CB_SEQUENCE) operation also carries a 2083 "highest_slotid" value which carries additional client slot usage 2084 information. The requester must always provide a slotid representing 2085 the outstanding request with the highest-numbered slot value. The 2086 requester should in all cases provide the most conservative value 2087 possible, although it can be increased somewhat above the actual 2088 instantaneous usage to maintain some minimum or optimal level. This 2089 provides a way for the requester to yield unused request slots back 2090 to the replier, which in turn can use the information to reallocate 2091 resources. 2093 The replier responds with both a new target highest_slotid, and an 2094 enforced highest_slotid, described as follows: 2096 o The target highest_slotid is an indication to the requester of the 2097 highest_slotid the replier wishes the requester to be using. This 2098 permits the replier to withdraw (or add) resources from a 2099 requester that has been found to not be using them, in order to 2100 more fairly share resources among a varying level of demand from 2101 other requesters. The requester must always comply with the 2102 replier's value updates, since they indicate newly established 2103 hard limits on the requester's access to session resources. 2104 However, because of request pipelining, the requester may have 2105 active requests in flight reflecting prior values; therefore, the 2106 replier must not immediately require the requester to comply. 2108 o The enforced highest_slotid indicates the highest slotid the 2109 requester is permitted to use on a subsequent SEQUENCE or 2110 CB_SEQUENCE operation. 2112 The requester is required to use the lowest available slot when 2113 issuing a new request. This way, the replier may be able to retire 2114 slot entries faster. However, where the replier is actively 2115 adjusting its granted maximum request count (i.e.
the highest_slotid) 2116 to the requester, it will not be able to rely on just the receipt 2117 of the slotid and highest_slotid in the request. Neither the slotid nor 2118 the highest_slotid used in a request may reflect the replier's 2119 current idea of the requester's session limit, because the request 2120 may have been sent from the requester before the update was received. 2121 Therefore, in the downward adjustment case, the replier may have to 2122 retain a number of reply cache entries at least as large as the old 2123 value of maximum requests outstanding, until operation sequencing 2124 rules allow it to infer that the requester has seen its reply. 2126 2.10.4.1.1. Errors from SEQUENCE and CB_SEQUENCE 2128 Any time SEQUENCE or CB_SEQUENCE returns an error, the sequenceid of 2129 the slot MUST NOT change. The replier MUST NOT modify the reply 2130 cache entry for the slot whenever an error is returned from SEQUENCE 2131 or CB_SEQUENCE. 2133 2.10.4.1.2. Optional Reply Caching 2135 On a per-request basis the requester can choose to direct the replier 2136 to cache the reply to all operations after the first operation 2137 (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis 2138 fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it 2139 would not direct the replier to cache the entire reply is that the 2140 request is composed of all idempotent operations [20]. Caching the 2141 reply may offer little benefit, and if the reply is too large (see 2142 Section 2.10.4.4), it may not be cacheable anyway. 2144 Whether the requester requests the reply to be cached or not has no 2145 effect on the slot processing. If the results of SEQUENCE or 2146 CB_SEQUENCE are NFS4_OK, then the slot's sequenceid MUST be 2147 incremented by one. If a requester does not direct the replier to 2148 cache the reply, the replier MUST do one of the following: 2150 o The replier can cache the entire original reply.
Even though 2151 sa_cachethis or csa_cachethis are FALSE, the replier is always 2152 free to cache. It may choose this approach in order to simplify 2153 implementation. 2155 o The replier enters into its reply cache a reply consisting of the 2156 original results to the SEQUENCE or CB_SEQUENCE operation, 2157 followed by the error NFS4ERR_RETRY_UNCACHED_REP. Thus if the 2158 requester later retries the request, it will get 2159 NFS4ERR_RETRY_UNCACHED_REP. 2161 2.10.4.1.3. Multiple Connections and Sharing the Reply Cache 2163 Multiple connections can be bound to a session's channel, hence the 2164 connections share the same table of slotids. For connections over 2165 non-RDMA transports like TCP, there are no particular considerations. 2166 Considerations for multiple RDMA connections sharing a slot table are 2167 discussed in Section 2.10.5.1. [[Comment.5: Also need to discuss 2168 when RDMA and non-RDMA share a slot table.]] 2170 2.10.4.2. Retry and Replay 2172 A client MUST NOT retry a request, unless the connection it used to 2173 send the request disconnects. The client can then reconnect and 2174 resend the request, or it can resend the request over a different 2175 connection. In the case of the server resending over the 2176 backchannel, it cannot reconnect, and either resends the request over 2177 another connection that the client has bound to the backchannel, or 2178 if there is no other backchannel connection, waits for the client to 2179 bind a connection to the backchannel. 2181 A client MUST wait for a reply to a request before using the slot for 2182 another request. If it does not wait for a reply, then the client 2183 does not know what sequenceid to use for the slot on its next 2184 request. For example, suppose a client sends a request with 2185 sequenceid 1, and does not wait for the response. The next time it 2186 uses the slot, it sends the new request with sequenceid 2.
If the 2187 server has not seen the request with sequenceid 1, then the server is 2188 still expecting sequenceid 1, and rejects the client's new request with 2189 NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or CB_SEQUENCE). 2191 RDMA fabrics do not guarantee that the memory handles (Steering Tags) 2192 within each RDMA three-tuple are valid on a scope [[Comment.6: What 2193 is a three-tuple?]] outside that of a single connection. Therefore, 2194 handles used by the direct operations become invalid after connection 2195 loss. The server must ensure that any RDMA operations which must be 2196 replayed from the reply cache use the newly provided handle(s) from 2197 the most recent request. 2199 2.10.4.3. Resolving server callback races with sessions 2201 It is possible for server callbacks to arrive at the client before 2202 the reply from related forward channel operations. For example, a 2203 client may have been granted a delegation to a file it has opened, 2204 but the reply to the OPEN (informing the client of the granting of 2205 the delegation) may be delayed in the network. If a conflicting 2206 operation arrives at the server, it will recall the delegation using 2207 the callback channel, which may be on a different transport 2208 connection, perhaps even a different network. In NFSv4.0, if the 2209 callback request arrives before the related reply, the client may 2210 reply to the server with an error. 2212 The presence of a session between client and server alleviates this 2213 issue. When a session is in place, each client request is uniquely 2214 identified by its { slotid, sequenceid } pair. By the rules under 2215 which slot entries (reply cache entries) are retired, the server has 2216 knowledge whether the client has "seen" each of the server's replies. 2217 The server can therefore provide sufficient information to the client 2218 to allow it to disambiguate between an erroneous or conflicting 2219 callback and a race condition.
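The client side of this disambiguation, using the referring { slotid, sequenceid } pairs that the paragraphs below describe the server supplying in CB_SEQUENCE, can be sketched as follows. The function and argument names are illustrative, not from the specification.

```python
# Sketch of the client-side race check: the client tracks requests
# whose replies it has not yet seen, keyed by slotid. A callback whose
# referring pairs name any such request is known to have raced the
# reply for that request.

def callback_races_reply(referring_pairs, outstanding):
    """referring_pairs: list of (slotid, sequenceid) from CB_SEQUENCE.
    outstanding: dict mapping slotid -> sequenceid of a request still
    awaiting its reply."""
    return any(outstanding.get(slotid) == seqid
               for slotid, seqid in referring_pairs)
```

If the check is true, the client knows the callback is not erroneous; it can delay processing the callback until the related reply arrives (or until a timeout, as discussed below).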
2221 For each client operation which might result in some sort of server 2222 callback, the server should "remember" the { slotid, sequenceid } 2223 pair of the client request until the slotid retirement rules allow 2224 the server to determine that the client has, in fact, seen the 2225 server's reply. Until the time the { slotid, sequenceid } request 2226 pair can be retired, any recalls of the associated object MUST carry 2227 an array of these referring identifiers (in the CB_SEQUENCE 2228 operation's arguments), for the benefit of the client. After this 2229 time, it is not necessary for the server to provide this information 2230 in related callbacks, since it is certain that a race condition can 2231 no longer occur. 2233 The CB_SEQUENCE operation which begins each server callback carries a 2234 list of "referring" { slotid, sequenceid } tuples. If the client 2235 finds the request corresponding to the referring slotid and 2236 sequenceid to be currently outstanding (i.e. the server's reply has not been 2237 seen by the client), it can determine that the callback has raced the 2238 reply, and act accordingly. 2240 The client must not simply wait forever for the expected server reply 2241 to arrive on any of the session's operation channels, because it is 2242 possible that they will be delayed indefinitely. However, it should 2243 wait for a period of time, and if the time expires it can provide a 2244 more meaningful error such as NFS4ERR_DELAY. 2246 [[Comment.7: We need to consider the clients' options here, and 2247 describe them... NFS4ERR_DELAY has been discussed as a legal reply 2248 to CB_RECALL?]] 2250 There are other scenarios under which callbacks may race replies, 2251 among them pNFS layout recalls, described in Section 12.5.4.2 2252 [[Comment.8: fill in the blanks w/others, etc...]] 2254 2.10.4.4.
COMPOUND and CB_COMPOUND Construction Issues 2256 Very large requests and replies may pose both buffer management 2257 issues (especially with RDMA) and reply cache issues. When the 2258 session is created (Section 17.36), the client and server negotiate 2259 the maximum sized request they will send or process 2260 (ca_maxrequestsize), the maximum sized reply they will return or 2261 process (ca_maxresponsesize), and the maximum sized reply they will 2262 store in the reply cache (ca_maxresponsesize_cached). 2264 If a request exceeds ca_maxrequestsize, the reply will have the 2265 status NFS4ERR_REQ_TOO_BIG. A replier may return NFS4ERR_REQ_TOO_BIG 2266 as the status for the first operation (SEQUENCE or CB_SEQUENCE) in the 2267 request, or it may choose to return it on a subsequent operation. 2269 If a reply exceeds ca_maxresponsesize, the reply will have the status 2270 NFS4ERR_REP_TOO_BIG. A replier may return NFS4ERR_REP_TOO_BIG as the 2271 status for the first operation (SEQUENCE or CB_SEQUENCE) in the request, 2272 or it may choose to return it on a subsequent operation. 2274 If sa_cachethis or csa_cachethis are TRUE, then the replier MUST 2275 cache a reply except if an error is returned by the SEQUENCE or 2276 CB_SEQUENCE operation (see Section 2.10.4.1.1). If the reply exceeds 2277 ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are 2278 TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even 2279 if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) 2280 is returned on an operation other than the first operation (SEQUENCE or 2281 CB_SEQUENCE), then the reply MUST be cached if sa_cachethis or 2282 csa_cachethis are TRUE. For example, if a COMPOUND has eleven 2283 operations, including SEQUENCE, the fifth operation is a RENAME, and 2284 the tenth operation is a READ for one million bytes, the server may 2285 return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation.
Since 2286 the server executed several operations, especially the non-idempotent 2287 RENAME, the client's request to cache the reply needs to be honored 2288 in order for correct operation of exactly once semantics. If the 2289 client retries the request, the server will have cached a reply that 2290 contains results for ten of the eleven requested operations, with the 2291 tenth operation having a status of NFS4ERR_REP_TOO_BIG_TO_CACHE. 2293 A client needs to take care, when sending operations that change 2294 the current filehandle (except for PUTFH, PUTPUBFH, and PUTROOTFH), 2295 that it does not exceed the maximum reply buffer before the GETFH 2296 operation. Otherwise the client will have to retry the operation 2297 that changed the current filehandle, in order to obtain the desired 2298 filehandle. For the OPEN operation (see Section 17.16), retry is not 2299 always available as an option. The following guidelines for the 2300 handling of filehandle changing operations are advised: 2302 o A client SHOULD issue GETFH immediately after a current filehandle 2303 changing operation. This is especially important after any 2304 current filehandle changing non-idempotent operation. It is 2305 critical to issue GETFH immediately after OPEN. 2307 o A server MAY return NFS4ERR_REP_TOO_BIG or 2308 NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a 2309 filehandle changing operation if the reply would be too large on 2310 the next operation. 2312 o A server SHOULD return NFS4ERR_REP_TOO_BIG or 2313 NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a 2314 filehandle changing non-idempotent operation if the reply would be 2315 too large on the next operation, especially if the operation is 2316 OPEN. 2318 o A server MAY return NFS4ERR_UNSAFE_COMPOUND if it looks at the 2319 next operation after a non-idempotent current filehandle changing 2320 operation, and finds it is not GETFH.
The server would do this if 2321 it is unable to determine in advance whether the total response 2322 size would exceed ca_maxresponsesize_cached or ca_maxresponsesize. 2324 2.10.4.5. Persistence 2326 Since the reply cache is bounded, it is practical for the server 2327 reply cache to persist across server reboots, and to be kept in 2328 stable storage (a client's reply cache for callbacks need not persist 2329 across client reboots unless the client intends for its session and 2330 other state to persist across reboots). To support a persistent reply cache, the following must be stored in stable storage: 2332 o The slot table including the sequenceid and cached reply for each 2333 slot. 2335 o The sessionid. 2337 o The client ID. 2339 o The SSV (see Section 2.10.6.3). 2341 The CREATE_SESSION operation (see Section 17.36) determines the 2342 persistence of the reply cache. 2344 2.10.5. RDMA Considerations 2346 A complete discussion of the operation of RPC-based protocols atop 2347 RDMA transports is in [RPCRDMA]. A discussion of the operation of 2348 NFSv4, including NFSv4.1, over RDMA is in [NFSDDP]. Where RDMA is 2349 considered, this specification assumes the use of such a layering; it 2350 addresses only the upper layer issues relevant to making best use of 2351 RPC/RDMA. 2353 2.10.5.1. RDMA Connection Resources 2355 RDMA requires its consumers to register memory and post buffers of a 2356 specific size and number for receive operations. 2358 Registration of memory can be a relatively high-overhead operation, 2359 since it requires pinning of buffers, assignment of attributes (e.g. 2360 readable/writable), and initialization of hardware translation. 2361 Preregistration is desirable to reduce overhead. These registrations 2362 are specific to hardware interfaces and even to RDMA connection 2363 endpoints; therefore, negotiation of their limits is desirable to 2364 manage resources effectively. 2366 Following the basic registration, these buffers must be posted by the 2367 RPC layer to handle receives.
These buffers remain in use by the 2368 RPC/NFSv4 implementation; the size and number of them must be known 2369 to the remote peer in order to avoid RDMA errors which would cause a 2370 fatal error on the RDMA connection. 2372 NFSv4.1 manages slots as resources on a per session basis (see 2373 Section 2.10), while RDMA connections manage credits on a per 2374 connection basis. This means that in order for a peer to send data 2375 over RDMA to a remote buffer, it has to have both an NFSv4.1 slot, 2376 and an RDMA credit. 2378 2.10.5.2. Flow Control 2380 NFSv4.0 and all previous versions do not provide for any form of flow 2381 control; instead they rely on the windowing provided by transports 2382 like TCP to throttle requests. This does not work with RDMA, which 2383 provides no operation flow control and will terminate a connection in 2384 error when limits are exceeded. Limits such as maximum number of 2385 requests outstanding are therefore negotiated when a session is 2386 created (see the ca_maxrequests field in Section 17.36). These 2387 limits then provide the maxima each session's channels' connections 2388 must operate within. RDMA connections are managed within these 2389 limits as described in section 3.3 of [RPCRDMA]; if there are 2390 multiple RDMA connections, then the maximum requests for a channel 2391 will be divided among the RDMA connections. The limits may also be 2392 modified dynamically at the server's choosing by manipulating certain 2393 parameters present in each NFSv4.1 request. In addition, the 2394 CB_RECALL_SLOT callback operation (see Section 19.8) can be issued by 2395 a server to a client to return RDMA credits to the server, thereby 2396 lowering the maximum number of requests a client can have outstanding 2397 to the server. 2399 2.10.5.3.
Padding 2401 Header padding is requested by each peer at session initiation (see 2402 the csa_headerpadsize argument to CREATE_SESSION in Section 17.36), 2403 and subsequently used by the RPC RDMA layer, as described in 2404 [RPCRDMA]. Zero padding is permitted. 2406 Padding leverages the useful property that RDMA receives preserve 2407 alignment of data, even when they are placed into anonymous 2408 (untagged) buffers. If requested, client inline writes will insert 2409 appropriate pad bytes within the request header to align the data 2410 payload on the specified boundary. The client is encouraged to add 2411 sufficient padding (up to the negotiated size) so that the "data" 2412 field of the NFSv4.1 WRITE operation is aligned. Most servers can 2413 make good use of such padding, which allows them to chain receive 2414 buffers in such a way that any data carried by client requests will 2415 be placed into appropriate buffers at the server, ready for file 2416 system processing. The receiver's RPC layer encounters no overhead 2417 from skipping over pad bytes, and the RDMA layer's high performance 2418 makes the insertion and transmission of padding on the sender a 2419 significant optimization. In this way, the need for servers to 2420 perform RDMA Read to satisfy all but the largest client writes is 2421 obviated. An added benefit is the reduction of message round trips 2422 on the network - a potentially good trade, where latency is present. 2424 The value to choose for padding is subject to a number of criteria. 2425 A primary source of variable-length data in the RPC header is the 2426 authentication information, the form of which is client-determined, 2427 possibly in response to server specification. The contents of 2428 COMPOUNDs, sizes of strings such as those passed to RENAME, etc. all 2429 go into the determination of a maximal NFSv4 request size and 2430 therefore minimal buffer size. 
The client must select its offered 2431 value carefully, so as not to overburden the server, and vice versa. 2432 The payoff of an appropriate padding value is higher performance. 2434 Sender gather: 2435 |RPC Request|Pad bytes|Length| -> |User data...| 2436 \------+---------------------/ \ 2437 \ \ 2438 \ Receiver scatter: \-----------+- ... 2439 /-----+----------------\ \ \ 2440 |RPC Request|Pad|Length| -> |FS buffer|->|FS buffer|->... 2442 In the above case, the server may recycle unused buffers to the next 2443 posted receive if unused by the actual received request, or may pass 2444 the now-complete buffers by reference for normal write processing. 2445 For a server which can make use of it, this removes any need for data 2446 copies of incoming data, without resorting to complicated end-to-end 2447 buffer advertisement and management. This includes most kernel-based 2448 and integrated server designs, among many others. The client may 2449 perform similar optimizations, if desired. 2451 2.10.5.4. Dual RDMA and Non-RDMA Transports 2453 Some RDMA transports (for example, see [RDDP]), [[Comment.9: need 2454 xref]] require a "streaming" (non-RDMA) phase, where ordinary traffic 2455 might flow before "stepping" up to RDMA mode, commencing RDMA 2456 traffic. Some RDMA transports start connections always in RDMA mode. 2457 NFSv4.1 allows, but does not assume, a streaming phase before RDMA 2458 mode. When a connection is bound to a session, the client and server 2459 negotiate whether the connection is used in RDMA or non-RDMA mode 2460 (see Section 17.36 and Section 17.34). 2462 2.10.6. Sessions Security 2464 2.10.6.1. Session Callback Security 2466 Via session connection binding, NFSv4.1 improves security over that 2467 provided by NFSv4.0 for the callback channel. The connection is 2468 client-initiated (see Section 17.34), and subject to the same 2469 firewall and routing checks as the operations channel.
The 2470 connection cannot be hijacked by an attacker who connects to the 2471 client port prior to the intended server. At the client's option 2472 (see Section 17.36), binding is fully authenticated before being 2473 activated (see Section 17.34). Traffic from the server over the 2474 callback channel is authenticated exactly as the client specifies 2475 (see Section 2.10.6.2). 2477 2.10.6.2. Backchannel RPC Security 2479 When the NFSv4.1 client establishes the backchannel, it informs the 2480 server what security flavors and principals it must use when sending 2481 requests over the backchannel. If the security flavor is RPCSEC_GSS, 2482 the client expresses the principal in the form of an established 2483 RPCSEC_GSS context. The server is free to use any flavor/principal 2484 combination the client offers, but MUST NOT use unoffered 2485 combinations. 2487 This way, the client does not have to provide a target GSS principal 2488 as it did with NFSv4.0, and the server does not have to implement an 2489 RPCSEC_GSS initiator as it did with NFSv4.0. [[Comment.10: xrefs]] 2491 The CREATE_SESSION (Section 17.36) and BACKCHANNEL_CTL 2492 (Section 17.33) operations allow the client to specify flavor/ 2493 principal combinations. 2495 2.10.6.3. Protection from Unauthorized State Changes 2497 Under some conditions, NFSv4.0 is vulnerable to a denial of service 2498 issue with respect to its state management. 2500 The attack works via an unauthorized client faking an open_owner4, an 2501 open_owner/lock_owner pair, or stateid, combined with a seqid. The 2502 operation is sent to the NFSv4 server. The NFSv4 server accepts the 2503 state information, and as long as any status code from the result of 2504 this operation is not NFS4ERR_STALE_CLIENTID, NFS4ERR_STALE_STATEID, 2505 NFS4ERR_BAD_STATEID, NFS4ERR_BAD_SEQID, NFS4ERR_BADXDR, 2506 NFS4ERR_RESOURCE, or NFS4ERR_NOFILEHANDLE, the sequence number is 2507 incremented.
When the authorized client issues an operation, it gets 2508 back NFS4ERR_BAD_SEQID, because its idea of the current sequence 2509 number is off by one. The authorized client's recovery options are 2510 pretty limited, with SETCLIENTID, followed by complete reclaim of 2511 state, which may or may not succeed completely. That qualifies as a 2512 denial of service attack. 2514 If the client uses RPCSEC_GSS authentication and integrity, and every 2515 client maps each open_owner and lock_owner to one and only one 2516 principal, and the server enforces this binding, then the conditions 2517 leading to vulnerability to the denial of service do not exist. One 2518 should keep in mind that if AUTH_SYS is being used, far simpler 2519 denial of service and other attacks are possible. 2521 With NFSv4.1 sessions, the per-operation sequence number is ignored 2522 (see Section 8.13); therefore, the NFSv4.0 denial of service 2523 vulnerability described above does not apply. However, as described 2524 to this point in the specification, an attacker could forge the 2525 sessionid and issue a SEQUENCE with a slotid that it expects the 2526 legitimate client to use next. The legitimate client could then use 2527 the slotid with the same sequence number, and the server returns the 2528 attacker's result from the replay cache, thereby disrupting the 2529 legitimate client. 2531 If we give each NFSv4.1 user their own session, and each user uses 2532 RPCSEC_GSS authentication and integrity, then the denial of service 2533 issue is solved, at the cost of additional per session state. The 2534 alternative NFSv4.1 specifies is described as follows. 2536 Transport connections MUST be bound to a session by the client. The 2537 server MUST return an error to an operation (other than the operation 2538 that binds the connection to the session) that uses an unbound 2539 connection.
As a simplification, the transport connection used by 2540 CREATE_SESSION (see Section 17.36) is automatically bound to the 2541 session. Additional connections are bound to a session via 2542 BIND_CONN_TO_SESSION (see Section 17.34). 2544 To prevent attackers from issuing BIND_CONN_TO_SESSION operations, 2545 the arguments to BIND_CONN_TO_SESSION include a digest of a shared 2546 secret called the secret session verifier (SSV) that only the client 2547 and server know. The digest is created via a one way, collision 2548 resistant hash function, making it intractable for the attacker to 2549 forge. 2551 The SSV is sent to the server via SET_SSV (see Section 17.47). To 2552 prevent eavesdropping, the SET_SSV operation SHOULD be protected via 2553 RPCSEC_GSS with the privacy service. The SSV can be changed by the 2554 client at any time, by any principal. However, several aspects of SSV 2555 changing prevent an attacker from engaging in a successful denial of 2556 service attack: 2558 o A SET_SSV on the SSV does not replace the SSV with the argument to 2559 SET_SSV. Instead, the current SSV on the server is logically 2560 exclusive ORed (XORed) with the argument to SET_SSV. SET_SSV MUST 2561 NOT be called with an SSV value that is zero. 2563 o The arguments to and results of SET_SSV include digests of the old 2564 and new SSV, respectively. 2566 o Because the initial value of the SSV is zero and therefore known, a 2567 client that opts for connection binding enforcement MUST issue at 2568 least one SET_SSV operation before the first BIND_CONN_TO_SESSION 2569 operation. A client SHOULD issue SET_SSV as soon as a session is 2570 created. 2572 If a connection is disconnected, BIND_CONN_TO_SESSION is required to 2573 bind a connection to the session, even if the disconnected connection 2574 was the one over which CREATE_SESSION was issued.
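The XOR semantics of SET_SSV and the digest check performed on BIND_CONN_TO_SESSION can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, and HMAC-SHA256 stands in for whatever one-way, collision-resistant digest the implementations actually use.

```python
import hmac
import hashlib

class SsvState:
    """Toy model of one side's secret session verifier (SSV) handling."""

    def __init__(self, size=32):
        # The initial SSV is all zeros, and therefore known to everyone.
        self.ssv = bytes(size)

    def set_ssv(self, argument: bytes) -> None:
        # SET_SSV does not replace the SSV; the argument is XORed into
        # the current SSV.  An all-zero argument is forbidden, since it
        # would leave the SSV unchanged.
        if all(b == 0 for b in argument):
            raise ValueError("SET_SSV argument must not be zero")
        self.ssv = bytes(a ^ b for a, b in zip(self.ssv, argument))

    def digest(self, message: bytes) -> bytes:
        # One-way, collision-resistant digest keyed by the SSV.
        return hmac.new(self.ssv, message, hashlib.sha256).digest()

    def verify_bind(self, message: bytes, presented: bytes) -> bool:
        # BIND_CONN_TO_SESSION is accepted only if the digest matches,
        # i.e. only a peer that knows the SSV can bind a connection.
        return hmac.compare_digest(self.digest(message), presented)

server, client = SsvState(), SsvState()

# The client XORs fresh secret material into the SSV via SET_SSV
# (privacy-protected on the wire), and both sides update in step.
secret = bytes(range(1, 33))
client.set_ssv(secret)
server.set_ssv(secret)

msg = b"BIND_CONN_TO_SESSION arguments"
assert server.verify_bind(msg, client.digest(msg))          # accepted
assert not server.verify_bind(msg, SsvState().digest(msg))  # attacker: zero SSV
```

Because the attacker's SSV is still the known initial zero value, its digest fails verification, which is the property the bullets above rely on.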
2576 If a client is assigned a machine principal, then the client SHOULD 2577 use the machine principal's RPCSEC_GSS context to privacy protect the 2578 SSV from eavesdropping during the SET_SSV operation. If a machine 2579 principal is not being used, then the client MAY use the non-machine 2580 principal's RPCSEC_GSS context to privacy protect the SSV. The 2581 server MUST accept either type of principal. A client SHOULD change 2582 the SSV each time a new principal uses the session. 2584 Here are the types of attacks that can be attempted by an attacker 2585 named Eve, and how the connection to session binding approach 2586 addresses each attack: 2588 o If Eve creates a connection after the legitimate client 2589 establishes an SSV via privacy protection from a machine 2590 principal's RPCSEC_GSS session, she does not know the SSV and so 2591 cannot compute a digest that BIND_CONN_TO_SESSION will accept. 2592 Users on the legitimate client cannot be disrupted by Eve. 2594 o If Eve is the first one to log into the legitimate client, and the 2595 client does not use machine principals, then Eve can cause an SSV 2596 to be created via the legitimate client's NFSv4.1 implementation, 2597 protected by the RPCSEC_GSS context created by the legitimate 2598 client (which uses Eve's GSS principal and credentials). Eve can 2599 then eavesdrop on the network, and because she knows her 2600 credentials, she can decrypt the SSV. Eve can compute a digest 2601 BIND_CONN_TO_SESSION will accept, and so bind a new connection to 2602 the session. Eve can change the slotid, sequence state, and/or 2603 the SSV state in such a way that when Bob accesses the server via 2604 the legitimate client, the legitimate client will be unable to use 2605 the session. 2607 The client's only recourse is to create a new session, which will 2608 cause any state Eve created on the legitimate client over the old 2609 (but hijacked) session to be lost.
This disrupts Eve, but because 2610 she is the attacker, this is acceptable. 2612 Once the legitimate client establishes an SSV over the new session 2613 using Bob's RPCSEC_GSS context, Eve can use the new session via 2614 the legitimate client, but she cannot disrupt Bob. Moreover, 2615 because the client SHOULD have modified the SSV due to Eve using 2616 the new session, Bob cannot get revenge on Eve by binding a rogue 2617 connection to the session. 2619 The question is how does the legitimate client detect that Eve has 2620 hijacked the old session? When the client detects that a new 2621 principal, Bob, wants to use the session, it SHOULD have issued a 2622 SET_SSV. 2624 * Let us suppose that from the rogue connection, Eve issued a 2625 SET_SSV with the same slotid and sequence that the legitimate 2626 client later uses. The server will assume this is a replay, 2627 and return to the legitimate client the reply it sent Eve. 2628 However, unless Eve can correctly guess the SSV the legitimate 2629 client will use, the digest verification checks in the SET_SSV 2630 response will fail. That is the clue to the client that the 2631 session has been hijacked. 2633 * Alternatively, Eve issued a SET_SSV with a different slotid 2634 than the legitimate client uses for its SET_SSV. Then the 2635 digest verification on the server fails, and the client is 2636 again clued that the session has been hijacked. 2638 * Alternatively, Eve issued an operation other than SET_SSV, but 2639 with the same slotid and sequence that the legitimate client 2640 uses for its SET_SSV. The server returns to the legitimate 2641 client the response it sent Eve. The client sees that the 2642 response is not at all what it expects. The client assumes 2643 either session hijacking or server bug, and either way destroys 2644 the old session. 2646 o Eve binds a rogue connection to the session as above, and then 2647 destroys the session. 
Again, Bob goes to use the server from the 2648 legitimate client. The client has a very clear indication that 2649 its session was hijacked, and does not even have to destroy the 2650 old session before creating a new session, which Eve will be 2651 unable to hijack because it will be protected with an SSV created 2652 via Bob's RPCSEC_GSS protection. 2654 o If Eve creates a connection before the legitimate client 2655 establishes an SSV, because the initial value of the SSV is zero 2656 and therefore known, Eve can issue a SET_SSV that will pass the 2657 digest verification check. However because the new connection has 2658 not been bound to the session, the SET_SSV is rejected for that 2659 reason. 2661 o The connection to session binding model does not prevent 2662 connection hijacking. However, if an attacker can perform 2663 connection hijacking, it can issue denial of service attacks that 2664 are less difficult than attacks based on forging sessions. 2666 2.10.7. Session Mechanics - Steady State 2668 2.10.7.1. Obligations of the Server 2670 The server has the primary obligation to monitor the state of 2671 backchannel resources that the client has created for the server 2672 (RPCSEC_GSS contexts and back channel connections). When these 2673 resources go away, the server takes action as specified in 2674 Section 2.10.8.2. 2676 2.10.7.2. Obligations of the Client 2678 The client has the following obligations in order to utilize the 2679 session: 2681 o Keep a necessary session from going idle on the server. A client 2682 that requires a session, but nonetheless is not sending operations 2683 risks having the session be destroyed by the server. This is 2684 because sessions consume resources, and resource limitations may 2685 force the server to cull the least recently used session. 2687 o Destroy the session when idle. When a session has no state other 2688 than the session, and no outstanding requests, the client should 2689 consider destroying the session. 
2691 o Maintain GSS contexts for callback. If the client requires the 2692 server to use the RPCSEC_GSS security flavor for callbacks, then 2693 it needs to be sure the contexts handed to the server via 2694 BACKCHANNEL_CTL are unexpired. A good practice is to keep at 2695 least two contexts outstanding, where the expiration time of the 2696 newest context at the time it was created is N times that of the 2697 oldest context, where N is the number of contexts available for 2698 callbacks. 2700 o Maintain an active connection. The server requires a callback 2701 path in order to gracefully recall recallable state, or notify the 2702 client of certain events. 2704 2.10.7.3. Steps the Client Takes To Establish a Session 2706 The client issues EXCHANGE_ID to establish a client ID. 2708 The client uses the client ID to issue a CREATE_SESSION on a 2709 connection to the server. The results of CREATE_SESSION indicate 2710 whether the server will persist the session replay cache through a 2711 server reboot or not, and the client notes this for future reference. 2713 The client SHOULD have specified connection binding enforcement when 2714 the session was created. If so, the client SHOULD issue SET_SSV in 2715 the first COMPOUND after the session is created. If it is not using 2716 machine credentials, then each time a new principal goes to use the 2717 session, it SHOULD issue a SET_SSV again. 2719 If the client wants to use delegations, layouts, directory 2720 notifications, or any other state that requires a callback channel, 2721 then it MUST add a connection to the backchannel if CREATE_SESSION 2722 did not already do so. The client creates a connection, and calls 2723 BIND_CONN_TO_SESSION to bind the connection to the session and the 2724 session's backchannel. If CREATE_SESSION did not already do so, the 2725 client MUST tell the server what security is required in order for 2726 the client to accept callbacks. The client does this via 2727 BACKCHANNEL_CTL.
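As a rough illustration, the establishment steps above amount to an ordered sequence of operations. The sketch below is not a real RPC binding; the function and flag names are hypothetical placeholders used only to show the ordering constraints.

```python
# Hypothetical client-side sketch of the session-establishment order.
def establish_session(enforce_binding=True,
                      need_backchannel=True,
                      backchannel_on_create=False):
    ops = []
    ops.append("EXCHANGE_ID")       # 1. obtain a client ID
    ops.append("CREATE_SESSION")    # 2. create the session under that client ID
    if enforce_binding:
        # 3. with connection binding enforcement, SET_SSV belongs in the
        #    first COMPOUND after the session is created
        ops.append("SET_SSV")
    if need_backchannel and not backchannel_on_create:
        # 4. add a backchannel connection if CREATE_SESSION did not
        ops.append("BIND_CONN_TO_SESSION")
        # 5. tell the server what callback security to use
        ops.append("BACKCHANNEL_CTL")
    return ops

assert establish_session() == ["EXCHANGE_ID", "CREATE_SESSION", "SET_SSV",
                               "BIND_CONN_TO_SESSION", "BACKCHANNEL_CTL"]
```

The point of the ordering is that SET_SSV precedes any BIND_CONN_TO_SESSION, since the initial SSV of zero is known to everyone.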
2729 If the client wants to use additional connections for the 2730 backchannel, then it MUST call BIND_CONN_TO_SESSION on each 2731 connection it wants to use with the session. If the client wants to 2732 use additional connections for the operation channel, then it MUST 2733 call BIND_CONN_TO_SESSION if it specified connection binding 2734 enforcement before using the connection. 2736 At this point the client has reached a steady state as far as session 2737 use. 2739 2.10.8. Session Mechanics - Recovery 2741 2.10.8.1. Events Requiring Client Action 2743 The following events require client action to recover. 2745 2.10.8.1.1. RPCSEC_GSS Context Loss by Callback Path 2747 If all RPCSEC_GSS contexts granted by the client to the server for 2748 callback use have expired, the client MUST establish a new context 2749 via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE 2750 results indicates when callback contexts are nearly expired, or fully 2751 expired (see Section 17.46.4). 2753 2.10.8.1.2. Connection Disconnect 2755 If the client loses the last connection of the session, then it MUST 2756 create a new connection, and if connection binding enforcement was 2757 specified when the session was created, bind it to the session via 2758 BIND_CONN_TO_SESSION. 2760 If there were requests outstanding at the time of the connection 2761 disconnect, then the client MUST retry the request, as described in 2762 Section 2.10.4.2. Note that it is not necessary to retry requests 2763 over a connection with the same source network address or the same 2764 destination network address as the disconnected connection. As long 2765 as the sessionid, slotid, and sequenceid in the retry match that of 2766 the original request, the server will recognize the request as a 2767 retry if it did see the request prior to disconnect.
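The retry behavior just described can be sketched with a minimal reply cache model: the server keys replies on the (sessionid, slotid, sequenceid) triple, not on the connection's network addresses, so a retry over any connection bound to the session replays the cached result. The class and method names below are illustrative, not part of the protocol.

```python
# Sketch: a reply cache that recognizes retries by the
# (sessionid, slotid, sequenceid) triple, independent of connection.
class ReplyCache:
    def __init__(self):
        # (sessionid, slotid) -> (sequenceid, cached_reply)
        self.slots = {}

    def process(self, sessionid, slotid, sequenceid, execute):
        key = (sessionid, slotid)
        if key in self.slots:
            cached_seq, cached_reply = self.slots[key]
            if sequenceid == cached_seq:
                # Retry: replay the cached reply, do not re-execute.
                return cached_reply
        reply = execute()                      # new request: execute it
        self.slots[key] = (sequenceid, reply)  # and cache the result
        return reply

cache = ReplyCache()
calls = []
first = cache.process(b"sess1", 0, 7, lambda: calls.append("exec") or "result")
# The retry arrives over a different connection, but the triple matches,
# so the server replays the cached reply instead of executing again.
retry = cache.process(b"sess1", 0, 7, lambda: calls.append("exec") or "BUG")
assert first == retry == "result"
assert calls == ["exec"]   # the operation ran exactly once
```

This is why exactly-once semantics survive a reconnect: the retried request is identified by session state, not by the transport it arrives on.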
2769 If the connection that was bound to the backchannel is lost, the 2770 client may need to reconnect, and use BIND_CONN_TO_SESSION, to give 2771 the connection to the backchannel. If the connection that was lost 2772 was the last one bound to the backchannel, the client MUST reconnect, 2773 and bind the connection to the session and backchannel. The server 2774 should indicate when it has no callback connection via the 2775 sr_status_flags result from SEQUENCE. 2777 2.10.8.1.3. Backchannel GSS Context Loss 2779 Via the sr_status_flags result of the SEQUENCE operation or other 2780 means, the client will learn if some or all of the RPCSEC_GSS 2781 contexts it assigned to the backchannel have been lost. The client 2782 may need to use BACKCHANNEL_CTL to assign new contexts. It MUST 2783 assign new contexts if there are no more contexts. 2785 2.10.8.1.4. Loss of Session 2787 The server may lose a record of the session. Causes include: 2789 o Server crash and reboot 2791 o A catastrophe that causes the cache to be corrupted or lost on the 2792 media it was stored on. This applies even if the server indicated 2793 in the CREATE_SESSION results that it would persist the cache. 2795 o The server purges the session of a client that has been inactive 2796 for a very extended period of time. [[Comment.11: XXX - Should we 2797 add a value to the CREATE_SESSION results that tells a client how 2798 long he can let a session stay idle before losing it?]] 2800 Loss of replay cache is equivalent to loss of session. The server 2801 indicates loss of session to the client by returning 2802 NFS4ERR_BADSESSION on the next operation that uses the sessionid 2803 associated with the lost session. 2805 After an event like a server reboot, the client may have lost its 2806 connections. The client assumes for the moment that the session has 2807 not been lost. 
It reconnects, and if it specified connection binding 2808 enforcement when the session was created, it invokes 2809 BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes 2810 SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns 2811 NFS4ERR_BADSESSION, the client knows the session was lost. If the 2812 connection survives session loss, then the next SEQUENCE operation 2813 the client issues over the connection will get back 2814 NFS4ERR_BADSESSION. The client again knows the session was lost. 2816 When the client detects session loss, it must call CREATE_SESSION to 2817 recover. Any non-idempotent operations that were in progress may 2818 have been performed on the server at the time of session loss. The 2819 client has no general way to recover from this. 2821 Note that loss of session does not imply loss of lock, open, 2822 delegation, or layout state. Nor does loss of lock, open, 2823 delegation, or layout state imply loss of session state. 2824 [[Comment.12: Add reference to lock recovery section]]. A session 2825 can survive a server reboot, but lock recovery may still be needed. 2826 The converse is also true. 2828 It is possible that CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID 2829 (for example, the server reboots and does not preserve client ID 2830 state). If so, the client needs to call EXCHANGE_ID, followed by 2831 CREATE_SESSION. 2833 2.10.8.1.5. Failover 2835 [[Comment.13: Dave Noveck requested this section; not sure what is 2836 needed here if this refers to failover to a replica. What are the 2837 session ramifications?]] 2839 2.10.8.2. Events Requiring Server Action 2841 The following events require server action to recover. 2843 2.10.8.2.1. Client Crash and Reboot 2845 As described in Section 17.35, a rebooted client causes the server to 2846 delete any sessions it had. 2848 2.10.8.2.2. Client Crash with No Reboot 2850 If a client crashes and never comes back, it will never issue 2851 EXCHANGE_ID with its old client owner.
Thus the server has session 2852 state that will never be used again. After an extended period of 2853 time and if the server has resource constraints, it MAY destroy the 2854 old session. 2856 2.10.8.2.3. Extended Network Partition 2858 To the server, the extended network partition may be no different 2859 than a client crash with no reboot (see Section 2.10.8.2.2). Unless 2860 the server can discern that there is a network partition, it is free 2861 to treat the situation as if the client has crashed for good. 2863 2.10.8.2.4. Backchannel Connection Loss 2865 If there were callback requests outstanding at the time of a 2866 connection disconnect, then the server MUST retry the request, as 2867 described in Section 2.10.4.2. Note that it is not necessary to 2868 retry requests over a connection with the same source network address 2869 or the same destination network address as the disconnected 2870 connection. As long as the sessionid, slotid, and sequenceid in the 2871 retry match that of the original request, the callback target will 2872 recognize the request as a retry if it did see the request prior to 2873 disconnect. 2875 If the connection lost is the last one bound to the backchannel, then 2876 the server MUST indicate that in the sr_status_flags field of the 2877 next SEQUENCE reply. 2879 2.10.8.2.5. GSS Context Loss 2881 The server SHOULD monitor when the last RPCSEC_GSS context assigned 2882 to the backchannel is near expiry (i.e., between one and two periods 2883 of lease time), and indicate so in the sr_status_flags field of the 2884 next SEQUENCE reply. The server MUST indicate when the backchannel's 2885 last RPCSEC_GSS context has expired in the sr_status_flags field of 2886 the next SEQUENCE reply. 2888 2.10.9. Parallel NFS and Sessions 2890 A client and server can potentially be a non-pNFS implementation, a 2891 metadata server implementation, a data server implementation, or a 2892 combination of two or three of these types.
The EXCHGID4_FLAG_USE_NON_PNFS, 2893 EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not 2894 mutually exclusive) are passed in the EXCHANGE_ID arguments and 2895 results to allow the client to indicate how it wants to use sessions 2896 created under the client ID, and to allow the server to indicate how 2897 it will allow the sessions to be used. See Section 13.1 for pNFS 2898 sessions considerations. 2900 3. Protocol Data Types 2902 The syntax and semantics to describe the data types of the NFS 2903 version 4 protocol are defined in the XDR RFC4506 [3] and RPC RFC1831 2904 [4] documents. The next sections build upon the XDR data types to 2905 define types and structures specific to this protocol. 2907 3.1. Basic Data Types 2909 These are the base NFSv4 data types. 2911 +---------------+---------------------------------------------------+ 2912 | Data Type | Definition | 2913 +---------------+---------------------------------------------------+ 2914 | int32_t | typedef int int32_t; | 2915 | uint32_t | typedef unsigned int uint32_t; | 2916 | int64_t | typedef hyper int64_t; | 2917 | uint64_t | typedef unsigned hyper uint64_t; | 2918 | attrlist4 | typedef opaque attrlist4<>; | 2919 | | Used for file/directory attributes | 2920 | bitmap4 | typedef uint32_t bitmap4<>; | 2921 | | Used in attribute array encoding. 
| 2922 | changeid4 | typedef uint64_t changeid4; | 2923 | | Used in definition of change_info | 2924 | clientid4 | typedef uint64_t clientid4; | 2925 | | Shorthand reference to client identification | 2926 | component4 | typedef utf8str_cs component4; | 2927 | | Represents path name components | 2928 | count4 | typedef uint32_t count4; | 2929 | | Various count parameters (READ, WRITE, COMMIT) | 2930 | length4 | typedef uint64_t length4; | 2931 | | Describes LOCK lengths | 2932 | linktext4 | typedef utf8str_cs linktext4; | 2933 | | Symbolic link contents | 2934 | mode4 | typedef uint32_t mode4; | 2935 | | Mode attribute data type | 2936 | nfs_cookie4 | typedef uint64_t nfs_cookie4; | 2937 | | Opaque cookie value for READDIR | 2938 | nfs_fh4 | typedef opaque nfs_fh4<NFS4_FHSIZE>; | 2939 | | Filehandle definition; NFS4_FHSIZE is defined as | 2940 | | 128 | 2941 | nfs_ftype4 | enum nfs_ftype4; | 2942 | | Various defined file types | 2943 | nfsstat4 | enum nfsstat4; | 2944 | | Return value for operations | 2945 | offset4 | typedef uint64_t offset4; | 2946 | | Various offset designations (READ, WRITE, LOCK, | 2947 | | COMMIT) | 2948 | pathname4 | typedef component4 pathname4<>; | 2949 | | Represents path name for fs_locations | 2950 | qop4 | typedef uint32_t qop4; | 2951 | | Quality of protection designation in SECINFO | 2952 | sec_oid4 | typedef opaque sec_oid4<>; | 2953 | | Security Object Identifier. The sec_oid4 data type | 2954 | | is not really opaque. Instead, it contains an ASN.1 | 2955 | | OBJECT IDENTIFIER as used by GSS-API in the | 2956 | | mech_type argument to GSS_Init_sec_context. See | 2957 | | RFC2743 [8] for details. | 2958 | sequenceid4 | typedef uint32_t sequenceid4; | 2959 | | Sequence number used for various session | 2960 | | operations (EXCHANGE_ID, CREATE_SESSION, | 2961 | | SEQUENCE, CB_SEQUENCE).
| 2962 | seqid4 | typedef uint32_t seqid4; | 2963 | | Sequence identifier used for file locking | 2964 | sessionid4 | typedef opaque sessionid4[16]; | 2965 | | Session identifier | 2966 | slotid4 | typedef uint32_t slotid4; | 2967 | | Sequencing artifact for various session | 2968 | | operations (SEQUENCE, CB_SEQUENCE). | 2969 | utf8string | typedef opaque utf8string<>; | 2970 | | UTF-8 encoding for strings | 2971 | utf8str_cis | typedef opaque utf8str_cis; | 2972 | | Case-insensitive UTF-8 string | 2973 | utf8str_cs | typedef opaque utf8str_cs; | 2974 | | Case-sensitive UTF-8 string | 2975 | utf8str_mixed | typedef opaque utf8str_mixed; | 2976 | | UTF-8 strings with a case-sensitive prefix and a | 2977 | | case-insensitive suffix. | 2978 | verifier4 | typedef opaque verifier4[NFS4_VERIFIER_SIZE]; | 2979 | | Verifier used for various operations (COMMIT, | 2980 | | CREATE, EXCHANGE_ID, OPEN, READDIR, WRITE) | 2981 | | NFS4_VERIFIER_SIZE is defined as 8. | 2982 +---------------+---------------------------------------------------+ 2984 End of Base Data Types 2986 Table 1 2988 3.2. Structured Data Types 2990 3.2.1. nfstime4 2992 struct nfstime4 { 2993 int64_t seconds; 2994 uint32_t nseconds; 2995 }; 2997 The nfstime4 structure gives the number of seconds and nanoseconds 2998 since midnight or 0 hour January 1, 1970 Coordinated Universal Time 2999 (UTC). Values greater than zero for the seconds field denote dates 3000 after the 0 hour January 1, 1970. Values less than zero for the 3001 seconds field denote dates before the 0 hour January 1, 1970. In 3002 both cases, the nseconds field is to be added to the seconds field 3003 for the final time representation. For example, if the time to be 3004 represented is one-half second before 0 hour January 1, 1970, the 3005 seconds field would have a value of negative one (-1) and the 3006 nseconds field would have a value of one-half second (500000000). 3007 Values greater than 999,999,999 for nseconds are considered invalid.
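As a non-normative sketch of the encoding just described (the helper names are illustrative, not part of the protocol), a time value can be split into the nfstime4 seconds and nseconds fields as follows:

```python
import math
from fractions import Fraction

NSEC_PER_SEC = 1_000_000_000

def to_nfstime4(t):
    # Floor to whole nanoseconds, then split.  Because nseconds is
    # always non-negative and is added to seconds, a time one-half
    # second before the epoch becomes seconds = -1, nseconds =
    # 500000000, matching the example above.
    total_ns = math.floor(t * NSEC_PER_SEC)
    return divmod(total_ns, NSEC_PER_SEC)

def from_nfstime4(seconds, nseconds):
    # Reject the values the specification declares invalid.
    if not 0 <= nseconds <= 999_999_999:
        raise ValueError("nseconds greater than 999,999,999 is invalid")
    return Fraction(seconds) + Fraction(nseconds, NSEC_PER_SEC)
```

For instance, `to_nfstime4(Fraction(-1, 2))` yields `(-1, 500000000)`, and `from_nfstime4(-1, 500000000)` recovers -0.5 seconds.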
3009 This data type is used to pass time and date information. A server 3010 converts to and from its local representation of time when processing 3011 time values, preserving as much accuracy as possible. If the 3012 precision of timestamps stored for a file system object is less than 3013 that defined, loss of precision can occur. An adjunct time maintenance 3014 protocol is recommended to reduce client and server time skew. 3016 3.2.2. time_how4 3018 enum time_how4 { 3019 SET_TO_SERVER_TIME4 = 0, 3020 SET_TO_CLIENT_TIME4 = 1 3021 }; 3023 3.2.3. settime4 3025 union settime4 switch (time_how4 set_it) { 3026 case SET_TO_CLIENT_TIME4: 3027 nfstime4 time; 3028 default: 3029 void; 3030 }; 3032 The above definitions are used as the attribute definitions to set 3033 time values. If set_it is SET_TO_SERVER_TIME4, then the server uses 3034 its local representation of time for the time value. 3036 3.2.4. specdata4 3038 struct specdata4 { 3039 uint32_t specdata1; /* major device number */ 3040 uint32_t specdata2; /* minor device number */ 3041 }; 3043 This data type represents additional information for the device file 3044 types NF4CHR and NF4BLK. 3046 3.2.5. fsid4 3048 struct fsid4 { 3049 uint64_t major; 3050 uint64_t minor; 3051 }; 3053 3.2.6. fs_location4 3055 struct fs_location4 { 3056 utf8str_cis server<>; 3057 pathname4 rootpath; 3058 }; 3060 3.2.7. fs_locations4 3062 struct fs_locations4 { 3063 pathname4 fs_root; 3064 fs_location4 locations<>; 3065 }; 3067 The fs_location4 and fs_locations4 data types are used for the 3068 fs_locations recommended attribute, which is used for migration and 3069 replication support. 3071 3.2.8. fattr4 3073 struct fattr4 { 3074 bitmap4 attrmask; 3075 attrlist4 attr_vals; 3076 }; 3078 The fattr4 structure is used to represent file and directory 3079 attributes. 3081 The bitmap is a counted array of 32 bit integers used to contain bit values.
The position of the integer in the array that contains bit n 3083 can be computed from the expression (n / 32) and its bit within that 3084 integer is (n mod 32). 3086 0 1 3087 +-----------+-----------+-----------+-- 3088 | count | 31 .. 0 | 63 .. 32 | 3089 +-----------+-----------+-----------+-- 3091 3.2.9. change_info4 3093 struct change_info4 { 3094 bool atomic; 3095 changeid4 before; 3096 changeid4 after; 3097 }; 3099 This structure is used with the CREATE, LINK, REMOVE, and RENAME 3100 operations to let the client know the value of the change attribute 3101 for the directory in which the target file system object resides. 3103 3.2.10. netaddr4 3105 struct netaddr4 { 3106 /* see struct rpcb in RFC1833 */ 3107 string r_netid<>; /* network id */ 3108 string r_addr<>; /* universal address */ 3109 }; 3111 The netaddr4 structure is used to identify TCP/IP based endpoints. 3112 The r_netid and r_addr fields are specified in RFC1833 [22], but they 3113 are underspecified in RFC1833 [22] as far as what they should look 3114 like for specific protocols. 3116 For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the 3117 US-ASCII string: 3119 h1.h2.h3.h4.p1.p2 3121 The prefix, "h1.h2.h3.h4", is the standard textual form for 3122 representing an IPv4 address, which is always four octets long. 3123 Assuming big-endian ordering, h1, h2, h3, and h4, are respectively, 3124 the first through fourth octets each converted to ASCII-decimal. 3125 Assuming big-endian ordering, p1 and p2 are, respectively, the first 3126 and second octets each converted to ASCII-decimal. For example, if a 3127 host, in big-endian order, has an address of 0x0A010307 and there is 3128 a service listening on, in big-endian order, port 0x020F (decimal 3129 527), then the complete universal address is "10.1.3.7.2.15". 3131 For TCP over IPv4 the value of r_netid is the string "tcp". For UDP 3132 over IPv4 the value of r_netid is the string "udp".
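The r_addr encoding just described can be sketched as follows (illustrative helper functions, not part of the protocol):

```python
def ipv4_universal_address(dotted_host, port):
    # The universal address is the dotted-quad host followed by the
    # high and low octets of the 16-bit port, each in ASCII-decimal.
    return "%s.%d.%d" % (dotted_host, (port >> 8) & 0xFF, port & 0xFF)

def parse_ipv4_universal_address(r_addr):
    # Split "h1.h2.h3.h4.p1.p2" back into a host string and a port.
    parts = r_addr.split(".")
    host = ".".join(parts[:4])
    port = (int(parts[4]) << 8) | int(parts[5])
    return host, port
```

With host 10.1.3.7 and port 527 (0x020F) this yields "10.1.3.7.2.15", matching the example above.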
That this 3133 document specifies the universal address and netid for UDP/IPv4 does 3134 not imply that UDP/IPv4 is a legal transport for NFSv4.1 (see 3135 Section 2.9). 3137 For TCP over IPv6 and for UDP over IPv6, the format of r_addr is the 3138 US-ASCII string: 3140 x1:x2:x3:x4:x5:x6:x7:x8.p1.p2 3142 The suffix "p1.p2" is the service port, and is computed the same way 3143 as with universal addresses for TCP and UDP over IPv4. The prefix, 3144 "x1:x2:x3:x4:x5:x6:x7:x8", is the standard textual form for 3145 representing an IPv6 address as defined in Section 2.2 of RFC1884 3146 [9]. Additionally, the two alternative forms specified in Section 3147 2.2 of RFC1884 [9] are also acceptable. 3149 For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP 3150 over IPv6 the value of r_netid is the string "udp6". That this 3151 document specifies the universal address and netid for UDP/IPv6 does 3152 not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see 3153 Section 2.9). 3155 3.2.11. open_owner4 3157 struct open_owner4 { 3158 clientid4 clientid; 3159 opaque owner<NFS4_OPAQUE_LIMIT>; 3160 }; 3162 This structure is used to identify the owner of open state. 3163 NFS4_OPAQUE_LIMIT is defined as 1024. 3165 3.2.12. lock_owner4 3167 struct lock_owner4 { 3168 clientid4 clientid; 3169 opaque owner<NFS4_OPAQUE_LIMIT>; 3170 }; 3172 This structure is used to identify the owner of file locking state. 3174 3.2.13. open_to_lock_owner4 3176 struct open_to_lock_owner4 { 3177 seqid4 open_seqid; 3178 stateid4 open_stateid; 3179 seqid4 lock_seqid; 3180 lock_owner4 lock_owner; 3181 }; 3183 This structure is used for the first LOCK operation done for an 3184 open_owner4. It provides both the open_stateid and lock_owner such 3185 that the transition is made from a valid open_stateid sequence to 3186 that of the new lock_stateid sequence. Using this mechanism avoids 3187 the confirmation of the lock_owner/lock_seqid pair since it is tied 3188 to established state in the form of the open_stateid/open_seqid.
3190 3.2.14. stateid4 3192 struct stateid4 { 3193 uint32_t seqid; 3194 opaque other[12]; 3195 }; 3197 This structure is used for the various state sharing mechanisms 3198 between the client and server. For the client, this data structure 3199 is read-only. The starting value of the seqid field is undefined. 3200 The server is required to increment the seqid field monotonically at 3201 each transition of the stateid. This is important since the client 3202 will inspect the seqid in OPEN stateids to determine the order of 3203 OPEN processing done by the server. 3205 3.2.15. layouttype4 3207 enum layouttype4 { 3208 LAYOUT4_NFSV4_1_FILES = 1, 3209 LAYOUT4_OSD2_OBJECTS = 2, 3210 LAYOUT4_BLOCK_VOLUME = 3 3211 }; 3213 A layout type specifies the layout being used. The implication is 3214 that clients have "layout drivers" that support one or more layout 3215 types. The file server advertises the layout types it supports 3216 through the fs_layout_type file system attribute (Section 5.13.1). A 3217 client asks for layouts of a particular type in LAYOUTGET, and passes 3218 those layouts to its layout driver. 3220 The layouttype4 structure is 32 bits in length. The range 3221 represented by the layout type is split into three parts. Type 0x0 3222 is reserved. Types within the range 0x00000001-0x7FFFFFFF are 3223 globally unique and are assigned according to the description in 3224 Section 21.1; they are maintained by IANA. Types within the range 3225 0x80000000-0xFFFFFFFF are site specific and for "private use" only. 3227 The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file 3228 layout type is to be used. The LAYOUT4_OSD2_OBJECTS enumeration 3229 specifies that the object layout, as defined in [23], is to be used. 3230 Similarly, the LAYOUT4_BLOCK_VOLUME enumeration specifies that the 3231 block/volume layout, as defined in [24], is to be used. 3233 3.2.16.
deviceid4 3235 typedef uint32_t deviceid4; /* 32-bit device ID */ 3237 Layout information includes device IDs that specify a storage device 3238 through a compact handle. Addressing and type information is 3239 obtained with the GETDEVICEINFO operation. A client must not assume 3240 that device IDs are valid across metadata server reboots. The device 3241 ID is qualified by the layout type and is unique per file system 3242 (FSID). This allows different layout drivers to generate device IDs 3243 without the need for coordination. See Section 12.2.12 for more 3244 details. 3246 3.2.17. device_addr4 3248 struct device_addr4 { 3249 layouttype4 da_layout_type; 3250 opaque da_addr_body<>; 3251 }; 3253 The device address is used to set up a communication channel with the 3254 storage device. Different layout types will require different types 3255 of structures to define how they communicate with storage devices. 3256 The opaque da_addr_body field must be interpreted based on the 3257 specified da_layout_type field. 3259 This document defines the device address for the NFSv4.1 file layout 3260 ([[Comment.14: need xref]]), which identifies a storage device by 3261 network IP address and port number. This is sufficient for the 3262 clients to communicate with the NFSv4.1 storage devices, and may be 3263 sufficient for other layout types as well. Device types for object 3264 storage devices and block storage devices (e.g., SCSI volume labels) 3265 will be defined by their respective layout specifications. 3267 3.2.18. devlist_item4 3269 struct devlist_item4 { 3270 deviceid4 dli_id; 3271 device_addr4 dli_device_addr<>; 3272 }; 3274 An array of these values is returned by the GETDEVICELIST operation. 3275 They define the set of devices associated with a file system for the 3276 layout type specified in the GETDEVICELIST4args. 3278 3.2.19.
layout_content4 3280 struct layout_content4 { 3281 layouttype4 loc_type; 3282 opaque loc_body<>; 3283 }; 3285 The loc_body field must be interpreted based on the layout type 3286 (loc_type). This document defines the loc_body for the NFSv4.1 file 3287 layout type; see Section 13.3 for its definition. 3289 3.2.20. layout4 3291 struct layout4 { 3292 offset4 lo_offset; 3293 length4 lo_length; 3294 layoutiomode4 lo_iomode; 3295 layout_content4 lo_content; 3296 }; 3298 The layout4 structure defines a layout for a file. The layout type 3299 specific data is opaque within lo_content. Since layouts are sub- 3300 dividable, the offset and length together with the file's filehandle, 3301 the client ID, iomode, and layout type, identify the layout. 3303 3.2.21. layoutupdate4 3305 struct layoutupdate4 { 3306 layouttype4 lou_type; 3307 opaque lou_body<>; 3308 }; 3310 The layoutupdate4 structure is used by the client to return 'updated' 3311 layout information to the metadata server at LAYOUTCOMMIT time. This 3312 structure provides a channel to pass layout type specific information 3313 (in field lou_body) back to the metadata server. E.g., for block/ 3314 volume layout types this could include the list of reserved blocks 3315 that were written. The contents of the opaque lou_body argument are 3316 determined by the layout type and are defined in their context. The 3317 NFSv4.1 file-based layout does not use this structure, thus the 3318 lou_body field should have a zero length. 3320 3.2.22. layouthint4 3322 struct layouthint4 { 3323 layouttype4 loh_type; 3324 opaque loh_body<>; 3325 }; 3327 The layouthint4 structure is used by the client to pass in a hint 3328 about the type of layout it would like created for a particular file. 3329 It is the structure specified by the layout_hint attribute described 3330 in Section 5.13.4. The metadata server may ignore the hint, or may 3331 selectively ignore fields within the hint.
This hint should be 3332 provided at create time as part of the initial attributes within 3333 OPEN. The loh_body field is specific to the type of layout 3334 (loh_type). The NFSv4.1 file-based layout uses the 3335 nfsv4_1_file_layouthint4 structure as defined in Section 13.3. 3337 3.2.23. layoutiomode4 3339 enum layoutiomode4 { 3340 LAYOUTIOMODE4_READ = 1, 3341 LAYOUTIOMODE4_RW = 2, 3342 LAYOUTIOMODE4_ANY = 3 3343 }; 3345 The iomode specifies whether the client intends to read or write 3346 (with the possibility of reading) the data represented by the layout. 3347 The ANY iomode MUST NOT be used for LAYOUTGET; however, it can be 3348 used for LAYOUTRETURN and LAYOUTRECALL. The ANY iomode specifies 3349 that layouts pertaining to both READ and RW iomodes are being 3350 returned or recalled, respectively. The metadata server's use of the 3351 iomode may depend on the layout type being used. The storage devices 3352 may validate I/O accesses against the iomode and reject invalid 3353 accesses. 3355 3.2.24. nfs_impl_id4 3357 struct nfs_impl_id4 { 3358 utf8str_cis nii_domain; 3359 utf8str_cs nii_name; 3360 nfstime4 nii_date; 3361 }; 3363 This structure is used to identify client and server implementation 3364 details. The nii_domain field is the DNS domain name that the 3365 implementer is associated with. The nii_name field is the product 3366 name of the implementation and is completely free form. It is 3367 recommended that the nii_name be used to distinguish machine 3368 architecture, machine platforms, revisions, versions, and patch 3369 levels. The nii_date field is the timestamp of when the software 3370 instance was published or built. 3372 3.2.25.
threshold_item4 3374 struct threshold_item4 { 3375 layouttype4 thi_layout_type; 3376 bitmap4 thi_hintset; 3377 opaque thi_hintlist<>; 3378 }; 3380 This structure contains a list of hints specific to a layout type for 3381 helping the client determine when it should issue I/O directly 3382 through the metadata server vs. the data servers. The hint structure 3383 consists of the layout type (thi_layout_type), a bitmap (thi_hintset) 3384 describing the set of hints supported by the server (they may differ 3385 based on the layout type), and a list of hints (thi_hintlist), whose 3386 structure is determined by the hintset bitmap. See the mdsthreshold 3387 attribute for more details. 3389 The thi_hintset field is a bitmap of the following values: 3391 +-------------------------+---+---------+---------------------------+ 3392 | name | # | Data | Description | 3393 | | | Type | | 3394 +-------------------------+---+---------+---------------------------+ 3395 | threshold4_read_size | 0 | length4 | The file size below which | 3396 | | | | it is recommended to read | 3397 | | | | data through the MDS. | 3398 | threshold4_write_size | 1 | length4 | The file size below which | 3399 | | | | it is recommended to | 3400 | | | | write data through the | 3401 | | | | MDS. | 3402 | threshold4_read_iosize | 2 | length4 | For read I/O sizes below | 3403 | | | | this threshold it is | 3404 | | | | recommended to read data | 3405 | | | | through the MDS | 3406 | threshold4_write_iosize | 3 | length4 | For write I/O sizes below | 3407 | | | | this threshold it is | 3408 | | | | recommended to write data | 3409 | | | | through the MDS | 3410 +-------------------------+---+---------+---------------------------+ 3412 3.2.26. mdsthreshold4 3414 struct mdsthreshold4 { 3415 threshold_item4 mth_hints<>; 3416 }; 3418 This structure holds an array of threshold_item4 structures each of 3419 which is valid for a particular layout type. 
An array is necessary 3420 since a server can support multiple layout types for a single file. 3422 4. Filehandles 3424 The filehandle in the NFS protocol is a per server unique identifier 3425 for a file system object. The contents of the filehandle are opaque 3426 to the client. Therefore, the server is responsible for translating 3427 the filehandle to an internal representation of the file system 3428 object. 3430 4.1. Obtaining the First Filehandle 3432 The operations of the NFS protocol are defined in terms of one or 3433 more filehandles. Therefore, the client needs a filehandle to 3434 initiate communication with the server. With the NFS version 2 3435 protocol RFC1094 [17] and the NFS version 3 protocol RFC1813 [18], 3436 there exists an ancillary protocol to obtain this first filehandle. 3437 The MOUNT protocol, RPC program number 100005, provides the mechanism 3438 of translating a string based file system path name to a filehandle 3439 which can then be used by the NFS protocols. 3441 The MOUNT protocol has deficiencies in the area of security and use 3442 via firewalls. This is one reason that the use of the public 3443 filehandle was introduced in RFC2054 [25] and RFC2055 [26]. With the 3444 use of the public filehandle in combination with the LOOKUP operation 3445 in the NFS version 2 and 3 protocols, it has been demonstrated that 3446 the MOUNT protocol is unnecessary for viable interaction between NFS 3447 client and server. 3449 Therefore, the NFS version 4 protocol will not use an ancillary 3450 protocol for translation from string based path names to a 3451 filehandle. Two special filehandles will be used as starting points 3452 for the NFS client. 3454 4.1.1. Root Filehandle 3456 The first of the special filehandles is the ROOT filehandle. The 3457 ROOT filehandle is the "conceptual" root of the file system name 3458 space at the NFS server. The client uses or starts with the ROOT 3459 filehandle by employing the PUTROOTFH operation. 
The PUTROOTFH 3460 operation instructs the server to set the "current" filehandle to the 3461 ROOT of the server's file tree. Once this PUTROOTFH operation is 3462 used, the client can then traverse the entirety of the server's file 3463 tree with the LOOKUP operation. A complete discussion of the server 3464 name space is in the section "NFS Server Name Space". 3466 4.1.2. Public Filehandle 3468 The second special filehandle is the PUBLIC filehandle. Unlike the 3469 ROOT filehandle, the PUBLIC filehandle may be bound to or represent 3470 an arbitrary file system object at the server. The server is 3471 responsible for this binding. It may be that the PUBLIC filehandle 3472 and the ROOT filehandle refer to the same file system object. 3473 However, it is up to the administrative software at the server and 3474 the policies of the server administrator to define the binding of the 3475 PUBLIC filehandle and server file system object. The client may not 3476 make any assumptions about this binding. The client uses the PUBLIC 3477 filehandle via the PUTPUBFH operation. 3479 4.2. Filehandle Types 3481 In the NFS version 2 and 3 protocols, there was one type of 3482 filehandle with a single set of semantics. This type of filehandle 3483 is termed "persistent" in NFS Version 4. The semantics of a 3484 persistent filehandle remain the same as before. A new type of 3485 filehandle introduced in NFS Version 4 is the "volatile" filehandle, 3486 which attempts to accommodate certain server environments. 3488 The volatile filehandle type was introduced to address server 3489 functionality or implementation issues which make correct 3490 implementation of a persistent filehandle infeasible. Some server 3491 environments do not provide a file system level invariant that can be 3492 used to construct a persistent filehandle.
The underlying server 3493 file system may not provide the invariant or the server's file system 3494 programming interfaces may not provide access to the needed 3495 invariant. Volatile filehandles may ease the implementation of 3496 server functionality such as hierarchical storage management or file 3497 system reorganization or migration. However, the volatile filehandle 3498 increases the implementation burden for the client. 3500 Since the client will need to handle persistent and volatile 3501 filehandles differently, a file attribute is defined which may be 3502 used by the client to determine the filehandle types being returned 3503 by the server. 3505 4.2.1. General Properties of a Filehandle 3507 The filehandle contains all the information the server needs to 3508 distinguish an individual file. To the client, the filehandle is 3509 opaque. The client stores filehandles for use in a later request and 3510 can compare two filehandles from the same server for equality by 3511 doing an octet-by-octet comparison. However, the client MUST NOT 3512 otherwise interpret the contents of filehandles. If two filehandles 3513 from the same server are equal, they MUST refer to the same file. 3514 Servers SHOULD try to maintain a one-to-one correspondence between 3515 filehandles and files but this is not required. Clients MUST use 3516 filehandle comparisons only to improve performance, not for correct 3517 behavior. All clients need to be prepared for situations in which it 3518 cannot be determined whether two filehandles denote the same object 3519 and in such cases, avoid making invalid assumptions which might cause 3520 incorrect behavior. Further discussion of filehandle and attribute 3521 comparison in the context of data caching is presented in the section 3522 "Data Caching and File Identity". 
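The comparison rules above can be sketched as follows (a client-side illustration; the function name is hypothetical and not part of the protocol):

```python
def filehandles_same_object(server_a, fh_a, server_b, fh_b):
    """Return True when two filehandles provably denote the same object,
    or None when nothing can be concluded.  Equality is meaningful only
    for handles from the same server, and inequality proves nothing,
    so the result may be used only as a performance hint."""
    if server_a != server_b:
        return None            # cross-server comparison is meaningless
    if fh_a == fh_b:           # bytes compare octet-by-octet in Python
        return True            # equal handles MUST be the same file
    return None                # may or may not be the same object
```

Note that the `None` results capture the requirement that clients avoid invalid assumptions when sameness cannot be determined.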
3524 As an example, in the case that two different path names, when 3525 traversed at the server, terminate at the same file system object, the 3526 server SHOULD return the same filehandle for each path. This can 3527 occur if a hard link is used to create two file names which refer to 3528 the same underlying file object and associated data. For example, if 3529 paths /a/b/c and /a/d/c refer to the same file, the server SHOULD 3530 return the same filehandle for both path name traversals. 3532 4.2.2. Persistent Filehandle 3534 A persistent filehandle is defined as having a fixed value for the 3535 lifetime of the file system object to which it refers. Once the 3536 server creates the filehandle for a file system object, the server 3537 MUST accept the same filehandle for the object for the lifetime of 3538 the object. If the server restarts or reboots, the NFS server must 3539 honor the same filehandle value as it did in the server's previous 3540 instantiation. Similarly, if the file system is migrated, the new 3541 NFS server must honor the same filehandle as the old NFS server. 3543 The persistent filehandle will become stale or invalid when the 3544 file system object is removed. When the server is presented with a 3545 persistent filehandle that refers to a deleted object, it MUST return 3546 an error of NFS4ERR_STALE. A filehandle may become stale when the 3547 file system containing the object is no longer available. The file 3548 system may become unavailable if it exists on removable media and the 3549 media is no longer available at the server or the file system in 3550 whole has been destroyed or the file system has simply been removed 3551 from the server's name space (i.e. unmounted in a UNIX environment). 3553 4.2.3. Volatile Filehandle 3555 A volatile filehandle does not share the same longevity 3556 characteristics of a persistent filehandle.
The server may determine 3557 that a volatile filehandle is no longer valid at many different 3558 points in time. If the server can definitively determine that a 3559 volatile filehandle refers to an object that has been removed, the 3560 server should return NFS4ERR_STALE to the client (as is the case for 3561 persistent filehandles). In all other cases where the server 3562 determines that a volatile filehandle can no longer be used, it 3563 should return an error of NFS4ERR_FHEXPIRED. 3565 The mandatory attribute "fh_expire_type" is used by the client to 3566 determine what type of filehandle the server is providing for a 3567 particular file system. This attribute is a bitmask with the 3568 following values: 3570 FH4_PERSISTENT The value of FH4_PERSISTENT is used to indicate a 3571 persistent filehandle, which is valid until the object is removed 3572 from the file system. The server will not return 3573 NFS4ERR_FHEXPIRED for this filehandle. FH4_PERSISTENT is defined 3574 as a value in which none of the bits specified below are set. 3576 FH4_VOLATILE_ANY The filehandle may expire at any time, except as 3577 specifically excluded (i.e. FH4_NOEXPIRE_WITH_OPEN). 3579 FH4_NOEXPIRE_WITH_OPEN May only be set when FH4_VOLATILE_ANY is set. 3580 If this bit is set, then the meaning of FH4_VOLATILE_ANY is 3581 qualified to exclude any expiration of the filehandle when it is 3582 open. 3584 FH4_VOL_MIGRATION The filehandle will expire as a result of a file 3585 system transition (migration or replication), in those cases in 3586 which the continuity of filehandle use is not specified by 3587 _handle_ class information within the fs_locations_info attribute. 3588 When this bit is set, clients without access to fs_locations_info 3589 information should assume filehandles will expire on file system 3590 transitions. 3592 FH4_VOL_RENAME The filehandle will expire during rename. This 3593 includes a rename by the requesting client or a rename by any 3594 other client.
If FH4_VOLATILE_ANY is set, FH4_VOL_RENAME is redundant. 3604 Servers which provide volatile filehandles that may expire while open 3605 require special care as regards the handling of RENAMEs and REMOVEs. 3606 This situation can arise if FH4_VOL_MIGRATION or FH4_VOL_RENAME is 3607 set, if FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN is not 3608 set, or if a non-readonly file system has a transition target in a 3609 different _handle_ class. In these cases, the server should deny a 3610 RENAME or REMOVE that would affect an OPEN file of any of the 3611 components leading to the OPEN file. In addition, the server should 3612 deny all RENAME or REMOVE requests during the grace period upon 3613 server restart, in order to make sure that reclaims of files where 3614 filehandles may have expired do not do a reclaim for the wrong file. 3616 4.3. One Method of Constructing a Volatile Filehandle 3618 A volatile filehandle, while opaque to the client, could contain: 3620 [volatile bit = 1 | server boot time | slot | generation number] 3621 o slot is an index in the server volatile filehandle table 3623 o generation number is the generation number for the table entry/ 3624 slot 3626 When the client presents a volatile filehandle, the server makes the 3627 following checks, which assume that the check for the volatile bit 3628 has passed. If the boot time in the filehandle is less than the 3629 current server boot time, return NFS4ERR_FHEXPIRED. If slot is out 3630 of range, return NFS4ERR_BADHANDLE. If the generation number does 3631 not match, return NFS4ERR_FHEXPIRED.
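The checks above can be sketched as follows (a non-normative illustration; the error code values are those assigned by the protocol, but the decoded-handle representation and table layout are assumptions of this sketch):

```python
NFS4_OK           = 0
NFS4ERR_BADHANDLE = 10001   # illegal NFS filehandle
NFS4ERR_FHEXPIRED = 10014   # volatile filehandle has expired

def check_volatile_fh(fh, current_boot_time, table):
    """fh: dict with 'boot_time', 'slot', and 'generation' decoded from
    the opaque handle; table: list mapping slot -> current generation."""
    if fh["boot_time"] < current_boot_time:
        return NFS4ERR_FHEXPIRED    # handle predates this server instance
    if not 0 <= fh["slot"] < len(table):
        return NFS4ERR_BADHANDLE    # slot out of range
    if table[fh["slot"]] != fh["generation"]:
        return NFS4ERR_FHEXPIRED    # slot reused; generation mismatch
    return NFS4_OK
```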
3633 When the server reboots, the table is gone (it is volatile). 3635 If volatile bit is 0, then it is a persistent filehandle with a 3636 different structure following it. 3638 4.4. Client Recovery from Filehandle Expiration 3640 If possible, the client SHOULD recover from the receipt of an 3641 NFS4ERR_FHEXPIRED error. The client must take on additional 3642 responsibility so that it may prepare itself to recover from the 3643 expiration of a volatile filehandle. If the server returns 3644 persistent filehandles, the client does not need these additional 3645 steps. 3647 For volatile filehandles, most commonly the client will need to store 3648 the component names leading up to and including the file system 3649 object in question. With these names, the client should be able to 3650 recover by finding a filehandle in the name space that is still 3651 available or by starting at the root of the server's file system name 3652 space. 3654 If the expired filehandle refers to an object that has been removed 3655 from the file system, obviously the client will not be able to 3656 recover from the expired filehandle. 3658 It is also possible that the expired filehandle refers to a file that 3659 has been renamed. If the file was renamed by another client, again 3660 it is possible that the original client will not be able to recover. 3661 However, in the case that the client itself is renaming the file and 3662 the file is open, it is possible that the client may be able to 3663 recover. The client can determine the new path name based on the 3664 processing of the rename request. The client can then regenerate the 3665 new filehandle based on the new path name. The client could also use 3666 the compound operation mechanism to construct a set of operations 3667 like: 3669 RENAME A B 3670 LOOKUP B 3671 GETFH 3673 Note that the COMPOUND procedure does not provide atomicity. This 3674 example only reduces the overhead of recovering from an expired 3675 filehandle. 
3677 5. File Attributes 3679 To meet the requirements of extensibility and increased 3680 interoperability with non-UNIX platforms, attributes must be handled 3681 in a flexible manner. The NFS version 3 fattr3 structure contains a 3682 fixed list of attributes that not all clients and servers are able to 3683 support or care about. The fattr3 structure cannot be extended as 3684 new needs arise and it provides no way to indicate non-support. With 3685 the NFS version 4 protocol, the client is able to query what attributes 3686 the server supports and construct requests with only those supported 3687 attributes (or a subset thereof). 3689 To this end, attributes are divided into three groups: mandatory, 3690 recommended, and named. Both mandatory and recommended attributes 3691 are supported in the NFS version 4 protocol by a specific and well- 3692 defined encoding and are identified by number. They are requested by 3693 setting a bit in the bit vector sent in the GETATTR request; the 3694 server response includes a bit vector to list what attributes were 3695 returned in the response. New mandatory or recommended attributes 3696 may be added to the NFS protocol between major revisions by 3697 publishing a standards-track RFC which allocates a new attribute 3698 number value and defines the encoding for the attribute. See the 3699 section "Minor Versioning" for further discussion. 3701 Named attributes are accessed by the new OPENATTR operation, which 3702 accesses a hidden directory of attributes associated with a file 3703 system object. OPENATTR takes a filehandle for the object and 3704 returns the filehandle for the attribute hierarchy. The filehandle 3705 for the named attributes is a directory object accessible by LOOKUP 3706 or READDIR and contains files whose names represent the named 3707 attributes and whose data bytes are the value of the attribute.
For 3708 example: 3710 +----------+-----------+---------------------------------+ 3711 | LOOKUP | "foo" | ; look up file | 3712 | GETATTR | attrbits | | 3713 | OPENATTR | | ; access foo's named attributes | 3714 | LOOKUP | "x11icon" | ; look up specific attribute | 3715 | READ | 0,4096 | ; read stream of bytes | 3716 +----------+-----------+---------------------------------+ 3718 Named attributes are intended for data needed by applications rather 3719 than by an NFS client implementation. NFS implementors are strongly 3720 encouraged to define their new attributes as recommended attributes 3721 by bringing them to the IETF standards-track process. 3723 The set of attributes which are classified as mandatory is 3724 deliberately small since servers must do whatever it takes to support 3725 them. A server should support as many of the recommended attributes 3726 as possible but by their definition, the server is not required to 3727 support all of them. Attributes are deemed mandatory if the data is 3728 both needed by a large number of clients and is not otherwise 3729 reasonably computable by the client when support is not provided on 3730 the server. 3732 Note that the hidden directory returned by OPENATTR is a convenience 3733 for protocol processing. The client should not make any assumptions 3734 about the server's implementation of named attributes and whether the 3735 underlying file system at the server has a named attribute directory 3736 or not. Therefore, operations such as SETATTR and GETATTR on the 3737 named attribute directory are undefined. 3739 5.1. Mandatory Attributes 3741 These MUST be supported by every NFS version 4 client and server in 3742 order to ensure a minimum level of interoperability. The server must 3743 store and return these attributes and the client must be able to 3744 function with an attribute set limited to these attributes. 
With 3745 just the mandatory attributes some client functionality may be 3746 impaired or limited in some ways. A client may ask for any of these 3747 attributes to be returned by setting a bit in the GETATTR request and 3748 the server must return their value. 3750 5.2. Recommended Attributes 3752 These attributes are understood well enough to warrant support in the 3753 NFS version 4 protocol. However, they may not be supported on all 3754 clients and servers. A client may ask for any of these attributes to 3755 be returned by setting a bit in the GETATTR request but must handle 3756 the case where the server does not return them. A client may ask for 3757 the set of attributes the server supports and should not request 3758 attributes the server does not support. A server should be tolerant 3759 of requests for unsupported attributes and simply not return them 3760 rather than considering the request an error. It is expected that 3761 servers will support all attributes they comfortably can and only 3762 fail to support attributes which are difficult to support in their 3763 operating environments. A server should provide attributes whenever 3764 it does not have to "tell lies" to the client. For example, a file 3765 modification time should be either an accurate time or should not be 3766 supported by the server. This will not always be comfortable to 3767 clients, but the client is better positioned to decide whether and how to 3768 fabricate or construct an attribute or whether to do without the 3769 attribute. 3771 5.3. Named Attributes 3773 These attributes are not supported by direct encoding in the NFS 3774 Version 4 protocol but are accessed by string names rather than 3775 numbers and correspond to an uninterpreted stream of bytes which are 3776 stored with the file system object. The name space for these 3777 attributes may be accessed by using the OPENATTR operation.
The 3778 OPENATTR operation returns a filehandle for a virtual "attribute 3779 directory" and further perusal of the name space may be done using 3780 READDIR and LOOKUP operations on this filehandle. Named attributes 3781 may then be examined or changed by normal READ and WRITE and CREATE 3782 operations on the filehandles returned from READDIR and LOOKUP. 3783 Named attributes may have attributes. 3785 It is recommended that servers support arbitrary named attributes. A 3786 client should not depend on the ability to store any named attributes 3787 in the server's file system. If a server does support named 3788 attributes, a client which is also able to handle them should be able 3789 to copy a file's data and meta-data with complete transparency from 3790 one location to another; this would imply that names allowed for 3791 regular directory entries are valid for named attribute names as 3792 well. 3794 Names of attributes will not be controlled by this document or other 3795 IETF standards track documents. See the section "IANA 3796 Considerations" for further discussion. 3798 5.4. Classification of Attributes 3800 Each of the Mandatory and Recommended attributes can be classified in 3801 one of three categories: per server, per file system, or per file 3802 system object. Note that it is possible that some per file system 3803 attributes may vary within the file system. See the "homogeneous" 3804 attribute for its definition. Note that the attributes 3805 time_access_set and time_modify_set are not listed in this section 3806 because they are write-only attributes corresponding to time_access 3807 and time_modify, and are used in a special instance of SETATTR. 
3809 o The per server attribute is: 3811 lease_time 3813 o The per file system attributes are: 3815 supp_attr, fh_expire_type, link_support, symlink_support, 3816 unique_handles, aclsupport, cansettime, case_insensitive, 3817 case_preserving, chown_restricted, files_avail, files_free, 3818 files_total, fs_locations, homogeneous, maxfilesize, maxname, 3819 maxread, maxwrite, no_trunc, space_avail, space_free, 3820 space_total, time_delta, fs_status, fs_layout_type, 3821 fs_locations_info 3823 o The per file system object attributes are: 3825 type, change, size, named_attr, fsid, rdattr_error, filehandle, 3826 ACL, archive, fileid, hidden, maxlink, mimetype, mode, 3827 numlinks, owner, owner_group, rawdev, space_used, system, 3828 time_access, time_backup, time_create, time_metadata, 3829 time_modify, mounted_on_fileid, dir_notif_delay, 3830 dirent_notif_delay, dacl, sacl, layout_type, layout_hint, 3831 layout_blksize, layout_alignment, mdsthreshold, retention_get, 3832 retention_set, retentevt_get, retentevt_set, retention_hold, 3833 mode_set_masked 3835 For quota_avail_hard, quota_avail_soft, and quota_used see their 3836 definitions below for the appropriate classification. 3838 5.5. Mandatory Attributes - Definitions 3840 +-----------------+----+------------+--------+----------------------+ 3841 | name | # | Data Type | Access | Description | 3842 +-----------------+----+------------+--------+----------------------+ 3843 | supp_attr | 0 | bitmap | READ | The bit vector which | 3844 | | | | | would retrieve all | 3845 | | | | | mandatory and | 3846 | | | | | recommended | 3847 | | | | | attributes that are | 3848 | | | | | supported for this | 3849 | | | | | object. The scope | 3850 | | | | | of this attribute | 3851 | | | | | applies to all | 3852 | | | | | objects with a | 3853 | | | | | matching fsid. | 3854 | type | 1 | nfs4_ftype | READ | The type of the | 3855 | | | | | object (file, | 3856 | | | | | directory, symlink, | 3857 | | | | | etc.) 
| 3858 | fh_expire_type | 2 | uint32 | READ | Server uses this to | 3859 | | | | | specify filehandle | 3860 | | | | | expiration behavior | 3861 | | | | | to the client. See | 3862 | | | | | the section | 3863 | | | | | "Filehandles" for | 3864 | | | | | additional | 3865 | | | | | description. | 3866 | change | 3 | uint64 | READ | A value created by | 3867 | | | | | the server that the | 3868 | | | | | client can use to | 3869 | | | | | determine if file | 3870 | | | | | data, directory | 3871 | | | | | contents or | 3872 | | | | | attributes of the | 3873 | | | | | object have been | 3874 | | | | | modified. The | 3875 | | | | | server may return | 3876 | | | | | the object's | 3877 | | | | | time_metadata | 3878 | | | | | attribute for this | 3879 | | | | | attribute's value | 3880 | | | | | but only if the file | 3881 | | | | | system object can | 3882 | | | | | not be updated more | 3883 | | | | | frequently than the | 3884 | | | | | resolution of | 3885 | | | | | time_metadata. | 3886 | size | 4 | uint64 | R/W | The size of the | 3887 | | | | | object in bytes. | 3888 | link_support | 5 | bool | READ | True, if the | 3889 | | | | | object's file system | 3890 | | | | | supports hard links. | 3891 | symlink_support | 6 | bool | READ | True, if the | 3892 | | | | | object's file system | 3893 | | | | | supports symbolic | 3894 | | | | | links. | 3895 | named_attr | 7 | bool | READ | True, if this object | 3896 | | | | | has named | 3897 | | | | | attributes. In | 3898 | | | | | other words, object | 3899 | | | | | has a non-empty | 3900 | | | | | named attribute | 3901 | | | | | directory. | 3902 | fsid | 8 | fsid4 | READ | Unique file system | 3903 | | | | | identifier for the | 3904 | | | | | file system holding | 3905 | | | | | this object. fsid | 3906 | | | | | contains major and | 3907 | | | | | minor components | 3908 | | | | | each of which are | 3909 | | | | | uint64. 
| 3910 | unique_handles | 9 | bool | READ | True, if two | 3911 | | | | distinct filehandles | 3912 | | | | are guaranteed to refer | 3913 | | | | to two different | 3914 | | | | file system objects. | 3915 | lease_time | 10 | nfs_lease4 | READ | Duration of leases | 3916 | | | | at server in | 3917 | | | | seconds. | 3918 | rdattr_error | 11 | enum | READ | Error returned from | 3919 | | | | getattr during | 3920 | | | | readdir. | 3921 | filehandle | 19 | nfs_fh4 | READ | The filehandle of | 3922 | | | | this object | 3923 | | | | (primarily for | 3924 | | | | readdir requests). | 3925 +-----------------+----+------------+--------+----------------------+ 3927 5.6. Recommended Attributes - Definitions 3928 +-------------------+----+----------------+--------+----------------+ 3929 | name | # | Data Type | Access | Description | 3930 +-------------------+----+----------------+--------+----------------+ 3931 | ACL | 12 | nfsace4<> | R/W | The access | 3932 | | | | | control list | 3933 | | | | | for the | 3934 | | | | | object. | 3935 | aclsupport | 13 | uint32 | READ | Indicates what | 3936 | | | | | types of ACLs | 3937 | | | | | are supported | 3938 | | | | | on the current | 3939 | | | | | file system. | 3940 | archive | 14 | bool | R/W | True, if this | 3941 | | | | | file has been | 3942 | | | | | archived since | 3943 | | | | | the time of | 3944 | | | | | last | 3945 | | | | | modification | 3946 | | | | | (deprecated in | 3947 | | | | | favor of | 3948 | | | | | time_backup). | 3949 | cansettime | 15 | bool | READ | True, if the | 3950 | | | | | server is able to | 3951 | | | | | change the | 3952 | | | | | times for a | 3953 | | | | | file system | 3954 | | | | | object as | 3955 | | | | | specified in a | 3956 | | | | | SETATTR | 3957 | | | | | operation.
| 3958 | case_insensitive | 16 | bool | READ | True, if | 3959 | | | | filename | 3960 | | | | comparisons on | 3961 | | | | this file | 3962 | | | | system are | 3963 | | | | case | 3964 | | | | insensitive. | 3965 | case_preserving | 17 | bool | READ | True, if | 3966 | | | | filename case | 3967 | | | | on this file | 3968 | | | | system is | 3969 | | | | preserved. | 3970 | chown_restricted | 18 | bool | READ | If TRUE, the | 3971 | | | | server will | 3972 | | | | reject any | 3973 | | | | request to | 3974 | | | | change either | 3975 | | | | the owner or | 3976 | | | | the group | 3977 | | | | associated | 3978 | | | | with a file if | 3979 | | | | the caller is | 3980 | | | | not a | 3981 | | | | privileged | 3982 | | | | user (for | 3983 | | | | example, | 3984 | | | | "root" in UNIX | 3985 | | | | operating | 3986 | | | | environments | 3987 | | | | or in Windows | 3988 | | | | 2000 the "Take | 3989 | | | | Ownership" | 3990 | | | | privilege). | 3991 | dacl | 58 | nfsacl41 | R/W | Automatically | 3992 | | | | inheritable | 3993 | | | | access control | 3994 | | | | list used for | 3995 | | | | determining | 3996 | | | | access to file | 3997 | | | | system | 3998 | | | | objects. | 3999 | dir_notif_delay | 56 | nfstime4 | READ | notification | 4000 | | | | delays on | 4001 | | | | directory | 4002 | | | | attributes | 4003 | dirent_ | 57 | nfstime4 | READ | notification | 4004 | notif_delay | | | | delays on | 4005 | | | | child | 4006 | | | | attributes | 4007 | fileid | 20 | uint64 | READ | A number | 4008 | | | | uniquely | 4009 | | | | identifying | 4010 | | | | the file | 4011 | | | | within the | 4012 | | | | file system.
| 4013 | files_avail | 21 | uint64 | READ | File slots | 4014 | | | | | available to | 4015 | | | | | this user on | 4016 | | | | | the file | 4017 | | | | | system | 4018 | | | | | containing | 4019 | | | | | this object - | 4020 | | | | | this should be | 4021 | | | | | the smallest | 4022 | | | | | relevant | 4023 | | | | | limit. | 4024 | files_free | 22 | uint64 | READ | Free file | 4025 | | | | | slots on the | 4026 | | | | | file system | 4027 | | | | | containing | 4028 | | | | | this object - | 4029 | | | | | this should be | 4030 | | | | | the smallest | 4031 | | | | | relevant | 4032 | | | | | limit. | 4033 | files_total | 23 | uint64 | READ | Total file | 4034 | | | | | slots on the | 4035 | | | | | file system | 4036 | | | | | containing | 4037 | | | | | this object. | 4038 | fs_absent | 60 | bool | READ | Is current | 4039 | | | | | file system | 4040 | | | | | present or | 4041 | | | | | absent. | 4042 | fs_layout_type | 62 | layouttype4<> | READ | Layout types | 4043 | | | | | available for | 4044 | | | | | the file | 4045 | | | | | system. | 4046 | fs_locations | 24 | fs_locations | READ | Locations | 4047 | | | | | where this | 4048 | | | | | file system | 4049 | | | | | may be found. | 4050 | | | | | If the server | 4051 | | | | | returns | 4052 | | | | | NFS4ERR_MOVED | 4053 | | | | | as an error, | 4054 | | | | | this attribute | 4055 | | | | | MUST be | 4056 | | | | | supported. | 4057 | fs_locations_info | 67 | | READ | Full function | 4058 | | | | | file system | 4059 | | | | | location. | 4060 | fs_status | 61 | fs4_status | READ | Generic file | 4061 | | | | | system type | 4062 | | | | | information. | 4063 | hidden | 25 | bool | R/W | True, if the | 4064 | | | | | file is | 4065 | | | | | considered | 4066 | | | | | hidden with | 4067 | | | | | respect to the | 4068 | | | | | Windows API? 
| 4069 | homogeneous | 26 | bool | READ | True, if this | 4070 | | | | | object's file | 4071 | | | | | system is | 4072 | | | | | homogeneous, | 4073 | | | | | i.e. are per | 4074 | | | | | file system | 4075 | | | | | attributes the | 4076 | | | | | same for all | 4077 | | | | | file system's | 4078 | | | | | objects. | 4079 | layout_alignment | 66 | uint32_t | READ | Preferred | 4080 | | | | | alignment for | 4081 | | | | | layout related | 4082 | | | | | I/O. | 4083 | layout_blksize | 65 | uint32_t | READ | Preferred | 4084 | | | | | block size for | 4085 | | | | | layout related | 4086 | | | | | I/O. | 4087 | layout_hint | 63 | layouthint4 | WRITE | Client | 4088 | | | | | specified hint | 4089 | | | | | for file | 4090 | | | | | layout. | 4091 | layout_type | 64 | layouttype4<> | READ | Layout types | 4092 | | | | | available for | 4093 | | | | | the file. | 4094 | maxfilesize | 27 | uint64 | READ | Maximum | 4095 | | | | | supported file | 4096 | | | | | size for the | 4097 | | | | | file system of | 4098 | | | | | this object. | 4099 | maxlink | 28 | uint32 | READ | Maximum number | 4100 | | | | | of links for | 4101 | | | | | this object. | 4102 | maxname | 29 | uint32 | READ | Maximum | 4103 | | | | | filename size | 4104 | | | | | supported for | 4105 | | | | | this object. | 4106 | maxread | 30 | uint64 | READ | Maximum read | 4107 | | | | | size supported | 4108 | | | | | for this | 4109 | | | | | object. | 4110 | maxwrite | 31 | uint64 | READ | Maximum write | 4111 | | | | | size supported | 4112 | | | | | for this | 4113 | | | | | object. This | 4114 | | | | | attribute | 4115 | | | | | SHOULD be | 4116 | | | | | supported if | 4117 | | | | | the file is | 4118 | | | | | writable. | 4119 | | | | | Lack of this | 4120 | | | | | attribute can | 4121 | | | | | lead to the | 4122 | | | | | client either | 4123 | | | | | wasting | 4124 | | | | | bandwidth or | 4125 | | | | | not receiving | 4126 | | | | | the best | 4127 | | | | | performance. 
| 4128 | mdsthreshold | 68 | mdsthreshold4 | READ | Hint to client | 4129 | | | | as to when to | 4130 | | | | write through | 4131 | | | | the pnfs | 4132 | | | | metadata | 4133 | | | | server. | 4134 | mimetype | 32 | utf8<> | R/W | MIME body | 4135 | | | | type/subtype | 4136 | | | | of this | 4137 | | | | object. | 4138 | mode | 33 | mode4 | R/W | UNIX-style | 4139 | | | | mode including | 4140 | | | | permission | 4141 | | | | bits for this | 4142 | | | | object. | 4143 | mode_set_masked | 74 | mode_masked4 | WRITE | Allows setting | 4144 | | | | or resetting a | 4145 | | | | subset of the | 4146 | | | | bits in a | 4147 | | | | UNIX-style | 4148 | | | | mode | 4149 | mounted_on_fileid | 55 | uint64 | READ | Like fileid, | 4150 | | | | but if the | 4151 | | | | target | 4152 | | | | filehandle is | 4153 | | | | the root of a | 4154 | | | | file system | 4155 | | | | return the | 4156 | | | | fileid of the | 4157 | | | | underlying | 4158 | | | | directory. | 4159 | no_trunc | 34 | bool | READ | True, if a | 4160 | | | | name longer | 4161 | | | | than name_max | 4162 | | | | is used, an | 4163 | | | | error will be | 4164 | | | | returned and | 4165 | | | | name is not | 4166 | | | | truncated. | 4167 | numlinks | 35 | uint32 | READ | Number of hard | 4168 | | | | links to this | 4169 | | | | object. | 4170 | owner | 36 | utf8<> | R/W | The string | 4171 | | | | name of the | 4172 | | | | owner of this | 4173 | | | | object. | 4174 | owner_group | 37 | utf8<> | R/W | The string | 4175 | | | | name of the | 4176 | | | | group | 4177 | | | | ownership of | 4178 | | | | this object. | 4179 | quota_avail_hard | 38 | uint64 | READ | For definition | 4180 | | | | see "Quota | 4181 | | | | Attributes" | 4182 | | | | section below. | 4183 | quota_avail_soft | 39 | uint64 | READ | For definition | 4184 | | | | see "Quota | 4185 | | | | Attributes" | 4186 | | | | section below.
| 4187 | quota_used | 40 | uint64 | READ | For definition | 4188 | | | | see "Quota | 4189 | | | | Attributes" | 4190 | | | | section below. | 4191 | rawdev | 41 | specdata4 | READ | Raw device | 4192 | | | | identifier. | 4193 | | | | UNIX device | 4194 | | | | major/minor | 4195 | | | | node | 4196 | | | | information. | 4197 | | | | If the value | 4198 | | | | of type is not | 4199 | | | | NF4BLK or | 4200 | | | | NF4CHR, the | 4201 | | | | value returned | 4202 | | | | SHOULD NOT be | 4203 | | | | considered | 4204 | | | | useful. | 4205 | retentevt_get | 71 | retention_get4 | READ | Get the | 4206 | | | | event-based | 4207 | | | | retention | 4208 | | | | duration, and | 4209 | | | | if enabled, | 4210 | | | | the | 4211 | | | | event-based | 4212 | | | | retention | 4213 | | | | begin time of | 4214 | | | | the file | 4215 | | | | object. | 4216 | | | | GETATTR use | 4217 | | | | only. | 4218 | retentevt_set | 72 | retention_set4 | WRITE | Set the | 4219 | | | | event-based | 4220 | | | | retention | 4221 | | | | duration, and | 4222 | | | | optionally | 4223 | | | | enable | 4224 | | | | event-based | 4225 | | | | retention on | 4226 | | | | the file | 4227 | | | | object. | 4228 | | | | SETATTR use | 4229 | | | | only. | 4230 | retention_get | 69 | retention_get4 | READ | Get the | 4231 | | | | retention | 4232 | | | | duration, and | 4233 | | | | if enabled, | 4234 | | | | the retention | 4235 | | | | begin time of | 4236 | | | | the file | 4237 | | | | object. | 4238 | | | | GETATTR use | 4239 | | | | only. | 4240 | retention_hold | 73 | uint64_t | R/W | Get or set | 4241 | | | | administrative | 4242 | | | | retention | 4243 | | | | holds, one | 4244 | | | | hold per bit | 4245 | | | | position.
| 4246 | retention_set | 70 | retention_set4 | WRITE | Set the | 4247 | | | | | retention | 4248 | | | | | duration, and | 4249 | | | | | optionally | 4250 | | | | | enable | 4251 | | | | | retention on | 4252 | | | | | the file | 4253 | | | | | object. | 4254 | | | | | SETATTR use | 4255 | | | | | only. | 4256 | sacl | 59 | nfsacl41 | R/W | Automatically | 4257 | | | | | inheritable | 4258 | | | | | access control | 4259 | | | | | list used for | 4260 | | | | | auditing | 4261 | | | | | access to | 4262 | | | | | files. | 4263 | space_avail | 42 | uint64 | READ | Disk space in | 4264 | | | | | bytes | 4265 | | | | | available to | 4266 | | | | | this user on | 4267 | | | | | the file | 4268 | | | | | system | 4269 | | | | | containing | 4270 | | | | | this object - | 4271 | | | | | this should be | 4272 | | | | | the smallest | 4273 | | | | | relevant | 4274 | | | | | limit. | 4275 | space_free | 43 | uint64 | READ | Free disk | 4276 | | | | | space in bytes | 4277 | | | | | on the file | 4278 | | | | | system | 4279 | | | | | containing | 4280 | | | | | this object - | 4281 | | | | | this should be | 4282 | | | | | the smallest | 4283 | | | | | relevant | 4284 | | | | | limit. | 4285 | space_total | 44 | uint64 | READ | Total disk | 4286 | | | | | space in bytes | 4287 | | | | | on the file | 4288 | | | | | system | 4289 | | | | | containing | 4290 | | | | | this object. | 4291 | space_used | 45 | uint64 | READ | Number of file | 4292 | | | | | system bytes | 4293 | | | | | allocated to | 4294 | | | | | this object. | 4295 | system | 46 | bool | R/W | True, if this | 4296 | | | | | file is a | 4297 | | | | | "system" file | 4298 | | | | | with respect | 4299 | | | | | to the Windows | 4300 | | | | | API? | 4301 | time_access | 47 | nfstime4 | READ | The time of | 4302 | | | | | last access to | 4303 | | | | | the object by | 4304 | | | | | a read that | 4305 | | | | | was satisfied | 4306 | | | | | by the server. 
| 4307 | time_access_set | 48 | settime4 | WRITE | Set the time | 4308 | | | | | of last access | 4309 | | | | | to the object. | 4310 | | | | | SETATTR use | 4311 | | | | | only. | 4312 | time_backup | 49 | nfstime4 | R/W | The time of | 4313 | | | | | last backup of | 4314 | | | | | the object. | 4315 | time_create | 50 | nfstime4 | R/W | The time of | 4316 | | | | | creation of | 4317 | | | | | the object. | 4318 | | | | | This attribute | 4319 | | | | | does not have | 4320 | | | | | any relation | 4321 | | | | | to the | 4322 | | | | | traditional | 4323 | | | | | UNIX file | 4324 | | | | | attribute | 4325 | | | | | "ctime" or | 4326 | | | | | "change time". | 4327 | time_delta | 51 | nfstime4 | READ | Smallest | 4328 | | | | | useful server | 4329 | | | | | time | 4330 | | | | | granularity. | 4331 | time_metadata | 52 | nfstime4 | READ | The time of | 4332 | | | | | last meta-data | 4333 | | | | | modification | 4334 | | | | | of the object. | 4335 | time_modify | 53 | nfstime4 | READ | The time of | 4336 | | | | | last | 4337 | | | | | modification | 4338 | | | | | to the object. | 4339 | time_modify_set | 54 | settime4 | WRITE | Set the time | 4340 | | | | | of last | 4341 | | | | | modification | 4342 | | | | | to the object. | 4343 | | | | | SETATTR use | 4344 | | | | | only. | 4345 +-------------------+----+----------------+--------+----------------+ 4347 5.7. Time Access 4349 As defined above, the time_access attribute represents the time of 4350 last access to the object by a read that was satisfied by the server. 4351 The notion of what is an "access" depends on server's operating 4352 environment and/or the server's file system semantics. For example, 4353 for servers obeying POSIX semantics, time_access would be updated 4354 only by the READLINK, READ, and READDIR operations and not any of the 4355 operations that modify the content of the object. 
Of course, setting 4356 the corresponding time_access_set attribute is another way to modify 4357 the time_access attribute. 4359 Whenever the file object resides on a writable file system, the 4360 server should make best efforts to record time_access into stable 4361 storage. However, to mitigate the performance effects of doing so, 4362 and most especially whenever the server is satisfying the read of the 4363 object's content from its cache, the server MAY cache access time 4364 updates and lazily write them to stable storage. It is also 4365 acceptable to give administrators of the server the option to disable 4366 time_access updates. 4368 5.8. Interpreting owner and owner_group 4370 The recommended attributes "owner" and "owner_group" (and also users 4371 and groups within the "acl" attribute) are represented in terms of a 4372 UTF-8 string. To avoid a representation that is tied to a particular 4373 underlying implementation at the client or server, the use of the 4374 UTF-8 string has been chosen. Note that section 6.1 of RFC2624 [27] 4375 provides additional rationale. It is expected that the client and 4376 server will have their own local representation of owner and 4377 owner_group that is used for local storage or presentation to the end 4378 user. Therefore, it is expected that when these attributes are 4379 transferred between the client and server, the local 4380 representation is translated to a syntax of the form "user@ 4381 dns_domain". This allows a client and server that do not use 4382 the same local representation to translate to a common 4383 syntax that can be interpreted by both. 4385 Similarly, security principals may be represented in different ways 4386 by different security mechanisms. Servers normally translate these 4387 representations into a common format, generally that used by local 4388 storage, to serve as a means of identifying the users corresponding 4389 to these security principals.
When these local identifiers are 4390 translated into the form of the owner attribute and associated with files 4391 created by such principals, they identify, in a common format, the 4392 users associated with each corresponding set of security principals. 4394 The translation used to interpret owner and group strings is not 4395 specified as part of the protocol. This allows various solutions to 4396 be employed. For example, a local translation table may be consulted 4397 that maps a numeric id to the user@dns_domain syntax. A name 4398 service may also be used to accomplish the translation. A server may 4399 provide a more general service, not limited by any particular 4400 translation (which would only translate a limited set of possible 4401 strings), by storing the owner and owner_group attributes in local 4402 storage without any translation, or it may augment a translation 4403 method by storing the entire string for attributes for which no 4404 translation is available while using the local representation for 4405 those cases in which a translation is available. 4407 Servers that do not provide support for all possible values of the 4408 owner and owner_group attributes should return an error 4409 (NFS4ERR_BADOWNER) when a string is presented that has no 4410 translation, as the value to be set for a SETATTR of the owner, 4411 owner_group, or acl attributes. When a server does accept an owner 4412 or owner_group value as valid on a SETATTR (and similarly for the 4413 owner and group strings in an acl), it is promising to return that 4414 same string when a corresponding GETATTR is done. Configuration 4415 changes and ill-constructed name translations (those that contain 4416 aliasing) may make that promise impossible to honor. Servers should 4417 make appropriate efforts to avoid a situation in which these 4418 attributes have their values changed when no real change to ownership 4419 has occurred.
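The translation behavior this section describes can be sketched as follows. This is a minimal illustration under invented assumptions: USER_TABLE is a hypothetical local account table, and a real server would instead consult its own account database or a name service. The numeric-string compatibility form handled below is the one this section describes for v2/v3 interoperability.

```python
# Hypothetical sketch of owner/owner_group string translation.
# USER_TABLE is invented for illustration; it is not a real API.

USER_TABLE = {"alice@example.org": 1000}          # name@dns_domain -> local uid
UID_TABLE = {uid: name for name, uid in USER_TABLE.items()}

def owner_to_uid(owner):
    """Map an owner attribute value to a local uid.
    Returns None where a server would return NFS4ERR_BADOWNER."""
    if owner in USER_TABLE:
        return USER_TABLE[owner]
    # Optional v2/v3 compatibility form: a decimal numeric string with
    # no "@" and no leading zeros, treated as a uid by receivers that
    # choose to support it.
    if "@" not in owner and owner.isdigit() and (owner == "0" or owner[0] != "0"):
        return int(owner)
    return None

def uid_to_owner(uid):
    """Translate a local uid to the name@dns_domain syntax; when no
    translation is available, send the bare numeric string (no "@"),
    signaling the receiver not to attempt translation."""
    return UID_TABLE.get(uid, str(uid))
```

Note that a conforming server would additionally reject the numeric form with NFS4ERR_BADOWNER when a valid name translation exists for that uid, so that the compatibility mechanism cannot be used to bypass name-based translation.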
4421 The "dns_domain" portion of the owner string is meant to be a DNS 4422 domain name. For example, user@example.org. Servers should accept as 4423 valid a set of users for at least one domain. A server may treat 4424 other domains as having no valid translations. A more general 4425 service is provided when a server is capable of accepting users for 4426 multiple domains, or for all domains, subject to security 4427 constraints. 4429 In the case where there is no translation available to the client or 4430 server, the attribute value must be constructed without the "@". 4431 Therefore, the absence of the @ from the owner or owner_group 4432 attribute signifies that no translation was available at the sender 4433 and that the receiver of the attribute should not use that string as 4434 a basis for translation into its own internal format. Even though 4435 the attribute value cannot be translated, it may still be useful. 4436 In the case of a client, the attribute string may be used for local 4437 display of ownership. 4439 To provide a greater degree of compatibility with previous versions 4440 of NFS (i.e. v2 and v3), which identified users and groups by 32-bit 4441 unsigned uids and gids, owner and group strings that consist of 4442 decimal numeric values with no leading zeros can be given a special 4443 interpretation by clients and servers which choose to provide such 4444 support. The receiver may treat such a user or group string as 4445 representing the same user as would be represented by a v2/v3 uid or 4446 gid having the corresponding numeric value. A server is not 4447 obligated to accept such a string, but may return an NFS4ERR_BADOWNER 4448 instead. To avoid this mechanism being used to subvert user and 4449 group translation, so that a client might pass all of the owners and 4450 groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER 4451 error when there is a valid translation for the user or owner 4452 designated in this way.
In that case, the client must use the 4453 appropriate name@domain string and not the special form for 4454 compatibility. 4456 The owner string "nobody" may be used to designate an anonymous user, 4457 which will be associated with a file created by a security principal 4458 that cannot be mapped through normal means to the owner attribute. 4460 5.9. Character Case Attributes 4462 With respect to the case_insensitive and case_preserving attributes, 4463 each UCS-4 character (which UTF-8 encodes) has a "long descriptive 4464 name" RFC1345 [28] which may or may not include the word "CAPITAL" 4465 or "SMALL". The presence of SMALL or CAPITAL allows an NFS server to 4466 implement unambiguous and efficient table driven mappings for case 4467 insensitive comparisons, and non-case-preserving storage. For 4468 general character handling and internationalization issues, see the 4469 section "Internationalization". 4471 5.10. Quota Attributes 4473 For the attributes related to file system quotas, the following 4474 definitions apply: 4476 quota_avail_soft The value in bytes which represents the amount of 4477 additional disk space that can be allocated to this file or 4478 directory before the user may reasonably be warned. It is 4479 understood that this space may be consumed by allocations to other 4480 files or directories though there is a rule as to which other 4481 files or directories. 4483 quota_avail_hard The value in bytes which represents the amount of 4484 additional disk space beyond the current allocation that can be 4485 allocated to this file or directory before further allocations 4486 will be refused. It is understood that this space may be consumed 4487 by allocations to other files or directories.
4489 quota_used The value in bytes which represents the amount of disk 4490 space used by this file or directory and possibly a number of 4491 other similar files or directories, where the set of "similar" 4492 meets at least the criterion that allocating space to any file or 4493 directory in the set will reduce the "quota_avail_hard" of every 4494 other file or directory in the set. 4496 Note that there may be a number of distinct but overlapping sets 4497 of files or directories for which a quota_used value is 4498 maintained, e.g., "all files with a given owner", "all files with 4499 a given group owner", etc. 4501 The server is at liberty to choose any of those sets but should do 4502 so in a repeatable way. The rule may be configured per file 4503 system or may be "choose the set with the smallest quota". 4505 5.11. mounted_on_fileid 4507 UNIX-based operating environments connect a file system into the 4508 namespace by connecting (mounting) the file system onto the existing 4509 file object (the mount point, usually a directory) of an existing 4510 file system. When the mount point's parent directory is read via an 4511 API like readdir(), the return results are directory entries, each 4512 with a component name and a fileid. The fileid of the mount point's 4513 directory entry will be different from the fileid that the stat() 4514 system call returns. The stat() system call is returning the fileid 4515 of the root of the mounted file system, whereas readdir() is 4516 returning the fileid stat() would have returned before any file 4517 systems were mounted on the mount point. 4519 Unlike NFS version 3, NFS version 4 allows a client's LOOKUP request 4520 to cross other file systems. The client detects the file system 4521 crossing whenever the filehandle argument of LOOKUP has an fsid 4522 attribute different from that of the filehandle returned by LOOKUP. 4523 A UNIX-based client will consider this a "mount point crossing".
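The fsid comparison that drives this detection can be sketched as follows; the struct mirrors the protocol's fsid (major, minor) pair, but the helper itself is a hypothetical illustration, not part of the specification:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The protocol's fsid attribute is a (major, minor) pair. */
typedef struct {
    uint64_t major;
    uint64_t minor;
} fsid4;

/* A LOOKUP result whose fsid differs from the fsid of the directory
 * filehandle passed to LOOKUP indicates a file system (mount point)
 * crossing from the client's point of view. */
bool crossed_mount_point(fsid4 dir_fsid, fsid4 looked_up_fsid)
{
    return dir_fsid.major != looked_up_fsid.major ||
           dir_fsid.minor != looked_up_fsid.minor;
}
```

A client performing LOOKUP would apply this check between the fsid of the directory filehandle it supplied and the fsid of the object the LOOKUP returned.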
4524 UNIX has a legacy scheme for allowing a process to determine its 4525 current working directory. This relies on readdir() of a mount 4526 point's parent and stat() of the mount point returning fileids as 4527 previously described. The mounted_on_fileid attribute corresponds to 4528 the fileid that readdir() would have returned as described 4529 previously. 4531 While the NFS version 4 client could simply fabricate a fileid 4532 corresponding to what mounted_on_fileid provides (and if the server 4533 does not support mounted_on_fileid, the client has no choice), there 4534 is a risk that the client will generate a fileid that conflicts with 4535 one that is already assigned to another object in the file system. 4536 Instead, if the server can provide the mounted_on_fileid, the 4537 potential for client operational problems in this area is eliminated. 4539 If the server detects that there is no mount point at the target 4540 file object, then the value for mounted_on_fileid that it returns is 4541 the same as that of the fileid attribute. 4543 The mounted_on_fileid attribute is RECOMMENDED, so the server SHOULD 4544 provide it if possible, and for a UNIX-based server, this is 4545 straightforward. Usually, mounted_on_fileid will be requested during 4546 a READDIR operation, in which case it is trivial (at least for UNIX- 4547 based servers) to return mounted_on_fileid since it is equal to the 4548 fileid of a directory entry returned by readdir(). If 4549 mounted_on_fileid is requested in a GETATTR operation, the server 4550 should obey an invariant that has it returning a value that is equal 4551 to the file object's entry in the object's parent directory, i.e., 4552 what readdir() would have returned. Some operating environments 4553 allow a series of two or more file systems to be mounted onto a 4554 single mount point.
In this case, for the server to obey the 4555 aforementioned invariant, it will need to find the base mount point, 4556 and not the intermediate mount points. 4558 5.12. Directory Notification Attributes 4560 As described in Section 17.39, the client can request a minimum delay 4561 for notifications of changes to attributes, but the server is free 4562 to ignore what the client requests. The client can determine in advance 4563 what notification delays the server will accept by issuing a GETATTR 4564 for either or both of two directory notification attributes. When 4565 the client calls the GET_DIR_DELEGATION operation and asks for 4566 attribute change notifications, it should request notification 4567 delays that are no less than the values in the server-provided 4568 attributes. 4570 5.12.1. dir_notif_delay 4572 The dir_notif_delay attribute is the minimum number of seconds the 4573 server will delay before notifying the client of a change to the 4574 directory's attributes. 4576 5.12.2. dirent_notif_delay 4578 The dirent_notif_delay attribute is the minimum number of seconds the 4579 server will delay before notifying the client of a change to a file 4580 object that has an entry in the directory. 4582 5.13. pNFS Attributes 4584 5.13.1. fs_layout_type 4586 The fs_layout_type attribute (data type layouttype4, see 4587 Section 3.2.15) applies to a file system and indicates what layout 4588 types are supported by the file system. This attribute is expected 4589 to be queried when a client encounters a new fsid. This attribute is 4590 used by the client to determine if it supports the layout type. 4592 5.13.2. layout_alignment 4594 The layout_alignment attribute indicates the preferred alignment for 4595 I/O to files on the file system the client has layouts for. Where 4596 possible, the client should issue READ and WRITE operations with 4597 offsets that are whole multiples of the layout_alignment attribute. 4599 5.13.3.
layout_blksize 4601 The layout_blksize attribute indicates the preferred block size for 4602 I/O to files on the file system the client has layouts for. Where 4603 possible, the client should issue READ operations with a count 4604 argument that is a whole multiple of layout_blksize, and WRITE 4605 operations with a data argument of size that is a whole multiple of 4606 layout_blksize. 4608 5.13.4. layout_hint 4610 The layout_hint attribute (data type layouthint4, see Section 3.2.22) 4611 may be set on newly created files to influence the metadata server's 4612 choice for the file's layout. It is suggested that this attribute be 4613 set as one of the initial attributes within the OPEN call. The 4614 metadata server may ignore this attribute. This attribute is a 4615 subset of the layout structure returned by LAYOUTGET. For example, 4616 instead of specifying particular devices, this would be used to 4617 suggest the stripe width of a file. It is up to the server 4618 implementation to determine which fields within the layout it uses. 4620 5.13.5. layout_type 4622 This attribute indicates the particular layout type(s) used for a 4623 file. This is for informational purposes only. The client needs to 4624 use the LAYOUTGET operation in order to get enough information (e.g., 4625 specific device information) to perform I/O. 4627 5.13.6. mdsthreshold 4629 This attribute acts as a hint to the client to help it determine when 4630 it is more efficient to issue read and write requests to the metadata 4631 server vs. the data server. Two types of thresholds are described: 4632 file size thresholds and I/O size thresholds. If a file's size is 4633 smaller than the file size threshold, data accesses should be issued 4634 to the metadata server. If an I/O is below the I/O size threshold, 4635 the I/O should be issued to the metadata server. Each threshold can 4636 be specified independently for read and write requests.
For either 4637 threshold type, a value of 0 indicates no read or write should be 4638 issued to the metadata server, while a value of all 1s indicates all 4639 reads or writes should be issued to the metadata server. 4641 The attribute is available on a per filehandle basis. If the current 4642 filehandle refers to a non-pNFS file or directory, the metadata 4643 server should return an attribute that is representative of the 4644 filehandle's file system. It is suggested that this attribute be 4645 queried as part of the OPEN operation. Due to dynamic system 4646 changes, the client should not assume that the attribute will remain 4647 constant for any specific time period, thus it should be periodically 4648 refreshed. 4650 5.14. Retention Attributes 4652 Retention is a concept whereby a file object can be placed in an 4653 immutable, undeletable, unrenamable state for a fixed or infinite 4654 duration of time. Once in this "retained" state, the file cannot be 4655 moved out of the state until the duration of retention has been 4656 reached. 4658 When retention is enabled, retention MUST extend to the data of the 4659 file, and the name of the file. The server MAY extend retention to any 4660 other property of the file, including any subset of mandatory, 4661 recommended, and named attributes, with the exceptions noted in this 4662 section. 4664 Servers MAY support or not support retention on any file object type. 4666 There are five retention attributes: 4668 o retention_get. This attribute is only readable via GETATTR and 4669 not settable via SETATTR. The value of the attribute consists of: 4671 const RET4_DURATION_INFINITE = 0xffffffffffffffff; 4672 struct retention_get4 { 4673 uint64_t rg_duration; 4674 nfstime4 rg_begin_time<1>; 4675 }; 4677 The field rg_duration is the duration in seconds indicating how long 4678 the file will be retained once retention is enabled. The field 4679 rg_begin_time is an array of up to one absolute time value.
If 4680 the array is zero length, no beginning retention time has been 4681 established, and retention is not enabled. If rg_duration is 4682 equal to RET4_DURATION_INFINITE, the file, once retention is 4683 enabled, will be retained for an infinite duration. 4685 o retention_set. This attribute corresponds to retention_get. This 4686 attribute is only settable via SETATTR and not readable via 4687 GETATTR. The value of the attribute consists of: 4689 struct retention_set4 { 4690 bool rs_enable; 4691 uint64_t rs_duration<1>; 4692 }; 4693 If the client sets rs_enable to TRUE, then it is enabling 4694 retention on the file object with the begin time of retention 4695 commencing from the server's current time and date. The duration 4696 of the retention can also be provided if the rs_duration array is 4697 of length one. The duration is in seconds from the begin 4698 time of retention, and if set to RET4_DURATION_INFINITE, the file 4699 is to be retained forever. If retention is enabled, with no 4700 duration specified in either this SETATTR or a previous SETATTR, 4701 the duration defaults to zero seconds. The server MAY restrict 4702 the enabling of retention or the duration of retention on the 4703 basis of the ACE4_WRITE_RETENTION ACL permission. The enabling of 4704 retention does not prevent the enabling of event-based retention 4705 nor the modification of the retention_hold attribute. 4707 o retentevt_get. This attribute is like retention_get, but refers 4708 to event-based retention. The event that triggers event-based 4709 retention is not defined by the NFSv4.1 specification. 4711 o retentevt_set. This attribute corresponds to retentevt_get and is 4712 like retention_set, but refers to event-based retention. When 4713 event-based retention is set, the file MUST be retained even if 4714 non-event-based retention has been set, and the duration of non- 4715 event-based retention has been reached.
Conversely, when non- 4716 event-based retention has been set, the file MUST be retained even 4717 if event-based retention has been set, and the duration of event- 4718 based retention has been reached. The server MAY restrict the 4719 enabling of event-based retention or the duration of event-based 4720 retention on the basis of the ACE4_WRITE_RETENTION ACL permission. 4721 The enabling of event-based retention does not prevent the 4722 enabling of non-event-based retention nor the modification of the 4723 retention_hold attribute. 4725 o retention_hold. This attribute allows up to 64 administrative 4726 holds, one hold per bit on the attribute. If retention_hold is 4727 not zero, then the file MUST NOT be deleted, renamed, or modified, 4728 even if the duration of enabled event-based or non-event-based retention 4729 has been reached. The server MAY restrict the modification of 4730 retention_hold on the basis of the ACE4_WRITE_RETENTION_HOLD ACL 4731 permission. The enabling of administrative retention holds does 4732 not prevent the enabling of event-based or non-event-based 4733 retention. 4735 6. Access Control Lists 4737 Access Control Lists (ACLs) are file attributes that specify fine- 4738 grained access control. This chapter covers the "acl", "dacl", 4739 "sacl", "aclsupport", "mode", "mode_set_masked" file attributes, and 4740 their interactions. 4742 6.1. Goals 4744 ACLs and modes represent two well-established but different models 4745 for specifying permissions. This chapter specifies requirements that 4746 attempt to meet the following goals: 4748 o If a server supports the mode attribute, it should provide 4749 reasonable semantics to clients that only set and retrieve the 4750 mode attribute. 4752 o If a server supports the ACL attribute, it should provide 4753 reasonable semantics to clients that only set and retrieve the ACL 4754 attribute.
4756 o On servers that support the mode attribute, if the ACL attribute 4757 has never been set on an object, via inheritance or explicitly, 4758 the behavior should be traditional UNIX-like behavior. 4760 o On servers that support the mode attribute, if the ACL attribute 4761 has been previously set on an object, either explicitly or via 4762 inheritance: 4764 * Setting only the mode attribute should effectively control the 4765 traditional UNIX-like permissions of read, write, and execute 4766 on owner, owner_group, and other. 4768 * Setting only the mode attribute should provide reasonable 4769 security. For example, setting a mode of 000 should be enough 4770 to ensure that future opens for read or write by any principal 4771 should fail, regardless of a previously existing or inherited 4772 ACL. 4774 o This minor version of NFSv4 should not introduce significantly 4775 different semantics relating to the mode and ACL attributes, nor 4776 should it render invalid any existing implementations. Rather, 4777 this chapter provides clarifications based on previous 4778 implementations and discussions around them. 4780 o If a server supports the ACL attribute, then at any time, the 4781 server can provide an ACL attribute when requested. The ACL 4782 attribute will describe all permissions on the file object, except 4783 for the three high-order bits of the mode attribute (described in 4784 Section 6.2.3). The ACL attribute will not conflict with the mode 4785 attribute, on servers that support the mode attribute. 4787 o If a server supports the mode attribute, then at any time, the 4788 server can provide a mode attribute when requested. The mode 4789 attribute will not conflict with the ACL attribute, on servers 4790 that support the ACL attribute. 4792 o When a mode attribute is set on an object, the ACL attribute may 4793 need to be modified so as to not conflict with the new mode. 
In 4794 such cases, it is desirable that the ACL keep as much information 4795 as possible. This includes information about inheritance, AUDIT 4796 and ALARM ACEs, and permissions granted and denied that do not 4797 conflict with the new mode. 4799 6.2. File Attributes Discussion 4801 6.2.1. ACL Attribute 4803 The NFS version 4 ACL attribute is an array of access control entries 4804 (ACEs). Although the client can read and write the ACL attribute, 4805 the server is responsible for using the ACL to perform access 4806 control. The client can use the OPEN or ACCESS operations to check 4807 access without modifying or reading data or metadata. 4809 The NFS ACE attribute is defined as follows: 4811 typedef uint32_t acetype4; 4812 typedef uint32_t aceflag4; 4813 typedef uint32_t acemask4; 4815 struct nfsace4 { 4816 acetype4 type; 4817 aceflag4 flag; 4818 acemask4 access_mask; 4819 utf8str_mixed who; 4820 }; 4822 To determine if a request succeeds, the server processes each nfsace4 4823 entry in order. Only ACEs which have a "who" that matches the 4824 requester are considered. Each ACE is processed until all of the 4825 bits of the requester's access have been ALLOWED. Once a bit (see 4826 below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer 4827 considered in the processing of later ACEs. If an ACCESS_DENIED_ACE 4828 is encountered where the requester's access still has unALLOWED bits 4829 in common with the "access_mask" of the ACE, the request is denied. 4830 When the ACL is fully processed, if there are bits in the requester's 4831 mask that have not been ALLOWED or DENIED, access is denied. 4833 Unlike the ALLOW and DENY ACE types, the ALARM and AUDIT ACE types do 4834 not affect a requester's access, and instead are for triggering 4835 events as a result of a requester's access attempt. Therefore, all 4836 AUDIT and ALARM ACEs are processed until the end of the ACL. 4838 The NFS version 4 ACL model is quite rich.
Some server platforms may 4839 provide access control functionality that goes beyond the UNIX-style 4840 mode attribute, but which is not as rich as the NFS ACL model. So 4841 that users can take advantage of this more limited functionality, the 4842 server may indicate that it supports ACLs as long as it follows the 4843 guidelines for mapping between its ACL model and the NFS version 4 4844 ACL model. 4846 The situation is complicated by the fact that a server may have 4847 multiple modules that enforce ACLs. For example, the enforcement for 4848 NFS version 4 access may be different from the enforcement for local 4849 access, and both may be different from the enforcement for access 4850 through other protocols such as SMB. So it may be useful for a 4851 server to accept an ACL even if not all of its modules are able to 4852 support it. 4854 The guiding principle in all cases is that the server must not accept 4855 ACLs that appear to make the file more secure than it really is. 4857 6.2.1.1. ACE Type 4859 The constants used for the type field (acetype4) are as follows: 4861 const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000; 4862 const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001; 4863 const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002; 4864 const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003; 4866 +------------------------------+--------------+---------------------+ 4867 | Value | Abbreviation | Description | 4868 +------------------------------+--------------+---------------------+ 4869 | ACE4_ACCESS_ALLOWED_ACE_TYPE | ALLOW | Explicitly grants | 4870 | | | the access defined | 4871 | | | in acemask4 to the | 4872 | | | file or directory. | 4873 | ACE4_ACCESS_DENIED_ACE_TYPE | DENY | Explicitly denies | 4874 | | | the access defined | 4875 | | | in acemask4 to the | 4876 | | | file or directory. 
| 4877 | ACE4_SYSTEM_AUDIT_ACE_TYPE | AUDIT | LOG (system | 4878 | | | dependent) any | 4879 | | | access attempt to a | 4880 | | | file or directory | 4881 | | | which uses any of | 4882 | | | the access methods | 4883 | | | specified in | 4884 | | | acemask4. | 4885 | ACE4_SYSTEM_ALARM_ACE_TYPE | ALARM | Generate a system | 4886 | | | ALARM (system | 4887 | | | dependent) when any | 4888 | | | access attempt is | 4889 | | | made to a file or | 4890 | | | directory for the | 4891 | | | access methods | 4892 | | | specified in | 4893 | | | acemask4. | 4894 +------------------------------+--------------+---------------------+ 4896 The "Abbreviation" column denotes how the types will be referred to 4897 throughout the rest of this document. 4899 6.2.1.2. The aclsupport Attribute 4901 A server need not support all of the above ACE types. The bitmask 4902 constants used to represent the above definitions within the 4903 aclsupport attribute are as follows: 4905 const ACL4_SUPPORT_ALLOW_ACL = 0x00000001; 4906 const ACL4_SUPPORT_DENY_ACL = 0x00000002; 4907 const ACL4_SUPPORT_AUDIT_ACL = 0x00000004; 4908 const ACL4_SUPPORT_ALARM_ACL = 0x00000008; 4910 Clients should not attempt to set an ACE unless the server claims 4911 support for that ACE type. If the server receives a request to set 4912 an ACE that it cannot store, it MUST reject the request with 4913 NFS4ERR_ATTRNOTSUPP. If the server receives a request to set an ACE 4914 that it can store but cannot enforce, the server SHOULD reject the 4915 request with NFS4ERR_ATTRNOTSUPP. 4917 Example: suppose a server can enforce NFS ACLs for NFS access but 4918 cannot enforce ACLs for local access. If arbitrary processes can run 4919 on the server, then the server SHOULD NOT indicate ACL support. On 4920 the other hand, if only trusted administrative programs run locally, 4921 then the server may indicate ACL support. 4923 6.2.1.3. 
ACE Access Mask 4925 The bitmask constants used for the access mask field are as follows: 4927 const ACE4_READ_DATA = 0x00000001; 4928 const ACE4_LIST_DIRECTORY = 0x00000001; 4929 const ACE4_WRITE_DATA = 0x00000002; 4930 const ACE4_ADD_FILE = 0x00000002; 4931 const ACE4_APPEND_DATA = 0x00000004; 4932 const ACE4_ADD_SUBDIRECTORY = 0x00000004; 4933 const ACE4_READ_NAMED_ATTRS = 0x00000008; 4934 const ACE4_WRITE_NAMED_ATTRS = 0x00000010; 4935 const ACE4_EXECUTE = 0x00000020; 4936 const ACE4_DELETE_CHILD = 0x00000040; 4937 const ACE4_READ_ATTRIBUTES = 0x00000080; 4938 const ACE4_WRITE_ATTRIBUTES = 0x00000100; 4939 const ACE4_WRITE_RETENTION = 0x00000200; 4940 const ACE4_WRITE_RETENTION_HOLD = 0x00000400; 4941 const ACE4_DELETE = 0x00010000; 4942 const ACE4_READ_ACL = 0x00020000; 4943 const ACE4_WRITE_ACL = 0x00040000; 4944 const ACE4_WRITE_OWNER = 0x00080000; 4945 const ACE4_SYNCHRONIZE = 0x00100000; 4947 6.2.1.3.1. Discussion of Mask Attributes 4949 ACE4_READ_DATA 4950 Operation(s) affected: 4951 READ 4952 OPEN 4953 Discussion: 4954 Permission to read the data of the file. 4956 Servers SHOULD allow a user the ability to read the data 4957 of the file when only the ACE4_EXECUTE access mask bit is 4958 allowed. 4960 ACE4_LIST_DIRECTORY 4961 Operation(s) affected: 4962 READDIR 4963 Discussion: 4964 Permission to list the contents of a directory. 4966 ACE4_WRITE_DATA 4967 Operation(s) affected: 4968 WRITE 4969 OPEN 4970 SETATTR of size 4972 Discussion: 4973 Permission to modify a file's data anywhere in the file's 4974 offset range. This includes the ability to write to any 4975 arbitrary offset and as a result to grow the file. 4977 ACE4_ADD_FILE 4978 Operation(s) affected: 4979 CREATE 4980 OPEN 4981 Discussion: 4982 Permission to add a new file in a directory. The CREATE 4983 operation is affected when nfs_ftype4 is NF4LNK, NF4BLK, 4984 NF4CHR, NF4SOCK, or NF4FIFO. (NF4DIR is not listed because 4985 it is covered by ACE4_ADD_SUBDIRECTORY.) 
OPEN is affected 4986 when used to create a regular file. 4988 ACE4_APPEND_DATA 4989 Operation(s) affected: 4990 WRITE 4991 OPEN 4992 SETATTR of size 4993 Discussion: 4994 The ability to modify a file's data, but only starting at 4995 EOF. This allows for the notion of append-only files, by 4996 allowing ACE4_APPEND_DATA and denying ACE4_WRITE_DATA to 4997 the same user or group. If a file has an ACL such as the 4998 one described above and a WRITE request is made for 4999 somewhere other than EOF, the server SHOULD return 5000 NFS4ERR_ACCESS. 5002 ACE4_ADD_SUBDIRECTORY 5003 Operation(s) affected: 5004 CREATE 5005 Discussion: 5006 Permission to create a subdirectory in a directory. The 5007 CREATE operation is affected when nfs_ftype4 is NF4DIR. 5009 ACE4_READ_NAMED_ATTRS 5010 Operation(s) affected: 5011 OPENATTR 5012 Discussion: 5013 Permission to read the named attributes of a file or to 5014 look up the named attributes directory. OPENATTR is 5015 affected when it is not used to create a named attribute 5016 directory. This is when 1.) createdir is TRUE, but a 5017 named attribute directory already exists, or 2.) createdir 5018 is FALSE. 5020 ACE4_WRITE_NAMED_ATTRS 5021 Operation(s) affected: 5022 OPENATTR 5023 Discussion: 5024 Permission to write the named attributes of a file or 5025 to create a named attribute directory. OPENATTR is 5026 affected when it is used to create a named attribute 5027 directory. This is when createdir is TRUE and no named 5028 attribute directory exists. The ability to check whether 5029 or not a named attribute directory exists depends on the 5030 ability to look it up; therefore, users also need the 5031 ACE4_READ_NAMED_ATTRS permission in order to create a 5032 named attribute directory. 5034 ACE4_EXECUTE 5035 Operation(s) affected: 5036 LOOKUP 5037 READ 5038 OPEN 5039 Discussion: 5040 Permission to execute a file or traverse/search a 5041 directory.
5043 Servers SHOULD allow a user the ability to read the data 5044 of the file when only the ACE4_EXECUTE access mask bit is 5045 allowed. This is because there is no way to execute a 5046 file without reading the contents. Though a server may 5047 treat ACE4_EXECUTE and ACE4_READ_DATA bits identically 5048 when deciding to permit a READ operation, it SHOULD still 5049 allow the two bits to be set independently in ACLs, and 5050 MUST distinguish between them when replying to ACCESS 5051 operations. In particular, servers SHOULD NOT silently 5052 turn on one of the two bits when the other is set, as 5053 that would make it impossible for the client to correctly 5054 enforce the distinction between read and execute 5055 permissions. 5057 As an example, following a SETATTR of the following ACL: 5058 nfsuser:ACE4_EXECUTE:ALLOW 5060 A subsequent GETATTR of ACL for that file SHOULD return: 5061 nfsuser:ACE4_EXECUTE:ALLOW 5062 Rather than: 5063 nfsuser:ACE4_EXECUTE/ACE4_READ_DATA:ALLOW 5065 ACE4_DELETE_CHILD 5066 Operation(s) affected: 5067 REMOVE 5069 Discussion: 5070 Permission to delete a file or directory within a 5071 directory. See section "ACE4_DELETE vs. ACE4_DELETE_CHILD" 5072 for information on how these two access mask bits interact. 5074 ACE4_READ_ATTRIBUTES 5075 Operation(s) affected: 5076 GETATTR of file system object attributes 5077 Discussion: 5078 The ability to read basic attributes (non-ACLs) of a file. 5079 On a UNIX system, basic attributes can be thought of as 5080 the stat level attributes. Allowing this access mask bit 5081 would mean the entity can execute "ls -l" and stat. 5083 ACE4_WRITE_ATTRIBUTES 5084 Operation(s) affected: 5085 SETATTR of time_access_set, time_backup, 5086 time_create, time_modify_set, mimetype, hidden, system 5087 Discussion: 5088 Permission to change the times associated with a file 5089 or directory to an arbitrary value. Also permission 5090 to change the mimetype, hidden and system attributes. 
5091 A user having ACE4_WRITE_DATA permission, but lacking 5092 ACE4_WRITE_ATTRIBUTES, must be allowed to implicitly set 5093 the times associated with a file. 5095 ACE4_WRITE_RETENTION 5096 Operation(s) affected: 5097 SETATTR of retention_set, retentevt_set. 5098 Discussion: 5099 Permission to modify the durations of event and non-event-based 5100 retention. Also permission to enable event and non-event-based 5101 retention. A server MAY map ACE4_WRITE_ATTRIBUTES to 5102 ACE4_WRITE_RETENTION. 5104 ACE4_WRITE_RETENTION_HOLD 5105 Operation(s) affected: 5106 SETATTR of retention_hold. 5107 Discussion: 5108 Permission to modify the administrative retention holds. 5109 A server MAY map ACE4_WRITE_ATTRIBUTES to 5110 ACE4_WRITE_RETENTION_HOLD. 5112 ACE4_DELETE 5113 Operation(s) affected: 5114 REMOVE 5115 Discussion: 5116 Permission to delete the file or directory. See section 5117 "ACE4_DELETE vs. ACE4_DELETE_CHILD" for information on how 5118 these two access mask bits interact. 5120 ACE4_READ_ACL 5121 Operation(s) affected: 5122 GETATTR of acl 5123 Discussion: 5124 Permission to read the ACL. 5126 ACE4_WRITE_ACL 5127 Operation(s) affected: 5128 SETATTR of acl and mode 5129 Discussion: 5130 Permission to write the acl and mode attributes. 5132 ACE4_WRITE_OWNER 5133 Operation(s) affected: 5134 SETATTR of owner and owner_group 5135 Discussion: 5136 Permission to write the owner and owner_group attributes. 5137 On UNIX systems, this is the ability to execute chown(). 5139 ACE4_SYNCHRONIZE 5140 Operation(s) affected: 5141 NONE 5142 Discussion: 5143 Permission to access the file locally at the server with 5144 synchronized reads and writes. 5146 Server implementations need not provide the granularity of control 5147 that is implied by this list of masks.
For example, POSIX-based 5148 systems might not distinguish ACE4_APPEND_DATA (the ability to append 5149 to a file) from ACE4_WRITE_DATA (the ability to modify existing 5150 contents); both masks would be tied to a single "write" permission. 5151 When such a server returns attributes to the client, it would show 5152 both ACE4_APPEND_DATA and ACE4_WRITE_DATA if and only if the write 5153 permission is enabled. 5155 If a server receives a SETATTR request that it cannot accurately 5156 implement, it should error in the direction of more restricted 5157 access. For example, suppose a server cannot distinguish overwriting 5158 data from appending new data, as described in the previous paragraph. 5159 If a client submits an ACE where ACE4_APPEND_DATA is set but 5160 ACE4_WRITE_DATA is not (or vice versa), the server should reject the 5161 request with NFS4ERR_ATTRNOTSUPP. Nonetheless, if the ACE has type 5162 DENY, the server may silently turn on the other bit, so that both 5163 ACE4_APPEND_DATA and ACE4_WRITE_DATA are denied. 5165 6.2.1.3.2. ACE4_DELETE vs. ACE4_DELETE_CHILD 5167 Two access mask bits govern the ability to delete a file or directory 5168 object: ACE4_DELETE on the object itself, and ACE4_DELETE_CHILD on 5169 the object's parent directory. 5171 Many systems also consult the "sticky bit" (MODE4_SVTX) and write 5172 mode bit on the parent directory when determining whether to allow a 5173 file to be deleted. The mode bit for write corresponds to 5174 ACE4_WRITE_DATA, which is the same physical bit as ACE4_ADD_FILE. 5175 Therefore, ACE4_ADD_FILE can come into play when determining 5176 permission to delete. 5178 In the algorithm below, the strategy is that ACE4_DELETE and 5179 ACE4_DELETE_CHILD take precedence over the sticky bit, and the sticky 5180 bit takes precedence over the "write" mode bits (reflected in 5181 ACE4_ADD_FILE). 5183 Server implementations SHOULD grant or deny permission to delete 5184 based on the following algorithm. 
5186 if ACE4_EXECUTE is denied by the parent directory ACL: 5187 deny delete 5188 else if ACE4_DELETE is allowed by the target object ACL: 5189 allow delete 5190 else if ACE4_DELETE_CHILD is allowed by the parent 5191 directory ACL: 5192 allow delete 5193 else if ACE4_DELETE_CHILD is denied by the 5194 parent directory ACL: 5195 deny delete 5196 else if ACE4_ADD_FILE is allowed by the parent directory ACL: 5197 if MODE4_SVTX is set for the parent directory: 5198 if the principal owns the parent directory OR 5199 the principal owns the target object OR 5200 ACE4_WRITE_DATA is allowed by the target 5201 object ACL: 5202 allow delete 5203 else: 5204 deny delete 5205 else: 5206 allow delete 5207 else: 5208 deny delete 5210 6.2.1.4. ACE flag 5212 The bitmask constants used for the flag field are as follows: 5214 const ACE4_FILE_INHERIT_ACE = 0x00000001; 5215 const ACE4_DIRECTORY_INHERIT_ACE = 0x00000002; 5216 const ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004; 5217 const ACE4_INHERIT_ONLY_ACE = 0x00000008; 5218 const ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010; 5219 const ACE4_FAILED_ACCESS_ACE_FLAG = 0x00000020; 5220 const ACE4_IDENTIFIER_GROUP = 0x00000040; 5221 const ACE4_INHERITED_ACE = 0x00000080; 5223 A server need not support any of these flags. If the server supports 5224 flags that are similar to, but not exactly the same as, these flags, 5225 the implementation may define a mapping between the protocol-defined 5226 flags and the implementation-defined flags. Again, the guiding 5227 principle is that the file not appear to be more secure than it 5228 really is. 5230 For example, suppose a client tries to set an ACE with 5231 ACE4_FILE_INHERIT_ACE set but not ACE4_DIRECTORY_INHERIT_ACE. If the 5232 server does not support any form of ACL inheritance, the server 5233 should reject the request with NFS4ERR_ATTRNOTSUPP. 
If the server 5234 supports a single "inherit ACE" flag that applies to both files and 5235 directories, the server may reject the request (i.e., requiring the 5236 client to set both the file and directory inheritance flags). The 5237 server may also accept the request and silently turn on the 5238 ACE4_DIRECTORY_INHERIT_ACE flag. 5240 6.2.1.4.1. Discussion of Flag Bits 5242 ACE4_FILE_INHERIT_ACE 5243 Can be placed on a directory and indicates that this ACE should be 5244 added to each new non-directory file created. 5246 ACE4_DIRECTORY_INHERIT_ACE 5247 Can be placed on a directory and indicates that this ACE should be 5248 added to each new directory created. 5250 ACE4_INHERIT_ONLY_ACE 5251 Can be placed on a directory but does not apply to the directory; 5252 ALLOW and DENY ACEs with this bit set do not affect access to the 5253 directory, and AUDIT and ALARM ACEs with this bit set do not 5254 trigger log or alarm events. Such ACEs only take effect once they 5255 are applied (with this bit cleared) to newly created files and 5256 directories as specified by the above two flags. 5258 ACE4_NO_PROPAGATE_INHERIT_ACE 5259 Can be placed on a directory. This flag tells the server that 5260 inheritance of this ACE should stop at newly created child 5261 directories. 5263 ACE4_INHERITED_ACE 5264 Indicates that this ACE is inherited from a parent directory. A 5265 server that supports automatic inheritance will place this flag on 5266 any ACEs inherited from the parent directory when creating a new 5267 object. Client applications will use this to perform automatic 5268 inheritance. Clients and servers MUST clear this bit in the acl 5269 attribute; it may only be used in the dacl and sacl attributes. 
5271 ACE4_SUCCESSFUL_ACCESS_ACE_FLAG 5273 ACE4_FAILED_ACCESS_ACE_FLAG 5274 The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and 5275 ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits relate only to 5276 ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE 5277 (ALARM) ACE types. If, during the processing of the file's ACL, 5278 the server encounters an AUDIT or ALARM ACE that matches the 5279 principal attempting the OPEN, the server notes that fact, and the 5280 presence, if any, of the SUCCESS and FAILED flags encountered in 5281 the AUDIT or ALARM ACE. Once the server completes the ACL 5282 processing, it then notes if the operation succeeded or failed. 5283 If the operation succeeded, and if the SUCCESS flag was set for a 5284 matching AUDIT or ALARM ACE, then the appropriate AUDIT or ALARM 5285 event occurs. If the operation failed, and if the FAILED flag was 5286 set for the matching AUDIT or ALARM ACE, then the appropriate 5287 AUDIT or ALARM event occurs. Either or both of the SUCCESS and 5288 FAILED flags can be set, but if neither is set, the AUDIT or ALARM ACE 5289 is not useful. 5291 The previously described processing applies to the ACCESS 5292 operation as well, the difference being that "success" and 5293 "failure" do not refer to whether ACCESS returns NFS4_OK. 5294 Success means that ACCESS returned all requested and supported 5295 bits; failure means that ACCESS failed to return a bit that 5296 was requested and supported. 5298 ACE4_IDENTIFIER_GROUP 5299 Indicates that the "who" refers to a GROUP as defined under UNIX 5300 or a GROUP ACCOUNT as defined under Windows. Clients and servers 5301 MUST ignore the ACE4_IDENTIFIER_GROUP flag on ACEs with a who 5302 value equal to one of the special identifiers outlined in 5303 Section 6.2.1.5. 5305 6.2.1.5. ACE Who 5307 The "who" field of an ACE is an identifier that specifies the 5308 principal or principals to whom the ACE applies.
It may refer to a 5309 user or a group, with the flag bit ACE4_IDENTIFIER_GROUP specifying 5310 which. 5312 There are several special identifiers that need to be understood 5313 universally, rather than in the context of a particular DNS domain. 5314 Some of these identifiers cannot be understood when an NFS client 5315 accesses the server, but have meaning when a local process accesses 5316 the file. The ability to display and modify these permissions is 5317 permitted over NFS, even if none of the access methods on the server 5318 understands the identifiers. 5320 +---------------+--------------------------------------------------+ 5321 | Who | Description | 5322 +---------------+--------------------------------------------------+ 5323 | OWNER | The owner of the file. | 5324 | GROUP | The group associated with the file. | 5325 | EVERYONE | The world, including the owner and owning group. | 5326 | INTERACTIVE | Accessed from an interactive terminal. | 5327 | NETWORK | Accessed via the network. | 5328 | DIALUP | Accessed as a dialup user to the server. | 5329 | BATCH | Accessed from a batch job. | 5330 | ANONYMOUS | Accessed without any authentication. | 5331 | AUTHENTICATED | Any authenticated user (opposite of ANONYMOUS). | 5332 | SERVICE | Access from a system service. | 5333 +---------------+--------------------------------------------------+ 5335 Table 7 5337 To avoid conflict, these special identifiers are distinguished by an 5338 appended "@" and should appear in the form "xxxx@" (note: no domain 5339 name after the "@"). For example: ANONYMOUS@. 5341 6.2.1.5.1. Discussion of EVERYONE@ 5343 It is important to note that "EVERYONE@" is not equivalent to the 5344 UNIX "other" entity. This is because, by definition, UNIX "other" 5345 does not include the owner or owning group of a file. "EVERYONE@" 5346 means literally everyone, including the owner or owning group. 5348 6.2.2.
dacl and sacl Attributes 5350 The dacl and sacl attributes are like the acl attribute, but dacl and 5351 sacl each allow only certain types of ACEs. The dacl attribute 5352 allows just ALLOW and DENY ACEs. The sacl attribute allows just 5353 AUDIT and ALARM ACEs. The dacl and sacl attributes also have 5354 improved support for automatic inheritance (see Section 6.4.3.2). 5355 The separation of ACE types and inheritance support make dacl and 5356 sacl a better choice (over acl) for clients when setting ACEs on a 5357 file. 5359 6.2.3. mode Attribute 5361 The NFS version 4 mode attribute is based on the UNIX mode bits. The 5362 following bits are defined: 5364 const MODE4_SUID = 0x800; /* set user id on execution */ 5365 const MODE4_SGID = 0x400; /* set group id on execution */ 5366 const MODE4_SVTX = 0x200; /* save text even after use */ 5367 const MODE4_RUSR = 0x100; /* read permission: owner */ 5368 const MODE4_WUSR = 0x080; /* write permission: owner */ 5369 const MODE4_XUSR = 0x040; /* execute permission: owner */ 5370 const MODE4_RGRP = 0x020; /* read permission: group */ 5371 const MODE4_WGRP = 0x010; /* write permission: group */ 5372 const MODE4_XGRP = 0x008; /* execute permission: group */ 5373 const MODE4_ROTH = 0x004; /* read permission: other */ 5374 const MODE4_WOTH = 0x002; /* write permission: other */ 5375 const MODE4_XOTH = 0x001; /* execute permission: other */ 5377 Bits MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR apply to the principal 5378 identified in the owner attribute. Bits MODE4_RGRP, MODE4_WGRP, and 5379 MODE4_XGRP apply to principals identified in the owner_group 5380 attribute but who are not identified in the owner attribute. Bits 5381 MODE4_ROTH, MODE4_WOTH, MODE4_XOTH apply to any principal that does 5382 not match that in the owner attribute, and does not have a group 5383 matching that of the owner_group attribute. 5385 Bits within the mode other than those specified above are not defined 5386 by this protocol. 
A server MUST NOT return bits other than those 5387 defined above in a GETATTR or READDIR operation, and it MUST return 5388 NFS4ERR_INVAL if bits other than those defined above are set in a 5389 SETATTR, CREATE, or OPEN operation. 5391 6.2.4. mode_set_masked Attribute 5393 The mode_set_masked attribute is a write-only attribute that allows 5394 individual bits in the mode attribute to be set or reset, without 5395 changing others. It allows, for example, the bits MODE4_SUID, 5396 MODE4_SGID, and MODE4_SVTX to be modified while leaving unmodified 5397 any of the nine low-order mode bits devoted to permissions. 5399 The mode_set_masked attribute consists of two words, each in the form 5400 of a mode4. The first consists of the value to be applied to the 5401 current mode value and the second is a mask. Only bits set to one in 5402 the mask word are changed (set or reset) in the file's mode. All 5403 other bits in the mode remain unchanged. Bits in the first word that 5404 correspond to bits which are zero in the mask are ignored, except 5405 that undefined bits are checked for validity and can result in 5406 NFS4ERR_INVAL as described below. 5408 The mode_set_masked attribute is only valid in a SETATTR operation. 5409 If it is used in a CREATE or OPEN operation, the server MUST return 5410 NFS4ERR_INVAL. 5412 Bits not defined as valid in the mode attribute are not valid in 5413 either word of the mode_set_masked attribute. The server MUST return 5414 NFS4ERR_INVAL if any of those bits are set in a SETATTR. If the mode and 5415 mode_set_masked attributes are both specified in the same SETATTR, 5416 the server MUST also return NFS4ERR_INVAL. 5418 6.3. Common Methods 5420 The requirements in this section will be referred to in future 5421 sections, especially Section 6.4. 5423 6.3.1. Interpreting an ACL 5425 6.3.1.1. Server Considerations 5427 The server uses the algorithm described in Section 6.2.1 to determine 5428 whether an ACL allows access to an object.
However, the ACL may not 5429 be the sole determiner of access. For example: 5431 o In the case of a file system exported as read-only, the server may 5432 deny write permissions even though an object's ACL grants it. 5434 o Server implementations MAY grant ACE4_WRITE_ACL and ACE4_READ_ACL 5435 permissions in order to prevent the owner from getting into a 5436 situation where they can never modify the ACL. 5438 o Many servers will allow a user to read the data of the 5439 file when only the execute permission is granted (i.e., if the ACL 5440 denies the user ACE4_READ_DATA access but allows the user 5441 ACE4_EXECUTE, the server will allow the user to read the data of 5442 the file). 5444 o Many servers have the notion of owner-override in which the owner 5445 of the object is allowed to override accesses that are denied by 5446 the ACL. This may be helpful, for example, to allow users 5447 continued access to open files on which the permissions have 5448 changed. 5450 6.3.1.2. Client Considerations 5452 Clients SHOULD NOT do their own access checks based on their 5453 interpretation of the ACL, but rather use the OPEN and ACCESS operations 5454 to do access checks. This allows the client to act on the results of 5455 having the server determine whether or not access should be granted 5456 based on its interpretation of the ACL. 5458 Clients must be aware of situations in which an object's ACL will 5459 define a certain access even though the server will not enforce it. 5460 In general, but especially in these situations, the client needs to 5461 do its part in the enforcement of access as defined by the ACL. To 5462 do this, the client MAY issue the appropriate ACCESS operation prior 5463 to servicing the request of the user or application in order to 5464 determine whether the user or application should be granted the 5465 access requested. For examples in which the ACL may define accesses 5466 that the server doesn't enforce, see Section 6.3.1.1.
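The ordered ACE evaluation of Section 6.2.1, combined with a server-side override such as a read-only export, can be sketched as follows. This is an illustrative model only, with invented helper names, and is not a normative algorithm:

```python
# Sketch of server-side access checking (Sections 6.2.1 and 6.3.1.1).
# An ACE is modeled as (ace_type, who, access_mask); ACEs are evaluated
# in order, and the ACL is not the sole determiner of access.

ACE4_ACCESS_ALLOWED_ACE_TYPE = 0
ACE4_ACCESS_DENIED_ACE_TYPE = 1

ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004

def acl_check(acl, who_matches, requested):
    """Ordered ACE walk: a DENY ACE denies any still-needed bit it names;
    ALLOW ACEs accumulate granted bits until all requested bits are granted."""
    granted = 0
    for ace_type, who, mask in acl:
        if not who_matches(who):
            continue
        needed = requested & ~granted
        if ace_type == ACE4_ACCESS_DENIED_ACE_TYPE and (mask & needed):
            return False
        if ace_type == ACE4_ACCESS_ALLOWED_ACE_TYPE:
            granted |= mask & requested
            if granted == requested:
                return True
    return granted == requested

def server_access_check(acl, who_matches, requested, export_read_only):
    """The ACL is not the sole determiner: a read-only export denies
    modification even when the ACL would grant it."""
    if export_read_only and requested & (ACE4_WRITE_DATA | ACE4_APPEND_DATA):
        return False
    return acl_check(acl, who_matches, requested)
```

A server would extend `server_access_check` with further local policy, such as the owner-override behavior mentioned above.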
5468 6.3.2. Computing a Mode Attribute from an ACL 5470 The following method can be used to calculate the MODE4_R*, MODE4_W*, 5471 and MODE4_X* bits of a mode attribute, based upon an ACL. 5473 1. To determine MODE4_ROTH, MODE4_WOTH, and MODE4_XOTH: 5475 1. If the special identifier EVERYONE@ is granted 5476 ACE4_READ_DATA, then the bit MODE4_ROTH SHOULD be set. 5477 Otherwise, MODE4_ROTH SHOULD NOT be set. 5479 2. If the special identifier EVERYONE@ is granted 5480 ACE4_WRITE_DATA or ACE4_APPEND_DATA, then the bit MODE4_WOTH 5481 SHOULD be set. Otherwise, MODE4_WOTH SHOULD NOT be set. 5483 3. If the special identifier EVERYONE@ is granted ACE4_EXECUTE, 5484 then the bit MODE4_XOTH SHOULD be set. Otherwise, MODE4_XOTH 5485 SHOULD NOT be set. 5487 2. To determine MODE4_RGRP, MODE4_WGRP, and MODE4_XGRP, note that 5488 the EVERYONE@ special identifier SHOULD be taken into account. 5489 In other words, when determining if the GROUP@ special identifier 5490 is granted a permission, ACEs with the identifier EVERYONE@ 5491 should take effect just as ACEs with the special identifier 5492 GROUP@ would. 5494 1. If the special identifier GROUP@ is granted ACE4_READ_DATA, 5495 then the bit MODE4_RGRP SHOULD be set. Otherwise, MODE4_RGRP 5496 SHOULD NOT be set. 5498 2. If the special identifier GROUP@ is granted ACE4_WRITE_DATA 5499 or ACE4_APPEND_DATA, then the bit MODE4_WGRP SHOULD be set. 5500 Otherwise, MODE4_WGRP SHOULD NOT be set. 5502 3. If the special identifier GROUP@ is granted ACE4_EXECUTE, 5503 then the bit MODE4_XGRP SHOULD be set. Otherwise, MODE4_XGRP 5504 SHOULD NOT be set. 5506 3. To determine MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR, note that 5507 the EVERYONE@ special identifier SHOULD be taken into account. 5508 In other words, when determining if the OWNER@ special identifier 5509 is granted a permission, ACEs with the identifier EVERYONE@ 5510 should take effect just as ACEs with the special identifier OWNER@ 5511 would. 5513 1.
If the special identifier OWNER@ is granted ACE4_READ_DATA, 5514 then the bit MODE4_RUSR SHOULD be set. Otherwise, MODE4_RUSR 5515 SHOULD NOT be set. 5517 2. If the special identifier OWNER@ is granted ACE4_WRITE_DATA 5518 or ACE4_APPEND_DATA, then the bit MODE4_WUSR SHOULD be set. 5519 Otherwise, MODE4_WUSR SHOULD NOT be set. 5521 3. If the special identifier OWNER@ is granted ACE4_EXECUTE, 5522 then the bit MODE4_XUSR SHOULD be set. Otherwise, MODE4_XUSR 5523 SHOULD NOT be set. 5525 6.3.2.1. Discussion 5527 The nine low-order mode bits (MODE4_R*, MODE4_W*, MODE4_X*) 5528 correspond to ACE4_READ_DATA, ACE4_WRITE_DATA/ACE4_APPEND_DATA, and 5529 ACE4_EXECUTE for OWNER@, GROUP@, and EVERYONE@. On some 5530 implementations, mode bits may represent a superset of these 5531 permissions, e.g., if a specific user is granted ACE4_WRITE_DATA, then 5532 MODE4_WGRP will be set, even though the file's owner_group is not 5533 granted ACE4_WRITE_DATA. 5535 Server implementations are discouraged from doing this, as experience 5536 has shown that it is confusing and annoying to end users. The 5537 specifications above also discourage this practice to enforce the 5538 semantic that setting the mode attribute effectively specifies read, 5539 write, and execute for owner, group, and other. 5541 6.4. Requirements 5543 A server that supports both mode and ACL must take care to 5544 synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the 5545 ACEs which have respective who fields of "OWNER@", "GROUP@", and 5546 "EVERYONE@", so that the client can see that semantically equivalent access 5547 permissions exist whether it asks for the owner, owner_group, and 5548 mode attributes, or for just the ACL. 5550 In this section, much is made of the methods in Section 6.3.2. Many 5551 requirements refer to this section. But note that the methods have 5552 behaviors specified with "SHOULD".
This is intentional, to avoid 5553 invalidating existing implementations that compute the mode according 5554 to the withdrawn POSIX ACL draft (1003.1e draft 17), rather than by 5555 actual permissions on owner, group, and other. 5557 6.4.1. Setting the mode and/or ACL Attributes 5559 6.4.1.1. Setting mode and not ACL 5561 When any mode permission bits are subject to change, either because 5562 the mode attribute was set or because the mode_set_masked attribute 5563 was set and the mask included one or more bits from the low-order 5564 nine mode bits that control permissions, and the ACL attribute is not 5565 explicitly set, the ACL attribute must be modified in accordance with 5566 the updated value of the permissions bits within the mode. This must 5567 happen even if the value of the permission bits within the mode is 5568 the same after the mode is set as before. 5570 In cases in which the permissions bits are subject to change, the ACL 5571 attribute MUST be modified such that the mode computed via the method 5572 in Section 6.3.2 yields the low-order nine bits (MODE4_R*, MODE4_W*, 5573 MODE4_X*) of the mode attribute as modified by the attribute change. 5574 The ACL SHOULD also be modified such that: 5576 1. If MODE4_RGRP is not set, entities explicitly listed in the ACL 5577 other than OWNER@ and EVERYONE@ SHOULD NOT be granted 5578 ACE4_READ_DATA. 5580 2. If MODE4_WGRP is not set, entities explicitly listed in the ACL 5581 other than OWNER@ and EVERYONE@ SHOULD NOT be granted 5582 ACE4_WRITE_DATA or ACE4_APPEND_DATA. 5584 3. If MODE4_XGRP is not set, entities explicitly listed in the ACL 5585 other than OWNER@ and EVERYONE@ SHOULD NOT be granted 5586 ACE4_EXECUTE. 5588 Access mask bits other than those listed above, appearing in ALLOW ACEs, 5589 MAY also be disabled. 5591 Note that ACEs with the flag ACE4_INHERIT_ONLY_ACE set do not affect 5592 the permissions of the ACL itself, nor do ACEs of the type AUDIT and 5593 ALARM.
As such, it is desirable to leave these ACEs unmodified when 5594 modifying the ACL attribute. 5596 Also note that the requirement may be met by discarding the ACL, in 5597 favor of an ACL that represents the mode and only the mode. This is 5598 permitted, but it is preferable for a server to preserve as much of 5599 the ACL as possible without violating the above requirements. 5600 Discarding the ACL makes it effectively impossible for a file created 5601 with a mode attribute to inherit an ACL (see Section 6.4.3). 5603 6.4.1.2. Setting ACL and not mode 5605 When setting an ACL attribute and not setting the mode or 5606 mode_set_masked attributes, the permission bits of the mode need to 5607 be derived from the ACL. In this case, the ACL attribute SHOULD be 5608 set as given. The nine low-order bits of the mode attribute 5609 (MODE4_R*, MODE4_W*, MODE4_X*) MUST be modified to match the result 5610 of the method in Section 6.3.2. The three high-order bits of the mode 5611 (MODE4_SUID, MODE4_SGID, MODE4_SVTX) SHOULD remain unchanged. 5613 6.4.1.3. Setting both ACL and mode 5615 When setting both the mode (using either the mode attribute 5616 or the mode_set_masked attribute) and the ACL attribute in the same 5617 operation, the attributes MUST be applied in this order: mode (or 5618 mode_set_masked), then ACL. The mode-related attribute is set as 5619 given, then the ACL attribute is set as given, possibly changing the 5620 final mode, as described above in Section 6.4.1.2. 5622 6.4.2. Retrieving the mode and/or ACL Attributes 5624 This section applies only to servers that support both the mode and 5625 the ACL attribute. 5627 Some server implementations may have a concept of "objects without 5628 ACLs", meaning that all permissions are granted and denied according 5629 to the mode attribute, and that no ACL attribute is stored for that 5630 object.
If an ACL attribute is requested of such a server, the 5631 server SHOULD return an ACL that does not conflict with the mode; 5632 that is to say, the ACL returned SHOULD represent the nine low-order 5633 bits of the mode attribute (MODE4_R*, MODE4_W*, MODE4_X*) as 5634 described in Section 6.3.2. 5636 For other server implementations, the ACL attribute is always present 5637 for every object. Such servers SHOULD store at least the three high- 5638 order bits of the mode attribute (MODE4_SUID, MODE4_SGID, 5639 MODE4_SVTX). The server SHOULD return a mode attribute if one is 5640 requested, and the low-order nine bits of the mode (MODE4_R*, 5641 MODE4_W*, MODE4_X*) MUST match the result of applying the method in 5642 Section 6.3.2 to the ACL attribute. 5644 6.4.3. Creating New Objects 5646 If a server supports the ACL attribute, it may use the ACL attribute 5647 on the parent directory to compute an initial ACL attribute for a 5648 newly created object. This will be referred to as the inherited ACL 5649 within this section. The act of adding one or more ACEs to the 5650 inherited ACL that are based upon ACEs in the parent directory's ACL 5651 will be referred to as inheriting an ACE within this section. 5653 Implementors should standardize on what the behavior of CREATE and 5654 OPEN must be depending on the presence or absence of the mode and ACL 5655 attributes. 5657 1. If just mode is given: 5659 In this case, inheritance SHOULD take place, but the mode MUST be 5660 applied to the inherited ACL as described in Section 6.4.1.1, 5661 thereby modifying the ACL. 5663 2. If just ACL is given: 5665 In this case, inheritance SHOULD NOT take place, and the ACL as 5666 defined in the CREATE or OPEN will be set without modification, 5667 and the mode modified as in Section 6.4.1.2 5669 3. If both mode and ACL are given: 5671 In this case, inheritance SHOULD NOT take place, and both 5672 attributes will be set as described in Section 6.4.1.3. 5674 4. 
If neither mode nor ACL is given: 5676 In the case where an object is being created without any initial 5677 attributes at all, e.g., an OPEN operation with an opentype4 of 5678 OPEN4_CREATE and a createmode4 of EXCLUSIVE4, inheritance SHOULD 5679 NOT take place. Instead, the server SHOULD set permissions to 5680 deny all access to the newly created object. It is expected that 5681 the appropriate client will set the desired attributes in a 5682 subsequent SETATTR operation, and the server SHOULD allow that 5683 operation to succeed, regardless of what permissions the object 5684 is created with. For example, an empty ACL denies all 5685 permissions, but the server should allow the owner's SETATTR to 5686 succeed even though WRITE_ACL is implicitly denied. 5688 In other cases, inheritance SHOULD take place, and no 5689 modifications to the ACL will happen. The mode attribute, if 5690 supported, MUST be as computed in Section 6.3.2, with the 5691 MODE4_SUID, MODE4_SGID, and MODE4_SVTX bits clear. It is worth 5692 noting that if no inheritable ACEs exist on the parent directory, 5693 the file will be created with an empty ACL, thus granting no 5694 access. 5696 6.4.3.1. The Inherited ACL 5698 If the object being created is not a directory, the inherited ACL 5699 SHOULD NOT inherit ACEs from the parent directory ACL unless the 5700 ACE4_FILE_INHERIT_ACE flag is set. 5702 If the object being created is a directory, the inherited ACL should 5703 inherit all inheritable ACEs from the parent directory, i.e., those that 5704 have the ACE4_FILE_INHERIT_ACE or ACE4_DIRECTORY_INHERIT_ACE flag set. 5705 If the inheritable ACE has ACE4_FILE_INHERIT_ACE set, but 5706 ACE4_DIRECTORY_INHERIT_ACE is clear, the inherited ACE on the newly 5707 created directory MUST have the ACE4_INHERIT_ONLY_ACE flag set to 5708 prevent the directory from being affected by ACEs meant for 5709 non-directories.
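These inheritance rules can be sketched as follows. This is an illustrative, simplified model in which ACEs are represented as dictionaries; the optional effective/inherit-only ACE pair that a server MAY create on new directories is not shown:

```python
# Sketch of computing the inherited ACL for a new object (Section 6.4.3.1).
ACE4_FILE_INHERIT_ACE = 0x00000001
ACE4_DIRECTORY_INHERIT_ACE = 0x00000002
ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004
ACE4_INHERIT_ONLY_ACE = 0x00000008

INHERITANCE_FLAGS = (ACE4_FILE_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE |
                     ACE4_NO_PROPAGATE_INHERIT_ACE | ACE4_INHERIT_ONLY_ACE)

def inherited_acl(parent_acl, is_directory):
    """Return the inherited ACL for a newly created object (sketch)."""
    result = []
    for ace in parent_acl:
        flag = ace["flag"]
        if not (flag & (ACE4_FILE_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE)):
            continue  # not an inheritable ACE
        if not is_directory:
            # Non-directories inherit only ACEs with ACE4_FILE_INHERIT_ACE;
            # inheritance flags have no meaning on the new file itself.
            if flag & ACE4_FILE_INHERIT_ACE:
                result.append(dict(ace, flag=flag & ~INHERITANCE_FLAGS))
        elif (flag & ACE4_FILE_INHERIT_ACE) and not (flag & ACE4_DIRECTORY_INHERIT_ACE):
            # Meant for non-directories only: keep it inheritable, but
            # shield the new directory itself from its effect.
            result.append(dict(ace, flag=flag | ACE4_INHERIT_ONLY_ACE))
        elif flag & ACE4_NO_PROPAGATE_INHERIT_ACE:
            # Effective on this directory, but inheritance stops here.
            result.append(dict(ace, flag=flag & ~INHERITANCE_FLAGS))
        else:
            result.append(dict(ace, flag=flag))
    return result
```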
5711 When a new directory is created and inherits ACEs from its 5712 parent, then for each inheritable ACE that affects the directory's 5713 permissions, the server MAY create two ACEs on the directory being 5714 created: one effective and one that is only inheritable (i.e., has the 5715 ACE4_INHERIT_ONLY_ACE flag set). This gives the user and the server, 5716 in cases where certain permissions must be masked upon creation, 5717 the ability to modify the effective permissions without modifying the 5718 ACE that is to be inherited by the new directory's children. 5720 When a new object is created with attributes, and those 5721 attributes contain an ACL attribute and/or a mode attribute, the 5722 server MUST apply those attributes to the newly created object, as 5723 described in Section 6.4.1. 5725 6.4.3.2. Automatic Inheritance 5727 Unlike the acl attribute, the sacl and dacl (see Section 6.2.2) 5728 attributes both have an additional flag field. The flag field 5729 applies to the entire sacl or dacl; three flag values are defined: 5731 const ACL4_AUTO_INHERIT = 0x00000001; 5732 const ACL4_PROTECTED = 0x00000002; 5733 const ACL4_DEFAULTED = 0x00000004; 5735 and all other bits must be cleared. The ACE4_INHERITED_ACE flag may 5736 be set in the ACEs of the sacl or dacl (whereas it must always be 5737 cleared in the acl). 5739 Together these features allow a server to support automatic 5740 inheritance, which we now explain in more detail. 5742 Inheritable ACEs are normally inherited by child objects only at the 5743 time that the child objects are created; later modifications to 5744 inheritable ACEs do not result in modifications to inherited ACEs on 5745 descendants. 5747 However, the dacl and sacl provide an optional mechanism which allows 5748 a client application to propagate changes to inheritable ACEs to an 5749 entire directory hierarchy.
5751 A server that supports this performs inheritance at object creation 5752 time in the normal way, but also sets the ACE4_INHERITED_ACE flag on 5753 any inherited ACEs as they are added to the new object. 5755 A client application such as an ACL editor may then propagate changes 5756 to inheritable ACEs on a directory by recursively traversing that 5757 directory's descendants and modifying each ACL encountered to remove 5758 any ACEs with the ACE4_INHERITED_ACE flag and to replace them with 5759 the new inheritable ACEs (also with the ACE4_INHERITED_ACE flag set). 5760 It uses the existing ACE inheritance flags in the obvious way to decide 5761 which ACEs to propagate. (Note that it may encounter further 5762 inheritable ACEs when descending the directory hierarchy, and that 5763 those will also need to be taken into account when propagating 5764 inheritable ACEs to further descendants.) 5766 The reach of this propagation may be limited in two ways: first, 5767 automatic inheritance is not performed from any directory ACL that 5768 has the ACL4_AUTO_INHERIT flag cleared; and second, automatic 5769 inheritance stops wherever an ACL with the ACL4_PROTECTED flag is 5770 set, preventing modification of that ACL and also (if the ACL is set 5771 on a directory) of the ACL on any of the object's descendants. 5773 This propagation is performed independently for the sacl and the dacl 5774 attributes; thus the ACL4_AUTO_INHERIT and ACL4_PROTECTED flags may 5775 be independently set for the sacl and the dacl, and propagation of 5776 one type of acl may continue down a hierarchy even where propagation 5777 of the other acl has stopped. 5779 New objects should be created with a dacl and a sacl that both have 5780 the ACL4_PROTECTED flag cleared and the ACL4_AUTO_INHERIT flag set to 5781 the same value as that on, respectively, the dacl or sacl of the 5782 parent object.
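A client-side propagation pass of the kind described above might look like the following sketch. It is illustrative only: a real client would perform the traversal with READDIR and GETATTR/SETATTR of the dacl, and would apply the per-object inheritance rules of Section 6.4.3.1 rather than copying ACEs verbatim:

```python
# Sketch of client-driven automatic inheritance (Section 6.4.3.2).
# A directory is modeled as {"acl": {"flag": ..., "aces": [...]},
# "children": {name: directory, ...}}; field names are invented.
ACL4_AUTO_INHERIT = 0x00000001
ACL4_PROTECTED = 0x00000002
ACE4_INHERITED_ACE = 0x00000080

def propagate(directory, new_inheritable_aces):
    """Recursively push new inheritable ACEs down a directory tree,
    as an ACL-editing application might."""
    if not (directory["acl"]["flag"] & ACL4_AUTO_INHERIT):
        return  # this directory's ACL does not participate
    for child in directory["children"].values():
        acl = child["acl"]
        if acl["flag"] & ACL4_PROTECTED:
            continue  # propagation stops here; descendants stay untouched
        # Drop previously inherited ACEs, keep explicitly set ones ...
        acl["aces"] = [a for a in acl["aces"]
                       if not (a["flag"] & ACE4_INHERITED_ACE)]
        # ... and append the new ACEs, marked as inherited.
        acl["aces"] += [dict(a, flag=a["flag"] | ACE4_INHERITED_ACE)
                        for a in new_inheritable_aces]
        propagate(child, new_inheritable_aces)
```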
5784 Both the dacl and sacl attributes are RECOMMENDED, and a server may 5785 support one without supporting the other. 5787 A server that supports both the old acl attribute and one or both of 5788 the new dacl or sacl attributes must do so in such a way as to keep 5789 all three attributes consistent with each other. Thus the ACEs 5790 reported in the acl attribute should be the union of the ACEs 5791 reported in the dacl and sacl attributes, except that the 5792 ACE4_INHERITED_ACE flag must be cleared from the ACEs in the acl. 5793 And of course a client that queries only the acl will be unable to 5794 determine the values of the sacl or dacl flag fields. 5796 When a client performs a SETATTR for the acl attribute, the server 5797 SHOULD set the ACL4_PROTECTED flag to true on both the sacl and the 5798 dacl. By using the acl attribute, as opposed to the dacl or sacl 5799 attributes, the client signals that it may not understand automatic 5800 inheritance, and thus cannot be trusted to set an ACL for which 5801 automatic inheritance would make sense. 5803 When a client application queries an ACL, modifies it, and sets it 5804 again, it should leave any ACEs marked with ACE4_INHERITED_ACE 5805 unchanged, in their original order, at the end of the ACL. If the 5806 application is unable to do this, it should set the ACL4_PROTECTED 5807 flag. This behavior is not enforced by servers, but violations of 5808 this rule may lead to unexpected results when applications perform 5809 automatic inheritance. 5811 If a server also supports the mode attribute, it SHOULD set the mode 5812 in such a way that leaves inherited ACEs unchanged, in their original 5813 order, at the end of the ACL. If it is unable to do so, it SHOULD 5814 set the ACL4_PROTECTED flag on the file's dacl. 
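The consistency requirement between acl, dacl, and sacl, and the SETATTR behavior just described, can be sketched as follows (illustrative only; ACEs are modeled as dictionaries with invented field names):

```python
# Sketch of acl/dacl/sacl consistency (Section 6.4.3.2).
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0
ACE4_ACCESS_DENIED_ACE_TYPE = 1
ACE4_SYSTEM_AUDIT_ACE_TYPE = 2
ACE4_SYSTEM_ALARM_ACE_TYPE = 3
ACE4_INHERITED_ACE = 0x00000080
ACL4_PROTECTED = 0x00000002

def acl_attribute(dacl_aces, sacl_aces):
    """The acl attribute reports the union of the dacl and sacl ACEs,
    with ACE4_INHERITED_ACE cleared (the flag is valid only in dacl/sacl)."""
    return [dict(a, flag=a["flag"] & ~ACE4_INHERITED_ACE)
            for a in list(dacl_aces) + list(sacl_aces)]

def setattr_acl(obj, new_aces):
    """SETATTR of the legacy acl attribute: ALLOW/DENY ACEs go to the dacl,
    AUDIT/ALARM ACEs to the sacl, and both become PROTECTED, since the
    client has signaled it may not understand automatic inheritance."""
    obj["dacl"]["aces"] = [a for a in new_aces if a["type"] in
                           (ACE4_ACCESS_ALLOWED_ACE_TYPE,
                            ACE4_ACCESS_DENIED_ACE_TYPE)]
    obj["sacl"]["aces"] = [a for a in new_aces if a["type"] in
                           (ACE4_SYSTEM_AUDIT_ACE_TYPE,
                            ACE4_SYSTEM_ALARM_ACE_TYPE)]
    obj["dacl"]["flag"] |= ACL4_PROTECTED
    obj["sacl"]["flag"] |= ACL4_PROTECTED
```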
5816 Finally, in the case where the request that creates a new file or 5817 directory does not also set permissions for that file or directory, 5818 and there are also no ACEs to inherit from the parent directory, 5819 then the server's choice of ACL for the new object is implementation-dependent. 5820 In this case, the server SHOULD set the ACL4_DEFAULTED 5821 flag on the ACL it chooses for the new object. An application 5822 performing automatic inheritance takes the ACL4_DEFAULTED flag as a 5823 sign that the ACL should be completely replaced by one generated 5824 using the automatic inheritance rules. 5826 7. Single-server Name Space 5828 This chapter describes the NFSv4 single-server name space. Single-server 5829 namespaces may be presented directly to clients, or they may 5830 be used as a basis to form larger multi-server namespaces (e.g., site-wide 5831 or organization-wide) to be presented to clients, as described 5832 in Section 10. 5834 7.1. Server Exports 5836 On a UNIX server, the name space describes all the files reachable by 5837 pathnames under the root directory or "/". On a Windows NT server, 5838 the name space constitutes all the files on disks named by mapped 5839 disk letters. NFS server administrators rarely make the entire 5840 server's file system name space available to NFS clients. More often, 5841 portions of the name space are made available via an "export" 5842 feature. In previous versions of the NFS protocol, the root 5843 filehandle for each export is obtained through the MOUNT protocol; 5844 the client sends a string that identifies the exported portion of the name space 5845 and the server returns the root filehandle for it. The MOUNT 5846 protocol supports an EXPORTS procedure that will enumerate the 5847 server's exports. 5849 7.2.
Browsing Exports 5851 The NFS version 4 protocol provides a root filehandle that clients 5852 can use to obtain filehandles for the exports of a particular server, 5853 via a series of LOOKUP operations within a COMPOUND, to traverse a 5854 path. A common user experience is to use a graphical user interface 5855 (perhaps a file "Open" dialog window) to find a file via progressive 5856 browsing through a directory tree. The client must be able to move 5857 from one export to another export via single-component, progressive 5858 LOOKUP operations. 5860 This style of browsing is not well supported by the NFS version 2 and 5861 3 protocols. The client expects all LOOKUP operations to remain 5862 within a single server file system. For example, the device 5863 attribute will not change. This prevents a client from taking name 5864 space paths that span exports. 5866 An automounter on the client can obtain a snapshot of the server's 5867 name space using the EXPORTS procedure of the MOUNT protocol. If it 5868 understands the server's pathname syntax, it can create an image of 5869 the server's name space on the client. The parts of the name space 5870 that are not exported by the server are filled in with a "pseudo file 5871 system" that allows the user to browse from one mounted file system 5872 to another. There is a drawback to this representation of the 5873 server's name space on the client: it is static. If the server 5874 administrator adds a new export the client will be unaware of it. 5876 7.3. Server Pseudo File System 5878 NFS version 4 servers avoid this name space inconsistency by 5879 presenting all the exports for a given server within the framework of 5880 a single namespace, for that server. An NFS version 4 client uses 5881 LOOKUP and READDIR operations to browse seamlessly from one export to 5882 another. 
Portions of the server name space that are not exported are 5883 bridged via a "pseudo file system" that provides a view of exported 5884 directories only. A pseudo file system has a unique fsid and behaves 5885 like a normal, read-only file system. 5887 Based on the construction of the server's name space, it is possible 5888 that multiple pseudo file systems may exist. For example, 5890 /a pseudo file system 5891 /a/b real file system 5892 /a/b/c pseudo file system 5893 /a/b/c/d real file system 5895 Each of the pseudo file systems is considered a separate entity and 5896 therefore will have its own unique fsid. 5898 7.4. Multiple Roots 5900 The DOS and Windows operating environments are sometimes described as 5901 having "multiple roots". File systems are commonly represented as 5902 disk letters. MacOS represents file systems as top level names. NFS 5903 version 4 servers for these platforms can construct a pseudo file 5904 system above these root names so that disk letters or volume names 5905 are simply directory names in the pseudo root. 5907 7.5. Filehandle Volatility 5909 The nature of the server's pseudo file system is that it is a logical 5910 representation of file system(s) available from the server. 5911 Therefore, the pseudo file system is most likely constructed 5912 dynamically when the server is first instantiated. It is expected 5913 that the pseudo file system may not have an on-disk counterpart from 5914 which persistent filehandles could be constructed. Even though it is 5915 preferable that the server provide persistent filehandles for the 5916 pseudo file system, the NFS client should expect that pseudo file 5917 system filehandles are volatile. This can be confirmed by checking 5918 the associated "fh_expire_type" attribute for those filehandles in 5919 question. If the filehandles are volatile, the NFS client must be 5920 prepared to recover a filehandle value (e.g.
with a series of LOOKUP 5921 operations) when receiving an error of NFS4ERR_FHEXPIRED. 5923 7.6. Exported Root 5925 If the server's root file system is exported, one might conclude that 5926 a pseudo-file system is unneeded. This is not necessarily so. Assume 5927 the following file systems on a server: 5929 / disk1 (exported) 5930 /a disk2 (not exported) 5931 /a/b disk3 (exported) 5933 Because disk2 is not exported, disk3 cannot be reached with simple 5934 LOOKUPs. The server must bridge the gap with a pseudo-file system. 5936 7.7. Mount Point Crossing 5938 The server file system environment may be constructed in such a way 5939 that one file system contains a directory which is 'covered' or 5940 mounted upon by a second file system. For example: 5942 /a/b (file system 1) 5943 /a/b/c/d (file system 2) 5945 The pseudo file system for this server may be constructed to look 5946 like: 5948 / (place holder/not exported) 5949 /a/b (file system 1) 5950 /a/b/c/d (file system 2) 5952 It is the server's responsibility to present the pseudo file system 5953 that is complete to the client. If the client sends a lookup request 5954 for the path "/a/b/c/d", the server's response is the filehandle of 5955 the file system "/a/b/c/d". In previous versions of the NFS 5956 protocol, the server would respond with the filehandle of directory 5957 "/a/b/c/d" within the file system "/a/b". 5959 The NFS client will be able to determine if it crosses a server mount 5960 point by a change in the value of the "fsid" attribute. 5962 7.8. Security Policy and Name Space Presentation 5964 The application of the server's security policy needs to be carefully 5965 considered by the implementor. One may choose to limit the 5966 viewability of portions of the pseudo file system based on the 5967 server's perception of the client's ability to authenticate itself 5968 properly.
However, with the support of multiple security mechanisms 5969 and the ability to negotiate the appropriate use of these mechanisms, 5970 the server is unable to properly determine if a client will be able 5971 to authenticate itself. If, based on its policies, the server 5972 chooses to limit the contents of the pseudo file system, the server 5973 may effectively hide file systems from a client that may otherwise 5974 have legitimate access. 5976 As suggested practice, the server should apply the security policy of 5977 a shared resource in the server's namespace to the components of the 5978 resource's ancestors. For example: 5980 / 5981 /a/b 5982 /a/b/c 5984 The /a/b/c directory is a real file system and is the shared 5985 resource. The security policy for /a/b/c is Kerberos with integrity. 5986 The server should apply the same security policy to /, /a, and /a/b. 5987 This allows for the extension of the protection of the server's 5988 namespace to the ancestors of the real shared resource. 5990 For the case of the use of multiple, disjoint security mechanisms in 5991 the server's resources, the security for a particular object in the 5992 server's namespace should be the union of all security mechanisms of 5993 all direct descendants. 5995 8. File Locking and Share Reservations 5997 Integrating locking into the NFS protocol necessarily causes it to be 5998 stateful. With the inclusion of such features as share reservations, 5999 file and directory delegations, recallable layouts, and support for 6000 mandatory record locking, the protocol becomes substantially more 6001 dependent on state than the traditional combination of NFS and NLM 6002 [XNFS]. There are three components to making this state manageable: 6004 o Clear division between client and server 6006 o Ability to reliably detect inconsistency in state between client 6007 and server 6009 o Simple and robust recovery mechanisms 6011 In this model, the server owns the state information.
The client 6012 requests changes in locks and the server responds with the changes 6013 made. Non-client-initiated changes in locking state are infrequent 6014 and the client receives prompt notification of them and can adjust 6015 its view of the locking state to reflect the server's changes. 6017 To support Win32 share reservations it is necessary to provide 6018 operations which atomically OPEN or CREATE files. Having a separate 6019 share/unshare operation would not allow correct implementation of the 6020 Win32 OpenFile API. In order to correctly implement share semantics, 6021 the previous NFS protocol mechanisms used when a file is opened or 6022 created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFS 6023 version 4.1 protocol defines an OPEN operation that looks up or creates 6024 a file and establishes locking state on the server. 6026 8.1. Locking 6028 It is assumed that manipulating a lock is rare when compared to READ 6029 and WRITE operations. It is also assumed that crashes and network 6030 partitions are relatively rare. Therefore it is important that the 6031 READ and WRITE operations have a lightweight mechanism to indicate if 6032 they possess a held lock. A lock request contains the heavyweight 6033 information required to establish a lock and uniquely define the lock 6034 owner. 6036 The following sections describe the transition from the heavyweight 6037 information to the eventual lightweight stateid used for most client 6038 and server locking interactions. 6040 8.1.1. Client and Session ID 6042 A client must establish a client ID (see Section 2.4) and then one or 6043 more sessionids (see Section 2.10) before performing any operations 6044 to open, lock, or delegate a file object. The sessionid serves as 6045 a shorthand reference to an NFSv4.1 client. 6047 8.1.2. State-owner Definition 6049 When opening a file or requesting a record lock, the client must 6050 specify an identifier which represents the owner of the requested 6051 lock.
This identifier is in the form of a state-owner, represented 6052 in the protocol by a state_owner4, a variable-length opaque array 6053 which, when concatenated with the current client ID, uniquely defines 6054 the owner of a lock managed by the client. This may be a thread id, 6055 process id, or other unique value. 6057 Owners of opens and owners of record locks are separate entities and 6058 remain separate even if the same opaque arrays are used to designate 6059 owners of each. The protocol distinguishes between open-owners 6060 (represented by open_owner4 structures) and lock-owners (represented 6061 by lock_owner4 structures). 6063 Each open is associated with a specific open-owner while each record 6064 lock is associated with a lock-owner and an open-owner, the latter 6065 being the open-owner associated with the open file under which the 6066 LOCK operation was done. Delegations and layouts, on the other hand, 6067 are not associated with a specific owner but are associated with the 6068 client as a whole. 6070 8.1.3. Stateid Definition 6072 When the server grants a lock of any type (including opens, record 6073 locks, delegations, and layouts) it responds with a unique stateid, 6074 that represents a set of locks (often a single lock) for the same 6075 file, of the same type, and sharing the same ownership 6076 characteristics. Thus opens of the same file by different open- 6077 owners each have an identifying stateid. Similarly, each set of 6078 record locks on a file owned by a specific lock-owner and obtained via 6079 an open for a specific open-owner, has its own identifying stateid. 6080 Delegations and layouts also have associated stateids by which they 6081 may be referenced. The stateid is used as a shorthand reference to a 6082 lock or set of locks and given a stateid the client can determine the 6083 associated state-owner or state-owners (in the case of an open-owner/ 6084 lock-owner pair) and the associated filehandle.
When stateids are 6085 used the current filehandle must be the one associated with that 6086 stateid. 6088 The server may assign stateids independently for different clients 6089 and a stateid with the same bit pattern for one client may designate 6090 an entirely different set of locks for a different client. The 6091 stateid is always interpreted with respect to the client ID 6092 associated with the current session. Stateids apply to all sessions 6093 associated with the given client ID and the client may use a stateid 6094 obtained from one session on another session associated with the same 6095 client ID. 6097 8.1.3.1. Stateid Structure 6099 Stateids are divided into two fields, a 96-bit "other" field 6100 identifying the specific set of locks and a 32-bit "seqid" sequence 6101 value. Except in the case of special stateids, to be discussed 6102 below, the purpose of the sequence value within NFSv4.1 is to allow 6103 the server to communicate to the client the order in which operations 6104 that modified locking state associated with a stateid have been 6105 processed. 6107 In the case of stateids associated with opens, i.e. the stateids 6108 returned by OPEN (the state for the open, rather than that for the 6109 delegation), OPEN_DOWNGRADE, or CLOSE, the server MUST provide a 6110 "seqid" value starting at one for the first use of a given "other" 6111 value and incremented by one with each subsequent operation returning 6112 a stateid. 6114 In the case of other sorts of stateids (i.e. stateids associated with 6115 record locks and delegations), the server MAY provide an incrementing 6116 sequence value on successive stateids returned with the same identifying 6117 field, or it may return the value zero. If it does return a non-zero 6118 "seqid" value it MUST start at one and be incremented by one with 6119 each subsequent operation returning a stateid with the same "other" 6120 value, just as is done with open state.
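The per-"other" sequencing rule above can be illustrated with a short sketch. This is not part of the protocol specification; the class and its method names are assumptions made only for illustration, and the full 96-bit "other" encoding is abstracted as an opaque key.

```python
class StateidTable:
    """Illustrative sketch: track the last "seqid" returned for each
    "other" value, per the rule that the first stateid for a given
    "other" carries seqid one and later ones increment by one."""

    def __init__(self):
        self._seqids = {}  # maps "other" value -> last returned seqid

    def next_stateid(self, other):
        # First use of a given "other" value yields seqid == 1; each
        # subsequent operation returning a stateid increments it.
        seqid = self._seqids.get(other, 0) + 1
        self._seqids[other] = seqid
        return (seqid, other)
```

Used this way, two opens sharing an "other" value see seqids 1 and 2, while a different "other" value starts again at 1.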
6122 The client, when using a stateid as a parameter to an operation, must, 6123 except in the case of a special stateid, set the sequence value to 6124 zero. If the value is non-zero, the server MUST return the error 6125 NFS4ERR_BAD_STATEID. 6127 8.1.3.2. Special Stateids 6129 Stateid values whose "other" field is either all zeros or all ones 6130 are reserved. They may not be assigned by the server but have 6131 special meanings defined by the protocol. The particular meaning 6132 depends on whether the "other" field is all zeros or all ones and the 6133 specific value of the "seqid" field. 6135 The following combinations of "other" and "seqid" are defined in 6136 NFSv4.1: 6138 o When "other" and "seqid" are both zero, the stateid is treated as 6139 a special anonymous stateid, which can be used in READ, WRITE, and 6140 SETATTR requests to indicate the absence of any open state 6141 associated with the request. When an anonymous stateid value is 6142 used, and an existing open denies the form of access requested, 6143 then access will be denied to the request. 6145 o When "other" and "seqid" are both all ones, the stateid is a 6146 special read bypass stateid. When this value is used in WRITE or 6147 SETATTR, it is treated like the anonymous value. When used in 6148 READ, the server MAY grant access, even if access would normally 6149 be denied to READ requests. 6151 o When "other" is zero and "seqid" is one, the stateid represents 6152 the current stateid, which is whatever value is the last stateid 6153 returned by an operation within the COMPOUND. In the case of an 6154 OPEN, the stateid returned for the open file, and not the 6155 delegation, is used. The stateid passed to the operation in place 6156 of the special value has its "seqid" value set to zero. If there 6157 is no operation in the COMPOUND which has returned a stateid 6158 value, the server MUST return the error NFS4ERR_BAD_STATEID.
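The special-stateid combinations can be sketched as a classification routine. This is illustrative only and not protocol text; the byte-level encoding of the 96-bit "other" field and the function name are assumptions. Reserved combinations other than those defined are rejected, matching the rule that they draw NFS4ERR_BAD_STATEID.

```python
ALL_ZEROS = b"\x00" * 12   # 96-bit "other" field of all zeros
ALL_ONES = b"\xff" * 12    # 96-bit "other" field of all ones

def classify_special(other, seqid):
    """Illustrative classification of a stateid per the special-stateid
    rules: returns "anonymous", "read-bypass", "current", or "normal",
    and raises ValueError for reserved combinations the server must
    reject with NFS4ERR_BAD_STATEID."""
    if other == ALL_ZEROS and seqid == 0:
        return "anonymous"            # no open state for this request
    if other == ALL_ONES and seqid == 0xFFFFFFFF:
        return "read-bypass"          # READ may bypass locking checks
    if other == ALL_ZEROS and seqid == 1:
        return "current"              # last stateid returned in COMPOUND
    if other in (ALL_ZEROS, ALL_ONES):
        raise ValueError("NFS4ERR_BAD_STATEID")
    return "normal"
```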
6160 If a stateid value is used which has all zero or all ones in the 6161 "other" field, but does not match one of the cases above, the server 6162 MUST return the error NFS4ERR_BAD_STATEID. 6164 Special stateids, unlike other stateids, are not associated with 6165 individual client IDs or filehandles and can be used with all valid 6166 client IDs and filehandles. In the case of a special stateid 6167 designating the current stateid, the current stateid value 6168 substituted for the special stateid is associated with a particular 6169 client ID and filehandle. 6171 8.1.3.3. Stateid Lifetime and Validation 6173 Stateids must remain valid until either a client reboot or a server 6174 reboot or until the client returns all of the locks associated with 6175 the stateid by means of an operation such as CLOSE or DELEGRETURN. 6176 If the locks are lost due to revocation the stateid remains a valid 6177 designation of that revoked state until the client frees it by using 6178 FREE_STATEID. Stateids associated with record locks are an 6179 exception. They remain valid even if a LOCKU frees all remaining 6180 locks, so long as the open file with which they are associated 6181 remains open, unless the client does a FREE_STATEID to cause the 6182 stateid to be freed. 6184 An "other" value must never be reused for a different purpose (i.e. 6185 different filehandle, owner, or type of locks) within the context of 6186 a single client ID. A server may retain the "other" value for the 6187 same purpose beyond the point where it may otherwise be freed but if 6188 it does so, it must maintain "seqid" continuity with previous values, 6189 in all cases in which it is required to return incrementing "seqid" 6190 values. 6192 One mechanism that may be used to satisfy the requirement that the 6193 server recognize invalid and out-of-date stateids is for the server 6194 to divide the "other" field of the stateid into two fields.
6196 o An index into a table of locking-state structures. 6198 o A generation number which is incremented on each allocation of a 6199 table entry for a particular use. 6201 And then store in each table entry, 6203 o The current generation number. 6205 o The client ID with which the stateid is associated. 6207 o The filehandle of the file on which the locks are taken. 6209 o An indication of the type of stateid (open, record lock, file 6210 delegation, directory delegation, layout). 6212 o The last "seqid" value returned corresponding to the current 6213 "other" value. 6215 With this information, the following procedure would be used to 6216 validate an incoming stateid and return an appropriate error, when 6217 necessary: 6219 o If the server has restarted resulting in loss of all leased state 6220 but the sessionid and client ID are still valid, return 6221 NFS4ERR_STALE_STATEID. (If server restart has resulted in an 6222 invalid client ID or an invalid sessionid, SEQUENCE will return an 6223 error - not NFS4ERR_STALE_STATEID - and the operation that takes a 6224 stateid as an argument will never be processed.) 6226 o If the "other" field is all zeros or all ones, check that the 6227 "other" and "seqid" match a defined combination for a special 6228 stateid and that that stateid can be used in the current context. 6229 If not, then return NFS4ERR_BAD_STATEID. 6231 o If the "seqid" field is not zero, return NFS4ERR_BAD_STATEID. 6233 o Otherwise divide the "other" into a table index and an entry 6234 generation. 6236 o If the table index field is outside the range of the associated 6237 table, return NFS4ERR_BAD_STATEID. 6239 o If the selected table entry is of a different generation than that 6240 specified in the incoming stateid, return NFS4ERR_BAD_STATEID. 6242 o If the selected table entry does not match the current file 6243 handle, return NFS4ERR_BAD_STATEID.
6245 o If the client ID in the table entry does not match the client ID 6246 associated with the current session, return NFS4ERR_BAD_STATEID. 6248 o If the stateid type is not valid for the context in which the 6249 stateid appears, return NFS4ERR_BAD_STATEID. 6251 o Otherwise, the stateid is valid and the table entry should contain 6252 any additional information about the associated set of locks, such 6253 as open-owner and lock-owner information, as well as information 6254 on the specific locks, such as open modes and octet ranges. 6256 8.1.4. Use of the Stateid and Locking 6258 All READ, WRITE and SETATTR operations contain a stateid. For the 6259 purposes of this section, SETATTR operations which change the size 6260 attribute of a file are treated as if they are writing the area 6261 between the old and new size (i.e. the range truncated or added to 6262 the file by means of the SETATTR), even where SETATTR is not 6263 explicitly mentioned in the text. 6265 If the state-owner performs a READ or WRITE in a situation in which 6266 it has established a lock or share reservation on the server (any 6267 OPEN constitutes a share reservation) the stateid (previously 6268 returned by the server) must be used to indicate what locks, 6269 including both record locks and share reservations, are held by the 6270 state-owner. If no state is established by the client, either a record 6271 lock or a share reservation, a special stateid for anonymous state 6272 (zero as "other" and "seqid") is used. Regardless of whether a stateid 6273 for anonymous state or a stateid returned by the server is used, if 6274 there is a conflicting share reservation or mandatory record lock 6275 held on the file, the server MUST refuse to service the READ or WRITE 6276 operation.
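The table-based validation procedure of Section 8.1.3.3 can be sketched as follows. This is an illustrative server-side outline, not protocol text: the split of the "other" field into index and generation, the table layout, and all names are assumptions, and special-stateid handling is assumed to have been performed beforehand.

```python
def validate_stateid(stateid, table, current_fh, session_client_id,
                     context_types, server_restarted):
    """Illustrative validation of a non-special stateid, following the
    checks of Section 8.1.3.3 in order.  Returns the matching table
    entry, or the name of the NFS4ERR_* error to return."""
    other, seqid = stateid
    if server_restarted:
        # Leased state lost, but sessionid/client ID still valid.
        return "NFS4ERR_STALE_STATEID"
    if seqid != 0:
        # Clients must present "seqid" == 0 for non-special stateids.
        return "NFS4ERR_BAD_STATEID"
    # Assumed encoding: high bits index the table, low 32 bits are the
    # entry generation number.
    index, generation = other >> 32, other & 0xFFFFFFFF
    if index >= len(table):
        return "NFS4ERR_BAD_STATEID"
    entry = table[index]
    if (entry["generation"] != generation
            or entry["filehandle"] != current_fh
            or entry["client_id"] != session_client_id
            or entry["type"] not in context_types):
        return "NFS4ERR_BAD_STATEID"
    return entry
```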
6278 Share reservations are established by OPEN operations and by their 6279 nature are mandatory in that when the OPEN denies READ or WRITE 6280 operations, that denial results in such operations being rejected 6281 with error NFS4ERR_LOCKED. Record locks may be implemented by the 6282 server as either mandatory or advisory, or the choice of mandatory or 6283 advisory behavior may be determined by the server on the basis of the 6284 file being accessed (for example, some UNIX-based servers support a 6285 "mandatory lock bit" on the mode attribute such that if set, record 6286 locks are required on the file before I/O is possible). When record 6287 locks are advisory, they only prevent the granting of conflicting 6288 lock requests and have no effect on READs or WRITEs. Mandatory 6289 record locks, however, prevent conflicting I/O operations. When such 6290 operations are attempted, they are rejected with NFS4ERR_LOCKED. When the 6291 client gets NFS4ERR_LOCKED on a file it knows it has the proper share 6292 reservation for, it will need to issue a LOCK request on the region 6293 of the file that includes the region the I/O was to be performed on, 6294 with an appropriate locktype (i.e. READ*_LT for a READ operation, 6295 WRITE*_LT for a WRITE operation). 6297 Note that for UNIX environments that support mandatory file locking, 6298 the distinction between advisory and mandatory locking is subtle. In 6299 fact, advisory and mandatory record locks are exactly the same insofar 6300 as the APIs and requirements on implementation are concerned. If the mandatory 6301 lock attribute is set on the file, the server checks to see if the 6302 lock-owner has an appropriate shared (read) or exclusive (write) 6303 record lock on the region it wishes to read or write to.
If there is 6304 no appropriate lock, the server checks if there is a conflicting lock 6305 (which can be done by attempting to acquire the conflicting lock on 6306 behalf of the lock-owner and, if successful, releasing the lock 6307 after the READ or WRITE is done), and if there is, the server returns 6308 NFS4ERR_LOCKED. 6310 For Windows environments, there are no advisory record locks, so the 6311 server always checks for record locks during I/O requests. 6313 Thus, the NFS version 4 LOCK operation does not need to distinguish 6314 between advisory and mandatory record locks. It is the NFS version 4 6315 server's processing of the READ and WRITE operations that introduces 6316 the distinction. 6318 Every stateid, with the exception of special stateid values, whether 6319 returned by an OPEN-type operation (i.e. OPEN, OPEN_DOWNGRADE), or 6320 by a LOCK-type operation (i.e. LOCK or LOCKU), defines an access 6321 mode for the file (i.e. READ, WRITE, or READ-WRITE) as established 6322 by the original OPEN which caused the allocation of the open stateid 6323 and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the same 6324 open-owner/file pair. Stateids returned by record lock operations 6325 imply the access mode for the open stateid associated with the lock 6326 set represented by the stateid. Delegation stateids have an access 6327 mode based on the type of delegation. When a READ, WRITE, or SETATTR 6328 which specifies the size attribute, is done, the operation is subject 6329 to checking against the access mode to verify that the operation is 6330 appropriate given the OPEN with which the operation is associated. 6332 In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which 6333 set size), the server must verify that the access mode allows writing 6334 and return an NFS4ERR_OPENMODE error if it does not.
In the case of 6335 READ, the server may perform the corresponding check on the access 6336 mode, or it may choose to allow READ on opens for WRITE only, to 6337 accommodate clients whose write implementation may unavoidably do 6338 reads (e.g. due to buffer cache constraints). However, even if READs 6339 are allowed in these circumstances, the server MUST still check for 6340 locks that conflict with the READ (e.g. another open specifying denial 6341 of READs). Note that a server which does enforce the access mode 6342 check on READs need not explicitly check for conflicting share 6343 reservations since the existence of an OPEN for read access guarantees 6344 that no conflicting share reservation can exist. 6346 The read bypass special stateid (all bits of "other" and "seqid" set 6347 to one) indicates a desire to bypass locking checks. The 6348 server MAY allow READ operations to bypass locking checks at the 6349 server, when this special stateid is used. However, WRITE operations 6350 with this special stateid value MUST NOT bypass locking checks and 6351 are treated exactly the same as if a special stateid for anonymous 6352 state were used. 6354 A lock may not be granted while a READ or WRITE operation using one 6355 of the special stateids is being performed and the range of the lock 6356 request conflicts with the range of the READ or WRITE operation. For 6357 the purposes of this paragraph, a conflict occurs when a shared lock 6358 is requested and a WRITE operation is being performed, or an 6359 exclusive lock is requested and either a READ or a WRITE operation is 6360 being performed. A SETATTR that sets size is treated similarly to a 6361 WRITE as discussed above. 6363 8.2. Lock Ranges 6365 The protocol allows a lock owner to request a lock with an octet 6366 range and then either upgrade, downgrade, or unlock a sub-range of 6367 the initial lock. It is expected that this will be an uncommon type 6368 of request.
In any case, servers or server filesystems may not be 6369 able to support sub-range lock semantics. In the event that a server 6370 receives a locking request that represents a sub-range of current 6371 locking state for the lock owner, the server is allowed to return the 6372 error NFS4ERR_LOCK_RANGE to signify that it does not support sub- 6373 range lock operations. Therefore, the client should be prepared to 6374 receive this error and, if appropriate, report the error to the 6375 requesting application. 6377 The client is discouraged from combining multiple independent locking 6378 ranges that happen to be adjacent into a single request since the 6379 server may not support sub-range requests and for reasons related to 6380 the recovery of file locking state in the event of server failure. 6381 As discussed in the section "Server Failure and Recovery" below, the 6382 server may employ certain optimizations during recovery that work 6383 effectively only when the client's behavior during lock recovery is 6384 similar to the client's locking behavior prior to server failure. 6386 8.3. Upgrading and Downgrading Locks 6388 If a client has a write lock on a record, it can request an atomic 6389 downgrade of the lock to a read lock via the LOCK request, by setting 6390 the type to READ_LT. If the server supports atomic downgrade, the 6391 request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP. 6392 The client should be prepared to receive this error, and if 6393 appropriate, report the error to the requesting application. 6395 If a client has a read lock on a record, it can request an atomic 6396 upgrade of the lock to a write lock via the LOCK request by setting 6397 the type to WRITE_LT or WRITEW_LT. If the server does not support 6398 atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade 6399 can be achieved without an existing conflict, the request will 6400 succeed. 
Otherwise, the server will return either NFS4ERR_DENIED or 6401 NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the 6402 client issued the LOCK request with the type set to WRITEW_LT and the 6403 server has detected a deadlock. The client should be prepared to 6404 receive such errors and if appropriate, report the error to the 6405 requesting application. 6407 8.4. Blocking Locks 6409 Some clients require the support of blocking locks. NFSv4.1 does not 6410 provide a callback when a previously unavailable lock becomes 6411 available. Clients thus have no choice but to continually poll for 6412 the lock. This presents a fairness problem. Two new lock types are 6413 added, READW and WRITEW, and are used to indicate to the server that 6414 the client is requesting a blocking lock. The server should maintain 6415 an ordered list of pending blocking locks. When the conflicting lock 6416 is released, the server may wait the lease period for the first 6417 waiting client to re-request the lock. After the lease period 6418 expires the next waiting client request is allowed the lock. Clients 6419 are required to poll at an interval sufficiently small that the client is 6420 likely to acquire the lock in a timely manner. The server is not 6421 required to maintain a list of pending blocked locks as the list is used to 6422 increase fairness and not for correctness of operation. Because of the 6423 unordered nature of crash recovery, storing of lock state to stable 6424 storage would be required to guarantee ordered granting of blocking 6425 locks. 6427 Servers may also note the lock types and delay returning denial of 6428 the request to allow extra time for a conflicting lock to be 6429 released, allowing a successful return. In this way, clients can 6430 avoid the burden of needlessly frequent polling for blocking locks. 6431 The server should take care in the length of delay in the event the 6432 client retransmits the request.
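The client-side polling that blocking locks require can be sketched as follows. This is illustrative only, not protocol text: `try_lock` stands in for issuing a LOCK request (READW/WRITEW while polling), and the final nonblocking probe reflects the courtesy, described below, of telling the server the client will stop polling.

```python
import time

def acquire_blocking_lock(try_lock, poll_interval, timeout):
    """Illustrative client poll loop for a blocking lock.

    try_lock() is an assumed stand-in for a LOCK request; it returns
    "GRANTED" or "NFS4ERR_DENIED".  poll_interval should be small
    enough that the client is likely to re-request the lock within
    the lease period after a conflicting lock is released."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if try_lock() == "GRANTED":
            return True
        time.sleep(poll_interval)
    # One last (nonblocking) request: it may still succeed, and if it
    # is denied it signals the server that polling has ended.
    return try_lock() == "GRANTED"
```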
6434 If a server receives a blocking lock request, denies it, and then 6435 later receives a nonblocking request for the same lock, which is also 6436 denied, then it should remove the lock in question from its list of 6437 pending blocking locks. Clients should use such a nonblocking 6438 request to indicate to the server that this is the last time they 6439 intend to poll for the lock, as may happen when the process 6440 requesting the lock is interrupted. This is a courtesy to the 6441 server, to prevent it from unnecessarily waiting a lease period 6442 before granting other lock requests. However, clients are not 6443 required to perform this courtesy, and servers must not depend on 6444 them doing so. Also, clients must be prepared for the possibility 6445 that this final locking request will be accepted. 6447 8.5. Lease Renewal 6449 The purpose of a lease is to allow a server to remove stale locks 6450 that are held by a client that has crashed or is otherwise 6451 unreachable. It is not a mechanism for cache consistency and lease 6452 renewals may not be denied if the lease interval has not expired. 6454 Since each session is associated with a specific client, any 6455 operation issued on that session is an indication that the associated 6456 client is reachable. When a request is issued for a given session, 6457 execution of a SEQUENCE operation will result in all leases for the 6458 associated client being implicitly renewed. This approach allows for 6459 low overhead lease renewal which scales well. In the typical case no 6460 extra RPC calls are required for lease renewal and in the worst case 6461 one RPC is required every lease period, via a COMPOUND that consists 6462 solely of a single SEQUENCE operation. The number of locks held by 6463 the client is not a factor since all state for the client is involved 6464 with the lease renewal action.
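The implicit renewal scheme above, in which every SEQUENCE operation renews all of a client's leases at once, can be sketched with a single shared expiration time per client ID. This is an illustrative server-side sketch, not protocol text; the class and method names are assumptions.

```python
import time

class LeaseTable:
    """Illustrative sketch: one lease expiration time per client ID,
    pushed forward implicitly whenever that client issues SEQUENCE."""

    def __init__(self, lease_period):
        self.lease_period = lease_period
        self._expiry = {}  # client_id -> expiration time (seconds)

    def sequence(self, client_id, now=None):
        # A SEQUENCE on any session of the client renews every lease
        # the client holds, by updating the single shared expiry.
        now = time.monotonic() if now is None else now
        self._expiry[client_id] = now + self.lease_period

    def expired(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        return now > self._expiry.get(client_id, 0.0)
```

Because only one timestamp is updated per renewal, the cost is independent of the number of locks held, matching the scaling argument above.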
6466 Since all operations that create a new lease also renew existing 6467 leases, the server must maintain a common lease expiration time for 6468 all valid leases for a given client. This lease time can then be 6469 easily updated upon implicit lease renewal actions. 6471 8.6. Crash Recovery 6473 The important requirement in crash recovery is that both the client 6474 and the server know when the other has failed. Additionally, it is 6475 required that a client see a consistent view of data across server 6476 restarts or reboots. All READ and WRITE operations that may have 6477 been queued within the client or network buffers must wait until the 6478 client has successfully recovered the locks protecting the READ and 6479 WRITE operations. 6481 8.6.1. Client Failure and Recovery 6483 In the event that a client fails, the server may release the client's 6484 locks when the associated leases have expired. Conflicting locks 6485 from another client may only be granted after this lease expiration. 6486 When a client has not failed and re-establishes its lease before 6487 expiration occurs, requests for conflicting locks will not be 6488 granted. 6490 To minimize client delay upon restart, lock requests are associated 6491 with an instance of the client by a client supplied verifier. This 6492 verifier is part of the initial EXCHANGE_ID call made by the client. 6493 The server returns a client ID as a result of the EXCHANGE_ID 6494 operation. The client then confirms the use of the client ID by 6495 establishing a session associated with that client ID. All locks, 6496 including opens, record locks, delegations, and layouts obtained by 6497 sessions using that client ID are associated with that client ID. 6499 Since the verifier will be changed by the client upon each 6500 initialization, the server can compare a new verifier to the verifier 6501 associated with currently held locks and determine that they do not 6502 match.
This signifies the client's new instantiation and subsequent 6503 loss of locking state. As a result, the server is free to release 6504 all locks held which are associated with the old client ID which was 6505 derived from the old verifier. At this point conflicting locks from 6506 other clients, kept waiting while the lease had not yet expired, can 6507 be granted. 6509 Note that the verifier must have the same uniqueness properties as 6510 the verifier for the COMMIT operation. 6512 8.6.2. Server Failure and Recovery 6514 If the server loses locking state (usually as a result of a restart 6515 or reboot), it must allow clients time to discover this fact and re- 6516 establish the lost locking state. The client must be able to re- 6517 establish the locking state without having the server deny valid 6518 requests because the server has granted conflicting access to another 6519 client. Likewise, if there is a possibility that clients have not 6520 yet re-established their locking state for a file, the server must 6521 disallow READ and WRITE operations for that file. 6523 A client can determine that loss of locking state has occurred via 6524 several methods. 6526 1. When a SEQUENCE succeeds, but sr_status_flags in the reply to 6527 SEQUENCE indicates SEQ4_STATUS_RESTART_RECLAIM_NEEDED (see 6528 Section 17.46.4). The client's client ID and session are valid 6529 (have persisted through server restart) and the client can now 6530 re-establish its lock state (Section 8.6.2.1). 6532 2. When an operation returns NFS4ERR_STALE_STATEID, this indicates a 6533 stateid invalidated by a server reboot or restart. Since the 6534 operation that returned NFS4ERR_STALE_STATEID MUST have been 6535 preceded by SEQUENCE, and SEQUENCE did not return an error, this 6536 means the client ID and session are valid. The client can now 6537 re-establish its lock state as described in Section 8.6.2.1.
Note 6538 that the server MUST have set 6539 SEQ4_STATUS_RESTART_RECLAIM_NEEDED in the sr_status_flags of the 6540 results of the SEQUENCE operation, and thus this situation should 6541 be the same as that described above. 6543 3. When a SEQUENCE operation returns NFS4ERR_STALE_CLIENTID, this 6544 means both the session ID SEQUENCE refers to (field sa_sessionid) 6545 and the implied client ID are now invalid, where the client ID was 6546 invalidated by server reboot or restart or by lease expiration. 6547 When SEQUENCE returns NFS4ERR_STALE_CLIENTID, the client must 6548 establish a new client ID (see Section 8.1.1) and re-establish 6549 its lock state (Section 8.6.2.1). 6551 4. When a SEQUENCE operation returns NFS4ERR_BADSESSION, this may 6552 mean the session has been destroyed, but the client ID is still 6553 valid. The client issues a CREATE_SESSION request with the 6554 client ID to re-establish the session. If CREATE_SESSION fails 6555 with NFS4ERR_STALE_CLIENTID, the client must establish a new 6556 client ID (see Section 8.1.1) and re-establish its lock state 6557 (Section 8.6.2.1). If CREATE_SESSION succeeds, the client must 6558 then re-establish its lock state (Section 8.6.2.1). 6560 5. When an operation that is neither SEQUENCE nor preceded by 6561 SEQUENCE (for example, CREATE_SESSION or DESTROY_SESSION) returns 6562 NFS4ERR_STALE_CLIENTID, the client MUST establish a new client 6563 ID (Section 8.1.1) and re-establish its lock state 6564 (Section 8.6.2.1). 6566 8.6.2.1. State Reclaim 6568 Once a session is established using the new client ID, the client 6569 will use reclaim-type locking requests (i.e. LOCK requests with 6570 reclaim set to true and OPEN operations with a claim type of 6571 CLAIM_PREVIOUS) to re-establish its locking state. Once this is 6572 done, or if there is no such locking state to reclaim, the client 6573 does a RECLAIM_COMPLETE operation to indicate that it has reclaimed 6574 all of the locking state that it will reclaim.
Once a client does a 6575 RECLAIM_COMPLETE operation, it may attempt non-reclaim locking 6576 operations, although it may get NFS4ERR_GRACE errors on these until 6577 the period of special handling is over. 6579 The period of special handling of locking and of READs and WRITEs is 6580 referred to as the "grace period". During the grace period, clients 6581 recover locks and the associated state using reclaim-type locking 6582 requests. During this period, the server must reject READ and WRITE 6583 operations and non-reclaim locking requests (i.e. other LOCK and OPEN 6584 operations) with an error of NFS4ERR_GRACE, unless it is able to 6585 guarantee that these may be done safely, as described below. 6587 The grace period may last until all clients who are known to possibly 6588 have had locks have done a RECLAIM_COMPLETE operation, indicating 6589 that they have finished reclaiming the locks they held before the 6590 server reboot. The server is assumed to maintain in stable storage a 6591 list of clients who may have such locks. The server may also 6592 terminate the grace period before all clients have done 6593 RECLAIM_COMPLETE. The server SHOULD NOT terminate the grace period 6594 before a time equal to the lease period in order to give clients an 6595 opportunity to find out about the server reboot. Some additional 6596 time may be added to allow clients to establish a new client ID and 6597 session and to effect lock reclaims. 6599 If the server can reliably determine that granting a non-reclaim 6600 request will not conflict with reclamation of locks by other clients, 6601 the NFS4ERR_GRACE error does not have to be returned even within the 6602 grace period, although NFS4ERR_GRACE must always be returned to 6603 clients attempting a non-reclaim lock request before doing their own 6604 RECLAIM_COMPLETE.
For the server to be able to service READ and 6605 WRITE operations during the grace period, it must again be able to 6606 guarantee that no possible conflict could arise between a potential 6607 reclaim locking request and the READ or WRITE operation. If the 6608 server is unable to offer that guarantee, the NFS4ERR_GRACE error 6609 must be returned to the client. 6611 For a server to provide simple, valid handling during the grace 6612 period, the easiest method is to simply reject all non-reclaim 6613 locking requests and READ and WRITE operations by returning the 6614 NFS4ERR_GRACE error. However, a server may keep information about 6615 granted locks in stable storage. With this information, the server 6616 could determine if a regular lock or READ or WRITE operation can be 6617 safely processed. 6619 For example, if the server maintained, on stable storage, summary 6620 information on whether mandatory locks exist, either mandatory record 6621 locks or share reservations specifying deny modes, many requests 6622 could be allowed during the grace period. If it is known that no 6623 such share reservations exist, OPEN requests that do not specify deny 6624 modes may be safely granted. If, in addition, it is known that no 6625 mandatory record locks exist, either through information stored on 6626 stable storage or simply because the server does not support such 6627 locks, READ and WRITE requests may be safely processed during the 6628 grace period. 6630 To reiterate, for a server that allows non-reclaim lock and I/O 6631 requests to be processed during the grace period, it MUST determine 6632 that no lock subsequently reclaimed will be rejected and that no lock 6633 subsequently reclaimed would have prevented any I/O operation 6634 processed during the grace period. 6636 Clients should be prepared for the return of NFS4ERR_GRACE errors for 6637 non-reclaim lock and I/O requests. In this case the client should 6638 employ a retry mechanism for the request.
A delay (on the order of 6639 several seconds) between retries should be used to avoid overwhelming 6640 the server. Further discussion of the general issue is included in 6641 [Floyd]. The client must account for servers that are able to 6642 perform I/O and non-reclaim locking requests within the grace period 6643 as well as those that cannot do so. 6645 A reclaim-type locking request outside the server's grace period can 6646 only succeed if the server can guarantee that no conflicting lock or 6647 I/O request has been granted since reboot or restart. 6649 A server may, upon restart, establish a new value for the lease 6650 period. Therefore, clients should, once a new client ID is 6651 established, refetch the lease_time attribute and use it as the basis 6652 for lease renewal for the lease associated with that server. 6653 However, the server must establish, for this restart event, a grace 6654 period at least as long as the lease period for the previous server 6655 instantiation. This allows the client state obtained during the 6656 previous server instance to be reliably re-established. 6658 8.6.3. Network Partitions and Recovery 6660 If the duration of a network partition is greater than the lease 6661 period provided by the server, the server will not have received a 6662 lease renewal from the client. If this occurs, the server may free 6663 all locks held for the client, or it may allow the lock state to 6664 remain for a considerable period, subject to the constraint that if a 6665 request for a conflicting lock is made, locks associated with expired 6666 leases do not prevent such a conflicting lock from being granted but 6667 are revoked as necessary so as not to interfere with such conflicting 6668 requests.
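The constraint above, that locks under an expired lease do not block a conflicting request but are instead revoked as needed, can be sketched as follows. This is a hedged illustration only: `LockTable`, the `(client, byte_range)` representation, and the `lease_expired` predicate are hypothetical names standing in for real server-side state.

```python
class LockTable:
    """Illustrative sketch of the expired-lease conflict policy described
    above (hypothetical structure, not a specified server data model)."""

    def __init__(self):
        self.locks = []  # list of (client, byte_range) tuples

    @staticmethod
    def conflicts(r1, r2):
        # Byte ranges modeled as (start, end) half-open intervals.
        return r1[0] < r2[1] and r2[0] < r1[1]

    def request_lock(self, client, rng, lease_expired):
        """lease_expired(holder) models the server's knowledge of whether
        the holder's lease has expired."""
        revoked = []
        for holder, held in list(self.locks):
            if holder != client and self.conflicts(rng, held):
                if lease_expired(holder):
                    # Expired lease: revoke the conflicting lock rather
                    # than deny the new request.
                    self.locks.remove((holder, held))
                    revoked.append((holder, held))
                else:
                    # Valid lease: the conflicting request is denied.
                    return "NFS4ERR_DENIED", revoked
        self.locks.append((client, rng))
        return "NFS4_OK", revoked
```

Note the policy choice this sketch encodes: lock state for an intermittently unreachable client survives lease expiration until an actual conflict forces revocation, which is the "delay freeing of lock state" option discussed in the following paragraphs.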
6670 If the server chooses to delay freeing of lock state until there is a 6671 conflict, it may either free all of the client's locks once there is a 6672 conflict, or it may only revoke the minimum set of locks necessary to 6673 allow conflicting requests. When it adopts the finer-grained 6674 approach, it must revoke all locks associated with a given stateid 6675 whenever it revokes any such lock. 6677 When the server chooses to free all of a client's lock state, either 6678 immediately upon lease expiration, or as a result of the first attempt 6679 to get a lock, all stateids held by the client will become invalid or 6680 stale. Once the client is able to reach the server after such a 6681 network partition, the status returned by the SEQUENCE operation will 6682 indicate a loss of locking state. In addition, all I/O submitted by 6683 the client with the now invalid stateids will fail with the server 6684 returning the error NFS4ERR_EXPIRED. Once the client learns of the 6685 loss of locking state, it will suitably notify the applications that 6686 held the invalidated locks. The client should then take action to 6687 free invalidated stateids, either by establishing a new client ID 6688 using a new verifier or by doing a FREE_STATEID operation to release 6689 each of the invalidated stateids. 6691 When the server adopts a finer-grained approach to revocation of 6692 locks when leases have expired, only a subset of stateids will 6693 normally become invalid during a network partition. When the client 6694 is able to communicate with the server after such a network 6695 partition, the status returned by the SEQUENCE operation will 6696 indicate a partial loss of locking state. In addition, operations, 6697 including I/O submitted by the client with the now invalid stateids, 6698 will fail with the server returning the error NFS4ERR_EXPIRED.
Once 6699 the client learns of the loss of locking state, it will use the 6700 TEST_STATEID operation on all of its stateids to determine which 6701 locks have been lost and then suitably notify the applications that 6702 held the invalidated locks. The client can then release the 6703 invalidated locking state and acknowledge the revocation of the 6704 associated locks by doing a FREE_STATEID operation on each of the 6705 invalidated stateids. 6707 When a network partition is combined with a server reboot, there are 6708 edge conditions that place requirements on the server in order to 6709 avoid silent data corruption following the server reboot. Two of 6710 these edge conditions are known, and are discussed below. 6712 The first edge condition arises as a result of scenarios such as 6713 the following: 6715 1. Client A acquires a lock. 6717 2. Client A and server experience mutual network partition, such 6718 that client A is unable to renew its lease. 6720 3. Client A's lease expires, and the server releases the lock. 6722 4. Client B acquires a lock that would have conflicted with that of 6723 Client A. 6725 5. Client B releases its lock. 6727 6. Server reboots. 6729 7. Network partition between client A and server heals. 6731 8. Client A connects to new server instance and finds out about 6732 server reboot. 6734 9. Client A reclaims its lock within the server's grace period. 6736 Thus, at the final step, the server has erroneously granted client 6737 A's lock reclaim. If client B modified the object the lock was 6738 protecting, client A will experience object corruption. 6740 The second known edge condition arises in situations such as the 6741 following: 6743 1. Client A acquires one or more locks. 6745 2. Server reboots. 6747 3. Client A and server experience mutual network partition, such 6748 that client A is unable to reclaim all of its locks within the 6749 grace period. 6751 4. Server's reclaim grace period ends.
Client A has either no 6752 locks or an incomplete set of locks known to the server. 6754 5. Client B acquires a lock that would have conflicted with a lock 6755 of client A that was not reclaimed. 6757 6. Client B releases the lock. 6759 7. Server reboots a second time. 6761 8. Network partition between client A and server heals. 6763 9. Client A connects to new server instance and finds out about 6764 server reboot. 6766 10. Client A reclaims its lock within the server's grace period. 6768 As with the first edge condition, the final step of the scenario of 6769 the second edge condition has the server erroneously granting client 6770 A's lock reclaim. 6772 Solving the first and second edge conditions requires that the server 6773 either always assume after it reboots that some edge condition 6774 occurs, and thus return NFS4ERR_NO_GRACE for all reclaim attempts, or 6775 that the server record some information in stable storage. The 6776 amount of information the server records in stable storage is in 6777 inverse proportion to how harsh the server intends to be whenever 6778 edge conditions arise. A server that is completely tolerant of all 6779 edge conditions will record in stable storage every lock that is 6780 acquired, removing the lock record from stable storage only when the 6781 lock is released. For the two edge conditions discussed above, the 6782 harshest a server can be, and still support a grace period for 6783 reclaims, requires that the server record some minimal information 6784 in stable storage. For example, a server 6785 implementation could, for each client, save in stable storage a 6786 record containing: 6788 o the client's id string 6789 o a boolean that indicates if the client's lease expired or if there 6790 was administrative intervention (see Section 8.7) to revoke a 6791 record lock, share reservation, or delegation and there has been 6792 no acknowledgement (via FREE_STATEID) of such revocation.
6794 o a boolean that indicates whether the client may have locks that it 6795 believes to be reclaimable in situations in which the grace period 6796 was terminated, making the server's view of lock reclaimability 6797 suspect. The server will set this for any client record in stable 6798 storage where the client has not done a RECLAIM_COMPLETE, before 6799 it grants any new (i.e. not reclaimed) lock to any client. 6801 Assuming the above record keeping, for the first edge condition, 6802 after the server reboots, the record that client A's lease expired 6803 means that another client could have acquired a conflicting record 6804 lock, share reservation, or delegation. Hence the server must reject 6805 a reclaim from client A with the error NFS4ERR_NO_GRACE. 6807 For the second edge condition, after the server reboots for a second 6808 time, the indication that the client had not completed its reclaims 6809 at the time at which the grace period ended means that the server 6810 must reject a reclaim from client A with the error NFS4ERR_NO_GRACE. 6812 When either edge condition occurs, the client's attempt to reclaim 6813 locks will result in the error NFS4ERR_NO_GRACE. When this is 6814 received, or after the client reboots with no lock state, the client 6815 will issue a RECLAIM_COMPLETE. When the RECLAIM_COMPLETE is 6816 received, the server and client are again in agreement regarding 6817 reclaimable locks and both booleans in persistent storage can be 6818 reset, to be set again only when there is a subsequent event that 6819 causes lock reclaim operations to be questionable. 6821 Regardless of the level and approach to record keeping, the server 6822 MUST implement one of the following strategies (which apply to 6823 reclaims of share reservations, record locks, and delegations): 6825 1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely 6826 unforgiving, but necessary if the server does not record lock 6827 state in stable storage. 6829 2.
Record sufficient state in stable storage such that all known 6830 edge conditions involving server reboot, including the two noted 6831 in this section, are detected. False positives are acceptable. 6832 Note that at this time, it is not known if there are other edge 6833 conditions. 6835 In the event that, after a server reboot, the server determines 6836 that there is unrecoverable damage or corruption to the 6837 information in stable storage, then for all clients and/or locks 6838 which may be affected, the server MUST return NFS4ERR_NO_GRACE. 6840 A mandate for the client's handling of the NFS4ERR_NO_GRACE error is 6841 outside the scope of this specification, since the strategies for 6842 such handling are very dependent on the client's operating 6843 environment. However, one potential approach is described below. 6845 When the client receives NFS4ERR_NO_GRACE, it could examine the 6846 change attribute of the objects the client is trying to reclaim state 6847 for, and use that to determine whether to re-establish the state via 6848 normal OPEN or LOCK requests. This is acceptable provided the 6849 client's operating environment allows it. In other words, the client 6850 implementor is advised to document the behavior for its users. The 6851 client could also inform the application that its record lock or 6852 share reservations (whether they were delegated or not) have been 6853 lost, such as via a UNIX signal, a GUI pop-up window, etc. See the 6854 section, "Data Caching and Revocation" for a discussion of what the 6855 client should do for dealing with unreclaimed delegations on client 6856 state. 6858 For further discussion of revocation of locks see Section 8.7. 6860 8.7. Server Revocation of Locks 6862 At any point, the server can revoke locks held by a client and the 6863 client must be prepared for this event.
When the client detects that 6864 its locks have been or may have been revoked, the client is 6865 responsible for validating the state information between itself and 6866 the server. Validating locking state for the client means that it 6867 must verify or reclaim state for each lock currently held. 6869 The first occasion of lock revocation is upon server reboot or 6870 restart. In this instance the client will receive an error 6871 (NFS4ERR_STALE_STATEID on an operation that takes a stateid as an 6872 argument or NFS4ERR_STALE_CLIENTID on an operation that takes a 6873 sessionid or client ID) and the client will proceed with normal crash 6874 recovery as described in Section 8.6.2.1. 6876 The second occasion of lock revocation is the inability to renew the 6877 lease before expiration, as discussed above. While this is 6878 considered a rare or unusual event, the client must be prepared to 6879 recover. The server is responsible for determining lease expiration, 6880 and deciding exactly how to deal with it, informing the client of the 6881 scope of the lock revocation. The client then uses the status 6882 information provided by the server in the SEQUENCE results (field 6883 sr_status_flags, see Section 17.46.4) to synchronize its locking 6884 state with that of the server, in order to recover. 6886 The third occasion of lock revocation can occur as a result of 6887 revocation of locks within the lease period, either because of 6888 administrative intervention, or because a recallable lock (a 6889 delegation or layout) was not returned within the lease period after 6890 having been recalled. While these are considered rare events, they 6891 are possible and the client must be prepared to deal with them. When 6892 either of these events occurs, the client finds out about the 6893 situation through the status returned by the SEQUENCE operation.
Any 6894 use of stateids associated with revoked locks will receive the error 6895 NFS4ERR_ADMIN_REVOKED or NFS4ERR_DELEG_REVOKED, as appropriate. 6897 In all situations in which a subset of locking state may have been 6898 revoked, which include all cases in which locking state is revoked 6899 within the lease period, it is up to the client to determine which 6900 locks have been revoked and which have not. It does this by using 6901 the TEST_STATEID operation on the appropriate set of stateids. Once 6902 the set of revoked locks has been determined, the applications can be 6903 notified, and the invalidated stateids can be freed and lock 6904 revocation acknowledged by using FREE_STATEID. 6906 8.8. Share Reservations 6908 A share reservation is a mechanism to control access to a file. It 6909 is a separate and independent mechanism from record locking. When a 6910 client opens a file, it issues an OPEN operation to the server 6911 specifying the type of access required (READ, WRITE, or BOTH) and the 6912 type of access to deny others (deny NONE, READ, WRITE, or BOTH). If 6913 the OPEN fails, the client will fail the application's open request. 6915 Pseudo-code definition of the semantics: 6917 if (request.access == 0) 6918 return (NFS4ERR_INVAL) 6919 else 6920 if ((request.access & file_state.deny) || 6921 (request.deny & file_state.access)) 6922 return (NFS4ERR_DENIED) 6924 This checking of share reservations on OPEN is done with no exception 6925 for an existing OPEN for the same open-owner. 6927 The constants used for the OPEN and OPEN_DOWNGRADE operations for the 6928 access and deny fields are as follows: 6930 const OPEN4_SHARE_ACCESS_READ = 0x00000001; 6931 const OPEN4_SHARE_ACCESS_WRITE = 0x00000002; 6932 const OPEN4_SHARE_ACCESS_BOTH = 0x00000003; 6934 const OPEN4_SHARE_DENY_NONE = 0x00000000; 6935 const OPEN4_SHARE_DENY_READ = 0x00000001; 6936 const OPEN4_SHARE_DENY_WRITE = 0x00000002; 6937 const OPEN4_SHARE_DENY_BOTH = 0x00000003; 6939 8.9.
OPEN/CLOSE Operations 6941 To provide correct share semantics, a client MUST use the OPEN 6942 operation to obtain the initial filehandle and indicate the desired 6943 access and what access, if any, to deny. Even if the client intends 6944 to use a special stateid for anonymous state or read bypass, it must 6945 still obtain the filehandle for the regular file with the OPEN 6946 operation so the appropriate share semantics can be applied. For 6947 clients that do not have a deny mode built into their open 6948 programming interfaces, deny equal to NONE should be used. 6950 The OPEN operation with the CREATE flag also subsumes the CREATE 6951 operation for regular files as used in previous versions of the NFS 6952 protocol. This allows a create with a share to be done atomically. 6954 The CLOSE operation removes all share reservations held by the open- 6955 owner on that file. If record locks are held, the client SHOULD 6956 release all locks before issuing a CLOSE. The server MAY free all 6957 outstanding locks on CLOSE but some servers may not support the CLOSE 6958 of a file that still has record locks held. The server MUST return 6959 failure, NFS4ERR_LOCKS_HELD, if any locks would exist after the 6960 CLOSE. 6962 The LOOKUP operation will return a filehandle without establishing 6963 any lock state on the server. Without a valid stateid, the server 6964 will assume the client has the least access. For example, a file 6965 opened with deny READ/WRITE using a filehandle obtained through 6966 LOOKUP could only be read using the special read bypass stateid and 6967 could not be written at all because it would not have a valid stateid 6968 and the special anonymous stateid would not be allowed access. 6970 8.10.
Open Upgrade and Downgrade 6972 When an OPEN is done for a file and the open-owner for which the open 6973 is being done already has the file open, the result is to upgrade the 6974 open file status maintained on the server to include the access and 6975 deny bits specified by the new OPEN as well as those for the existing 6976 OPEN. The result is that there is one open file, as far as the 6977 protocol is concerned, and it includes the union of the access and 6978 deny bits for all of the OPEN requests completed. Only a single 6979 CLOSE will be done to reset the effects of both OPENs. Note that the 6980 client, when issuing the OPEN, may not know that the same file is in 6981 fact being opened. The above only applies if both OPENs result in 6982 the OPENed object being designated by the same filehandle. 6984 When the server chooses to export multiple filehandles corresponding 6985 to the same file object and returns different filehandles on two 6986 different OPENs of the same file object, the server MUST NOT "OR" 6987 together the access and deny bits and coalesce the two open files. 6988 Instead the server must maintain separate OPENs with separate 6989 stateids and will require separate CLOSEs to free them. 6991 When multiple open files on the client are merged into a single open 6992 file object on the server, the close of one of the open files (on the 6993 client) may necessitate change of the access and deny status of the 6994 open file on the server. This is because the union of the access and 6995 deny bits for the remaining opens may be smaller (i.e. a proper 6996 subset) than previously. The OPEN_DOWNGRADE operation is used to 6997 make the necessary change and the client should use it to update the 6998 server so that share reservation requests by other clients are 6999 handled properly. 7001 8.11. Short and Long Leases 7003 When determining the time period for the server lease, the usual 7004 lease tradeoffs apply. 
Short leases are good for fast server 7005 recovery at a cost of increased operations to effect lease renewal 7006 (when there are no other operations during the period to effect lease 7007 renewal as a side-effect). Long leases are certainly kinder and 7008 gentler to servers trying to handle very large numbers of clients. 7009 The number of extra requests to effect lock renewal drops in inverse 7010 proportion to the lease time. The disadvantages of long leases 7011 include the possibility of slower recovery after certain failures. 7012 After server failure, a longer grace period may be required when some 7013 clients do not promptly reclaim their locks and do a 7014 RECLAIM_COMPLETE. In the event of client failure, it can take a longer 7015 period for leases to expire, thus forcing conflicting requests to 7016 wait. 7018 Long leases are usable if the server is able to store lease state in 7019 non-volatile memory. Upon recovery, the server can reconstruct the 7020 lease state from its non-volatile memory and continue operation with 7021 its clients; therefore, long leases would not be an issue. 7023 8.12. Clocks, Propagation Delay, and Calculating Lease Expiration 7025 To avoid the need for synchronized clocks, lease times are granted by 7026 the server as a time delta. However, there is a requirement that the 7027 client and server clocks do not drift excessively over the duration 7028 of the lock. There is also the issue of propagation delay across the 7029 network which could easily be several hundred milliseconds as well as 7030 the possibility that requests will be lost and need to be 7031 retransmitted. 7033 To take propagation delay into account, the client should subtract it 7034 from lease times (e.g. if the client estimates the one-way 7035 propagation delay as 200 msec, then it can assume that the lease is 7036 already 200 msec old when it gets it). In addition, it will take 7037 another 200 msec to get a response back to the server.
So the client 7038 must send a lock renewal or write data back to the server 400 msec 7039 before the lease would expire. 7041 The server's lease period configuration should take into account the 7042 network distance of the clients that will be accessing the server's 7043 resources. It is expected that the lease period will take into 7044 account the network propagation delays and other network delay 7045 factors for the client population. Since the protocol does not allow 7046 for an automatic method to determine an appropriate lease period, the 7047 server's administrator may have to tune the lease period. 7049 8.13. Vestigial Locking Infrastructure From V4.0 7051 There are a number of operations and fields within existing 7052 operations that no longer have a function in minor version one. In 7053 one way or another, these changes are all due to the implementation 7054 of sessions, which provides client context and replay protection as a 7055 base feature of the protocol, separate from locking itself. 7057 The following operations have become mandatory-to-not-implement. The 7058 server should return NFS4ERR_NOTSUPP if these operations are found in 7059 an NFSv4.1 COMPOUND. 7061 o SETCLIENTID since its function has been replaced by EXCHANGE_ID. 7063 o SETCLIENTID_CONFIRM since client ID confirmation now happens by 7064 means of CREATE_SESSION. 7066 o OPEN_CONFIRM because OPENs no longer require confirmation to 7067 establish an owner-based sequence value. 7069 o RELEASE_LOCKOWNER because lock-owners with no associated locks 7070 no longer have any sequence-related state and so can be deleted by 7071 the server at will. 7073 o RENEW because every SEQUENCE operation for a session causes lease 7074 renewal, making a separate operation useless. 7076 Also, there are a number of fields, present in existing operations 7077 related to locking, that have no use in minor version one.
They were 7078 used in minor version zero to perform functions now provided in a 7079 different fashion. 7081 o Sequence IDs, used to sequence requests for a given state-owner 7082 and to provide replay protection, which is now provided via sessions. 7084 o Client IDs, used to identify the client associated with a given 7085 request. Client identification is now available using the client 7086 ID associated with the current session, without needing an 7087 explicit client ID field. 7089 Such vestigial fields in existing operations should be set by the 7090 client to zero. When they are not, the server MUST return an 7091 NFS4ERR_INVAL error. 7093 9. Client-Side Caching 7095 Client-side caching of data, of file attributes, and of file names is 7096 essential to providing good performance with the NFS protocol. 7097 Providing distributed cache coherence is a difficult problem and 7098 previous versions of the NFS protocol have not attempted it. 7099 Instead, several NFS client implementation techniques have been used 7100 to reduce the problems that a lack of coherence poses for users. 7101 These techniques have not been clearly defined by earlier protocol 7102 specifications and it is often unclear what is valid or invalid 7103 client behavior. 7105 The NFS version 4 protocol uses many techniques similar to those that 7106 have been used in previous protocol versions. The NFS version 4 7107 protocol does not provide distributed cache coherence. However, it 7108 defines a more limited set of caching guarantees to allow locks and 7109 share reservations to be used without destructive interference from 7110 client side caching. 7112 In addition, the NFS version 4 protocol introduces a delegation 7113 mechanism which allows many decisions normally made by the server to 7114 be made locally by clients. This mechanism provides efficient 7115 support of the common cases where sharing is infrequent or where 7116 sharing is read-only. 7118 9.1.
Performance Challenges for Client-Side Caching 7120 Caching techniques used in previous versions of the NFS protocol have 7121 been successful in providing good performance. However, several 7122 scalability challenges can arise when those techniques are used with 7123 very large numbers of clients. This is particularly true when 7124 clients are geographically distributed, which classically increases 7125 the latency for cache revalidation requests. 7127 The previous versions of the NFS protocol repeat their file data 7128 cache validation requests at the time the file is opened. This 7129 behavior can have serious performance drawbacks. A common case is 7130 one in which a file is only accessed by a single client. Therefore, 7131 sharing is infrequent. 7133 In this case, repeated reference to the server to find that no 7134 conflicts exist is expensive. A better option with regard to 7135 performance is to allow a client that repeatedly opens a file to do 7136 so without reference to the server. This is done until potentially 7137 conflicting operations from another client actually occur. 7139 A similar situation arises in connection with file locking. Sending 7140 file lock and unlock requests to the server as well as the read and 7141 write requests necessary to make data caching consistent with the 7142 locking semantics (see the section "Data Caching and File Locking") 7143 can severely limit performance. When locking is used to provide 7144 protection against infrequent conflicts, a large penalty is incurred. 7145 This penalty may discourage the use of file locking by applications. 7147 The NFS version 4 protocol provides more aggressive caching 7148 strategies with the following design goals: 7150 o Compatibility with a large range of server semantics. 7151 o Provide the same caching benefits as previous versions of the NFS 7152 protocol when unable to provide the more aggressive model.
o 7153 Requirements for aggressive caching are organized so that a large 7154 portion of the benefit can be obtained even when not all of the 7155 requirements can be met. The appropriate requirements for the 7156 server are discussed in later sections in which specific forms of 7157 caching are covered (see the section "Open Delegation"). 7159 9.2. Delegation and Callbacks 7161 Recallable delegation of server responsibilities for a file to a 7162 client improves performance by avoiding repeated requests to the 7163 server in the absence of inter-client conflict. With the use of a 7164 "callback" RPC from server to client, a server recalls delegated 7165 responsibilities when another client engages in sharing of a 7166 delegated file. 7168 A delegation is passed from the server to the client, specifying the 7169 object of the delegation and the type of delegation. There are 7170 different types of delegations, but each type contains a stateid to be 7171 used to represent the delegation when performing operations that 7172 depend on the delegation. This stateid is similar to those 7173 associated with locks and share reservations but differs in that the 7174 stateid for a delegation is associated with a client ID and may be 7175 used on behalf of all the open_owners for the given client. A 7176 delegation is made to the client as a whole and not to any specific 7177 process or thread of control within it. 7179 The callback path or backchannel is established by CREATE_SESSION and 7180 BIND_CONN_TO_SESSION, and the client is required to maintain it. 7181 Because the backchannel may be down, even temporarily, correct 7182 protocol operation does not depend on it. Preliminary testing of 7183 callback functionality by means of a CB_COMPOUND procedure with a 7184 single operation, CB_SEQUENCE, can be used to check the continuity of 7185 the backchannel. A server avoids delegating responsibilities until 7186 it has determined that the backchannel exists.
Because the granting 7187 of a delegation is always conditional upon the absence of conflicting 7188 access, clients must not assume that a delegation will be granted and 7189 they must always be prepared for OPENs, WANT_DELEGATIONs, and 7190 GET_DIR_DELEGATIONs to be processed without any delegations being 7191 granted. 7193 Once granted, a delegation behaves in many ways like a lock. There 7194 is an associated lease that is subject to renewal together with all 7195 of the other leases held by that client. 7197 Unlike locks, an operation by a second client to a delegated file 7198 will cause the server to recall a delegation through a callback. 7200 On recall, the client holding the delegation must flush modified 7201 state (such as modified data) to the server and return the 7202 delegation. The conflicting request will not receive a response 7203 until the recall is complete. The recall is considered complete when 7204 the client returns the delegation or the server times out on the 7205 recall and revokes the delegation as a result of the timeout. 7206 Following the resolution of the recall, the server has the 7207 information necessary to grant or deny the second client's request. 7209 At the time the client receives a delegation recall, it may have 7210 substantial state that needs to be flushed to the server. Therefore, 7211 the server should allow sufficient time for the delegation to be 7212 returned since it may involve numerous RPCs to the server. If the 7213 server is able to determine that the client is diligently flushing 7214 state to the server as a result of the recall, the server may extend 7215 the usual time allowed for a recall. However, the time allowed for 7216 recall completion should not be unbounded. 7218 An example of this is when responsibility to mediate opens on a given 7219 file is delegated to a client (see the section "Open Delegation"). 7220 The server will not know what opens are in effect on the client. 
7221 Without this knowledge the server will be unable to determine if the 7222 access and deny state for the file allows any particular open until 7223 the delegation for the file has been returned. 7225 A client failure or a network partition can result in failure to 7226 respond to a recall callback. In this case, the server will revoke 7227 the delegation, which in turn will render useless any modified state 7228 still on the client. 7230 9.2.1. Delegation Recovery 7232 There are three situations that delegation recovery must deal with: 7234 o Client reboot or restart 7236 o Server reboot or restart 7238 o Network partition (full or callback-only) 7240 In the event the client reboots or restarts, the failure to renew 7241 leases will result in the revocation of record locks and share 7242 reservations. Delegations, however, may be treated a bit 7243 differently. 7245 There will be situations in which delegations will need to be 7246 reestablished after a client reboots or restarts. The reason for 7247 this is that the client may have file data stored locally and this data 7248 was associated with the previously held delegations. The client will 7249 need to reestablish the appropriate file state on the server. 7251 To allow for this type of client recovery, the server MAY extend the 7252 period for delegation recovery beyond the typical lease expiration 7253 period. This implies that requests from other clients that conflict 7254 with these delegations will need to wait. Because the normal recall 7255 process may require significant time for the client to flush changed 7256 state to the server, other clients need to be prepared for delays that 7257 occur because of a conflicting delegation. This longer interval 7258 would increase the window for clients to reboot and consult stable 7259 storage so that the delegations can be reclaimed. For open 7260 delegations, such delegations are reclaimed using OPEN with a claim 7261 type of CLAIM_DELEGATE_PREV.
(See the sections on "Data Caching and 7262 Revocation" and "Operation 18: OPEN" for discussion of open 7263 delegation and the details of OPEN respectively). 7265 A server MAY support a claim type of CLAIM_DELEGATE_PREV, but if it 7266 does, it MUST NOT remove delegations upon a CREATE_SESSION that 7267 confirms a client ID created by EXCHANGE_ID, and instead MUST, for a 7268 period of time no less than that of the value of the lease_time 7269 attribute, maintain the client's delegations to allow time for the 7270 client to issue CLAIM_DELEGATE_PREV requests. The server that 7271 supports CLAIM_DELEGATE_PREV MUST support the DELEGPURGE operation. 7273 When the server reboots or restarts, delegations are reclaimed (using 7274 the OPEN operation with CLAIM_PREVIOUS) in a similar fashion to 7275 record locks and share reservations. However, there is a slight 7276 semantic difference. In the normal case if the server decides that a 7277 delegation should not be granted, it performs the requested action 7278 (e.g. OPEN) without granting any delegation. For reclaim, the 7279 server grants the delegation but a special designation is applied so 7280 that the client treats the delegation as having been granted but 7281 recalled by the server. Because of this, the client has the duty to 7282 write all modified state to the server and then return the 7283 delegation. This process of handling delegation reclaim reconciles 7284 three principles of the NFS version 4 protocol: 7286 o Upon reclaim, a client reporting resources assigned to it by an 7287 earlier server instance must be granted those resources. 7289 o The server has unquestionable authority to determine whether 7290 delegations are to be granted and, once granted, whether they are 7291 to be continued. 7293 o The use of callbacks is not to be depended upon until the client 7294 has proven its ability to receive them. 
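The client side of this reclaim process can be sketched as follows. All names here are illustrative (the protocol defines only OPEN with CLAIM_PREVIOUS and DELEGRETURN, not these helpers): a delegation reclaimed after a server restart carries the special granted-but-recalled designation, so the client writes back any modified state and then returns the delegation.

```python
OPEN_DELEGATE_READ = 1
OPEN_DELEGATE_WRITE = 2

class ReclaimedDelegation:
    """State returned by an OPEN with claim type CLAIM_PREVIOUS (illustrative)."""
    def __init__(self, deleg_type, recalled):
        self.deleg_type = deleg_type
        self.recalled = recalled  # special designation: granted but already recalled

def complete_reclaim(delegation, dirty_ranges, write_back, deleg_return):
    """Finish reclaiming a delegation after a server restart.

    write_back(offset, data) flushes one modified range to the server;
    deleg_return() issues DELEGRETURN for the delegation's stateid.
    Returns the ranges that were flushed.
    """
    flushed = []
    if delegation.recalled:
        # Treat the delegation as having been recalled by the server:
        # write all modified state back, then return the delegation.
        if delegation.deleg_type == OPEN_DELEGATE_WRITE:
            for offset, data in dirty_ranges:
                write_back(offset, data)
                flushed.append((offset, data))
        deleg_return()
    return flushed
```

This sketch captures only the ordering requirement (flush before return); a real client would also retire the delegation stateid and convert any locally cached opens into server-visible state.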
7296 When a network partition occurs, delegations are subject to freeing 7297 by the server when the lease renewal period expires. This is similar 7298 to the behavior for locks and share reservations. For delegations, 7299 however, the server may extend the period in which conflicting 7300 requests are held off. Eventually the occurrence of a conflicting 7301 request from another client will cause revocation of the delegation. 7302 A loss of the callback path (e.g. by later network configuration 7303 change) will have the same effect. A recall request will fail and 7304 revocation of the delegation will result. 7306 A client normally finds out about revocation of a delegation when it 7307 uses a stateid associated with a delegation and receives the error 7308 NFS4ERR_EXPIRED. It also may find out about delegation revocation 7309 after a client reboot when it attempts to reclaim a delegation and 7310 receives that same error. Note that in the case of a revoked write 7311 open delegation, there are issues because data may have been modified 7312 by the client whose delegation is revoked and separately by other 7313 clients. See the section "Revocation Recovery for Write Open 7314 Delegation" for a discussion of such issues. Note also that when 7315 delegations are revoked, information about the revoked delegation 7316 will be written by the server to stable storage (as described in the 7317 section "Crash Recovery"). This is done to deal with the case in 7318 which a server reboots after revoking a delegation but before the 7319 client holding the revoked delegation is notified about the 7320 revocation. 7322 9.3. Data Caching 7324 When applications share access to a set of files, they need to be 7325 implemented so as to take account of the possibility of conflicting 7326 access by another application. This is true whether the applications 7327 in question execute on different clients or reside on the same 7328 client. 
7330 Share reservations and record locks are the facilities the NFS 7331 version 4 protocol provides to allow applications to coordinate 7332 access by providing mutual exclusion facilities. The NFS version 4 7333 protocol's data caching must be implemented such that it does not 7334 invalidate the assumptions that those using these facilities depend 7335 upon. 7337 9.3.1. Data Caching and OPENs 7339 In order to avoid invalidating the sharing assumptions that 7340 applications rely on, NFS version 4 clients should not provide cached 7341 data to applications or modify it on behalf of an application when it 7342 would not be valid to obtain or modify that same data via a READ or 7343 WRITE operation. 7345 Furthermore, in the absence of open delegation (see the section "Open 7346 Delegation") two additional rules apply. Note that these rules are 7347 obeyed in practice by many NFS version 2 and version 3 clients. 7349 o First, cached data present on a client must be revalidated after 7350 doing an OPEN. Revalidating means that the client fetches the 7351 change attribute from the server, compares it with the cached 7352 change attribute, and if different, declares the cached data (as 7353 well as the cached attributes) as invalid. This is to ensure that 7354 the data for the OPENed file is still correctly reflected in the 7355 client's cache. This validation must be done at least when the 7356 client's OPEN operation includes DENY=WRITE or BOTH thus 7357 terminating a period in which other clients may have had the 7358 opportunity to open the file with WRITE access. Clients may 7359 choose to do the revalidation more often (i.e. at OPENs specifying 7360 DENY=NONE) to parallel the NFS version 3 protocol's practice for 7361 the benefit of users assuming this degree of cache revalidation. 
7363 Since the change attribute is updated for data and metadata 7364 modifications, some client implementors may be tempted to use the 7365 time_modify attribute and not change to validate cached data, so 7366 that metadata changes do not spuriously invalidate clean data. 7367 The implementor is cautioned against this approach. The change 7368 attribute is guaranteed to change for each update to the file, 7369 whereas time_modify is guaranteed to change only at the 7370 granularity of the time_delta attribute. Use by the client's data 7371 cache validation logic of time_modify and not change runs the risk 7372 of the client incorrectly marking stale data as valid. 7374 o Second, modified data must be flushed to the server before closing 7375 a file OPENed for write. This is complementary to the first rule. 7376 If the data is not flushed at CLOSE, the revalidation done after 7377 the client OPENs a file is unable to achieve its purpose. The other 7378 aspect to flushing the data before close is that the data must be 7379 committed to stable storage, at the server, before the CLOSE 7380 operation is requested by the client. In the case of a server 7381 reboot or restart and a CLOSEd file, it may not be possible to 7382 retransmit the data to be written to the file. Hence, this 7383 requirement. 7385 9.3.2. Data Caching and File Locking 7387 For those applications that choose to use file locking instead of 7388 share reservations to exclude inconsistent file access, there is an 7389 analogous set of constraints that apply to client side data caching. 7390 These rules are effective only if the file locking is used in a way 7391 that matches in an equivalent way the actual READ and WRITE 7392 operations executed. This is as opposed to file locking that is 7393 based on pure convention.
For example, it is possible to manipulate 7394 a two-megabyte file by dividing the file into two one-megabyte 7395 regions and protecting access to the two regions by file locks on 7396 octets zero and one. A lock for write on octet zero of the file 7397 would represent the right to do READ and WRITE operations on the 7398 first region. A lock for write on octet one of the file would 7399 represent the right to do READ and WRITE operations on the second 7400 region. As long as all applications manipulating the file obey this 7401 convention, they will work on a local file system. However, they may 7402 not work with the NFS version 4 protocol unless clients refrain from 7403 data caching. 7405 The rules for data caching in the file locking environment are: 7407 o First, when a client obtains a file lock for a particular region, 7408 the data cache corresponding to that region (if any cache data 7409 exists) must be revalidated. If the change attribute indicates 7410 that the file may have been updated since the cached data was 7411 obtained, the client must flush or invalidate the cached data for 7412 the newly locked region. A client might choose to invalidate all 7413 of the non-modified cached data that it has for the file, but the only 7414 requirement for correct operation is to invalidate all of the data 7415 in the newly locked region. 7417 o Second, before releasing a write lock for a region, all modified 7418 data for that region must be flushed to the server. The modified 7419 data must also be written to stable storage. 7421 Note that flushing data to the server and the invalidation of cached 7422 data must reflect the actual octet ranges locked or unlocked. 7423 Rounding these up or down to reflect client cache block boundaries 7424 will cause problems if not carefully done. For example, writing a 7425 modified block when only half of that block is within an area being 7426 unlocked may cause invalid modification to the region outside the 7427 unlocked area.
This, in turn, may be part of a region locked by 7428 another client. Clients can avoid this situation by synchronously 7429 performing portions of write operations that overlap that portion 7430 (initial or final) that is not a full block. Similarly, invalidating 7431 a locked area which is not an integral number of full buffer blocks 7432 would require the client to read one or two partial blocks from the 7433 server if the revalidation procedure shows that the data which the 7434 client possesses may not be valid. 7436 The data that is written to the server as a prerequisite to the 7437 unlocking of a region must be written, at the server, to stable 7438 storage. The client may accomplish this either with synchronous 7439 writes or by following asynchronous writes with a COMMIT operation. 7440 This is required because retransmission of the modified data after a 7441 server reboot might conflict with a lock held by another client. 7443 A client implementation may choose to accommodate applications which 7444 use record locking in non-standard ways (e.g. using a record lock as 7445 a global semaphore) by flushing to the server more data upon a LOCKU 7446 than is covered by the locked range. This may include modified data 7447 within files other than the one for which the unlocks are being done. 7448 In such cases, the client must not interfere with applications whose 7449 READs and WRITEs are being done only within the bounds of record 7450 locks which the application holds. For example, an application locks 7451 a single octet of a file and proceeds to write that single octet. A 7452 client that chose to handle a LOCKU by flushing all modified data to 7453 the server could validly write that single octet in response to an 7454 unrelated unlock. However, it would not be valid to write the entire 7455 block in which that single written octet was located since it 7456 includes an area that is not locked and might be locked by another 7457 client.
Client implementations can avoid this problem by dividing 7458 files with modified data into those for which all modifications are 7459 done to areas covered by an appropriate record lock and those for 7460 which there are modifications not covered by a record lock. Any 7461 writes done for the former class of files must not include areas not 7462 locked and thus not modified on the client. 7464 9.3.3. Data Caching and Mandatory File Locking 7466 Client side data caching needs to respect mandatory file locking when 7467 it is in effect. The presence of mandatory file locking for a given 7468 file is indicated when the client gets back NFS4ERR_LOCKED from a 7469 READ or WRITE on a file it has an appropriate share reservation for. 7470 When mandatory locking is in effect for a file, the client must check 7471 for an appropriate file lock for data being read or written. If a 7472 lock exists for the range being read or written, the client may 7473 satisfy the request using the client's validated cache. If an 7474 appropriate file lock is not held for the range of the read or write, 7475 the read or write request must not be satisfied by the client's cache 7476 and the request must be sent to the server for processing. When a 7477 read or write request partially overlaps a locked region, the request 7478 should be subdivided into multiple pieces with each region (locked or 7479 not) treated appropriately. 7481 9.3.4. Data Caching and File Identity 7483 When clients cache data, the file data needs to be organized 7484 according to the file system object to which the data belongs. For 7485 NFS version 3 clients, the typical practice has been to assume for 7486 the purpose of caching that distinct filehandles represent distinct 7487 file system objects. The client then has the choice to organize and 7488 maintain the data cache on this basis. 
7490 In the NFS version 4 protocol, there is now the possibility to have 7491 significant deviations from a "one filehandle per object" model 7492 because a filehandle may be constructed on the basis of the object's 7493 pathname. Therefore, clients need a reliable method to determine if 7494 two filehandles designate the same file system object. If clients 7495 were simply to assume that all distinct filehandles denote distinct 7496 objects and proceed to do data caching on this basis, caching 7497 inconsistencies would arise between the distinct client side objects 7498 which mapped to the same server side object. 7500 By providing a method to differentiate filehandles, the NFS version 4 7501 protocol alleviates a potential functional regression in comparison 7502 with the NFS version 3 protocol. Without this method, caching 7503 inconsistencies within the same client could occur and this has not 7504 been present in previous versions of the NFS protocol. Note that it 7505 is possible to have such inconsistencies with applications executing 7506 on multiple clients but that is not the issue being addressed here. 7508 For the purposes of data caching, the following steps allow an NFS 7509 version 4 client to determine whether two distinct filehandles denote 7510 the same server side object: 7512 o If GETATTR directed to two filehandles returns different values of 7513 the fsid attribute, then the filehandles represent distinct 7514 objects. 7516 o If GETATTR for any file with an fsid that matches the fsid of the 7517 two filehandles in question returns a unique_handles attribute 7518 with a value of TRUE, then the two objects are distinct. 7520 o If GETATTR directed to the two filehandles does not return the 7521 fileid attribute for both of the handles, then it cannot be 7522 determined whether the two objects are the same. Therefore, 7523 operations which depend on that knowledge (e.g. client side data 7524 caching) cannot be done reliably. 
Note that if GETATTR does not 7525 return the fileid attribute for both filehandles, it will return 7526 it for neither of the filehandles, since the fsid for both 7527 filehandles is the same. 7529 o If GETATTR directed to the two filehandles returns different 7530 values for the fileid attribute, then they are distinct objects. 7532 o Otherwise they are the same object. 7534 9.4. Open Delegation 7536 When a file is being OPENed, the server may delegate further handling 7537 of opens and closes for that file to the opening client. Any such 7538 delegation is recallable, since the circumstances that allowed for 7539 the delegation are subject to change. In particular, if the server 7540 receives a conflicting OPEN from another client, the server must 7541 recall the delegation before deciding whether the OPEN from the other 7542 client may be granted. Making a delegation is up to the server, and 7543 clients should not assume that any particular OPEN either will or 7544 will not result in an open delegation. The following is a typical 7545 set of conditions that servers might use in deciding whether OPEN 7546 should be delegated: 7548 o The client must be able to respond to the server's callback 7549 requests. The server will use the CB_NULL procedure for a test of 7550 callback ability. 7552 o The client must have responded properly to previous recalls. 7554 o There must be no current open conflicting with the requested 7555 delegation. 7557 o There should be no current delegation that conflicts with the 7558 delegation being requested. 7560 o The probability of future conflicting open requests should be low 7561 based on the recent history of the file. 7563 o The existence of any server-specific semantics of OPEN/CLOSE that 7564 would make the required handling incompatible with the prescribed 7565 handling that the delegated client would apply (see below). 7567 There are two types of open delegations, read and write.
A read open 7568 delegation allows a client to handle, on its own, requests to open a 7569 file for reading that do not deny read access to others. Multiple 7570 read open delegations may be outstanding simultaneously and do not 7571 conflict. A write open delegation allows the client to handle, on 7572 its own, all opens. Only one write open delegation may exist for a 7573 given file at a given time and it is inconsistent with any read open 7574 delegations. 7576 When a client has a read open delegation, it may not make any changes 7577 to the contents or attributes of the file but it is assured that no 7578 other client may do so. When a client has a write open delegation, 7579 it may modify the file data since no other client will be accessing 7580 the file's data. The client holding a write delegation may only 7581 affect file attributes which are intimately connected with the file 7582 data: size, time_modify, change. 7584 When a client has an open delegation, it does not send OPENs or 7585 CLOSEs to the server but updates the appropriate status internally. 7586 For a read open delegation, opens that cannot be handled locally 7587 (opens for write or that deny read access) must be sent to the 7588 server. 7590 When an open delegation is made, the response to the OPEN contains an 7591 open delegation structure which specifies the following: 7593 o the type of delegation (read or write) 7595 o space limitation information to control flushing of data on close 7596 (write open delegation only, see the section "Open Delegation and 7597 Data Caching") 7599 o an nfsace4 specifying read and write permissions 7601 o a stateid to represent the delegation for READ and WRITE 7603 The delegation stateid is separate and distinct from the stateid for 7604 the OPEN proper. The standard stateid, unlike the delegation 7605 stateid, is associated with a particular lock_owner and will continue 7606 to be valid after the delegation is recalled and the file remains 7607 open. 
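The open delegation structure described above might be modelled as follows. The field and type names are illustrative, not the protocol's XDR identifiers, and the sketch collapses the write delegation's space limitation (which may alternatively be expressed as a modified-block count plus block size) into a single byte bound.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Nfsace4:
    """Simplified stand-in for the nfsace4 carried in the delegation response."""
    allow_read: bool
    allow_write: bool  # the server may return permissions more restrictive
                       # than the file's actual ACL, including deny-all

@dataclass
class OpenDelegation:
    deleg_type: str             # "read" or "write"
    stateid: bytes              # delegation stateid, distinct from the OPEN stateid
    permissions: Nfsace4        # used to avoid frequent ACCESS calls
    space_limit: Optional[int]  # write delegations only: bound on unflushed data
```

A client holding such a structure would consult `permissions` before granting a local open and `space_limit` before accumulating modified data.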
7609 When a request internal to the client is made to open a file and open 7610 delegation is in effect, it will be accepted or rejected solely on 7611 the basis of the following conditions. Any requirement for other 7612 checks to be made by the delegate should result in open delegation 7613 being denied so that the checks can be made by the server itself. 7615 o The access and deny bits for the request and the file as described 7616 in the section "Share Reservations". 7618 o The read and write permissions as determined below. 7620 The nfsace4 passed with delegation can be used to avoid frequent 7621 ACCESS calls. The permission check should be as follows: 7623 o If the nfsace4 indicates that the open may be done, then it should 7624 be granted without reference to the server. 7626 o If the nfsace4 indicates that the open may not be done, then an 7627 ACCESS request must be sent to the server to obtain the definitive 7628 answer. 7630 The server may return an nfsace4 that is more restrictive than the 7631 actual ACL of the file. This includes an nfsace4 that specifies 7632 denial of all access. Note that some common practices such as 7633 mapping the traditional user "root" to the user "nobody" may make it 7634 incorrect to return the actual ACL of the file in the delegation 7635 response. 7637 The use of delegation together with various other forms of caching 7638 creates the possibility that no server authentication will ever be 7639 performed for a given user since all of the user's requests might be 7640 satisfied locally. Where the client is depending on the server for 7641 authentication, the client should be sure authentication occurs for 7642 each user by use of the ACCESS operation. This should be the case 7643 even if an ACCESS operation would not be required otherwise. As 7644 mentioned before, the server may enforce frequent authentication by 7645 returning an nfsace4 denying all access with every open delegation. 7647 9.4.1. 
Open Delegation and Data Caching 7649 OPEN delegation allows much of the message overhead associated with 7650 the opening and closing of files to be eliminated. An open when an open 7651 delegation is in effect does not require that a validation message be 7652 sent to the server. The continued endurance of the "read open 7653 delegation" provides a guarantee that no OPEN for write and thus no 7654 write has occurred. Similarly, when closing a file opened for write 7655 while a write open delegation is in effect, the data written does not 7656 have to be flushed to the server until the open delegation is 7657 recalled. The continued endurance of the open delegation provides a 7658 guarantee that no open and thus no read or write has been done by 7659 another client. 7661 For the purposes of open delegation, READs and WRITEs done without an 7662 OPEN are treated as the functional equivalents of a corresponding 7663 type of OPEN. This refers to the READs and WRITEs that use the 7664 special stateids consisting of all zero bits or all one bits. 7665 Therefore, READs or WRITEs with a special stateid done by another 7666 client will force the server to recall a write open delegation. A 7667 WRITE with a special stateid done by another client will force a 7668 recall of read open delegations. 7670 With delegations, a client is able to avoid writing data to the 7671 server when the CLOSE of a file is serviced. The file close system 7672 call is the usual point at which the client is notified of a lack of 7673 stable storage for the modified file data generated by the 7674 application. At the close, file data is written to the server and 7675 through normal accounting the server is able to determine if the 7676 available file system space for the data has been exceeded (i.e. 7677 server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting 7678 includes quotas.
The introduction of delegations requires that an 7679 alternative method be in place for the same type of communication to 7680 occur between client and server. 7682 In the delegation response, the server provides either the limit of 7683 the size of the file or the number of modified blocks and associated 7684 block size. The server must ensure that the client will be able to 7685 flush data to the server of a size equal to that provided in the 7686 original delegation. The server must make this assurance for all 7687 outstanding delegations. Therefore, the server must be careful in 7688 its management of available space for new or modified data, taking 7689 into account available file system space and any applicable quotas. 7690 The server can recall delegations as a result of managing the 7691 available file system space. The client should abide by the server's 7692 stated space limits for delegations. If the client exceeds the stated 7693 limits for the delegation, the server's behavior is undefined. 7695 Based on server conditions, quotas or available file system space, 7696 the server may grant write open delegations with very restrictive 7697 space limitations. The limitations may be defined in a way that will 7698 always force modified data to be flushed to the server on close. 7700 With respect to authentication, flushing modified data to the server 7701 after a CLOSE has occurred may be problematic. For example, the user 7702 of the application may have logged off the client and unexpired 7703 authentication credentials may not be present. In this case, the 7704 client may need to take special care to ensure that local unexpired 7705 credentials will in fact be available. This may be accomplished by 7706 tracking the expiration time of credentials and flushing data well in 7707 advance of their expiration or by making private copies of 7708 credentials to assure their availability when needed. 7710 9.4.2.
Open Delegation and File Locks

When a client holds a write open delegation, lock operations are performed locally.  This includes those required for mandatory file locking.  This can be done since the delegation implies that there can be no conflicting locks.  Similarly, all of the revalidations that would normally be associated with obtaining locks and the flushing of data associated with the releasing of locks need not be done.

When a client holds a read open delegation, lock operations are not performed locally.  All lock operations, including those requesting non-exclusive locks, are sent to the server for resolution.

9.4.3.  Handling of CB_GETATTR

The server needs to employ special handling for a GETATTR where the target is a file that has a write open delegation in effect.  The reason for this is that the client holding the write delegation may have modified the data, and the server needs to reflect this change to the second client that submitted the GETATTR.  Therefore, the client holding the write delegation needs to be interrogated.  The server will use the CB_GETATTR operation.  The only attributes that the server can reliably query via CB_GETATTR are size and change.

Since CB_GETATTR is being used to satisfy another client's GETATTR request, the server only needs to know if the client holding the delegation has a modified version of the file.  If the client's copy of the delegated file is not modified (data or size), the server can satisfy the second client's GETATTR request from the attributes stored locally at the server.  If the file is modified, the server only needs to know about this modified state.  If the server determines that the file is currently modified, it will respond to the second client's GETATTR as if the file had been modified locally at the server.
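At a high level, the server's decision can be sketched as follows.  This is an illustrative Python sketch, not protocol text; the attribute dictionary and delegation record layouts are invented for the example, and the detailed change-attribute handling is developed in the remainder of this section.

```python
def serve_getattr(file_attrs, write_delegation, cb_getattr):
    """Satisfy a second client's GETATTR on a possibly delegated file.

    file_attrs:         the server's locally stored attributes (a dict).
    write_delegation:   None, or {"client": ..., "sc": change value cached
                        when the delegation was granted}.
    cb_getattr(client): queries the delegate for the only attributes the
                        server can reliably obtain this way: size, change.
    """
    if write_delegation is None:
        # No write delegation: local attributes are authoritative.
        return dict(file_attrs)
    size, change = cb_getattr(write_delegation["client"])
    if change == write_delegation["sc"] and size == file_attrs["size"]:
        # The delegate holds no modified data.
        return dict(file_attrs)
    # The delegate has modified the file: respond as if the file had
    # been modified locally at the server, with the size it reported.
    modified = dict(file_attrs)
    modified["size"] = size
    return modified
```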
Since the form of the change attribute is determined by the server and is opaque to the client, the client and server need to agree on a method of communicating the modified state of the file.  For the size attribute, the client will report its current view of the file size.  For the change attribute, the handling is more involved.

For the client, the following steps will be taken when receiving a write delegation:

o  The value of the change attribute will be obtained from the server and cached.  Let this value be represented by c.

o  The client will create a value greater than c that will be used to communicate that modified data is held at the client.  Let this value be represented by d.

o  When the client is queried via CB_GETATTR for the change attribute, it checks to see if it holds modified data.  If the file is modified, the value d is returned for the change attribute value.  If this file is not currently modified, the client returns the value c for the change attribute.

For simplicity of implementation, the client MAY for each CB_GETATTR return the same value d.  This is true even if, between successive CB_GETATTR operations, the client again modifies the file's data or metadata in its cache.  The client can return the same value because the only requirement is that the client be able to indicate to the server that the client holds modified data.  Therefore, the value of d may always be c + 1.

While the change attribute is opaque to the client in the sense that it has no idea what units of time, if any, the server is counting change with, it is not opaque in that the client has to treat it as an unsigned integer, and the server has to be able to see the results of the client's changes to that integer.  Therefore, the server MUST encode the change attribute in network order when sending it to the client.
The client MUST decode it from network order to its native order when receiving it, and the client MUST encode it in network order when sending it to the server.  For this reason, change is defined as an unsigned integer rather than an opaque array of octets.

For the server, the following steps will be taken when providing a write delegation:

o  Upon providing a write delegation, the server will cache a copy of the change attribute in the data structure it uses to record the delegation.  Let this value be represented by sc.

o  When a second client sends a GETATTR operation on the same file to the server, the server obtains the change attribute from the first client.  Let this value be cc.

o  If the value cc is equal to sc, the file is not modified and the server returns the current values for change, time_metadata, and time_modify (for example) to the second client.

o  If the value cc is NOT equal to sc, the file is currently modified at the first client and most likely will be modified at the server at a future time.  The server then uses its current time to construct attribute values for time_metadata and time_modify.  A new value of sc, which we will call nsc, is computed by the server, such that nsc >= sc + 1.  The server then returns the constructed time_metadata, time_modify, and nsc values to the requester.  The server replaces sc in the delegation record with nsc.  To prevent the possibility of time_modify, time_metadata, and change from appearing to go backward (which would happen if the client holding the delegation fails to write its modified data to the server before the delegation is revoked or returned), the server SHOULD update the file's metadata record with the constructed attribute values.  For reasons of reasonable performance, committing the constructed attribute values to stable storage is OPTIONAL.
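The client-side steps described earlier in this section (cache c when the delegation is granted; answer CB_GETATTR with d while modified data is held) can be rendered as a small sketch.  This is illustrative Python; the class and method names are invented for the example.

```python
class ClientDelegationState:
    """A client's change-attribute state for one write-delegated file."""

    def __init__(self, change_from_server):
        self.c = change_from_server      # cached at delegation grant
        self.d = change_from_server + 1  # any value > c works; c + 1 is simplest
        self.dirty = False               # does the cache hold modified data?

    def local_modify(self):
        """Record a write satisfied entirely from the client's cache."""
        self.dirty = True

    def cb_getattr_change(self):
        """Value returned for the change attribute in a CB_GETATTR reply."""
        return self.d if self.dirty else self.c
```

Note that cb_getattr_change() may legitimately return the same d across successive CB_GETATTRs even if the cache was modified again in between; the only requirement is to signal that modified data is held.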
As discussed earlier in this section, the client MAY return the same cc value on subsequent CB_GETATTR calls, even if the file was modified in the client's cache yet again between successive CB_GETATTR calls.  Therefore, the server must assume that the file has been modified yet again, and MUST take care to ensure that the new nsc it constructs and returns is greater than the previous nsc it returned.  An example implementation's delegation record would satisfy this mandate by including a boolean field (let us call it "modified") that is set to false when the delegation is granted, and an sc value set at the time of grant to the change attribute value.  The modified field would be set to true the first time cc != sc, and would stay true until the delegation is returned or revoked.  The processing for constructing nsc, time_modify, and time_metadata would use this pseudo code:

    if (!modified) {
        do CB_GETATTR for change and size;

        if (cc != sc)
            modified = TRUE;
    } else {
        do CB_GETATTR for size;
    }

    if (modified) {
        sc = sc + 1;
        time_modify = time_metadata = current_time;
        update sc, time_modify, time_metadata into file's metadata;
    }

    return to client (that sent GETATTR) the attributes
    it requested, but make sure size comes from what
    CB_GETATTR returned.  Do not update the file's metadata
    with the client's modified size.

In the case that the file attribute size is different from the server's current value, the server treats this as a modification regardless of the value of the change attribute retrieved via CB_GETATTR and responds to the second client as in the last step.

This methodology resolves issues of clock differences between client and server and other scenarios where the use of CB_GETATTR breaks down.
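The pseudocode above can be turned into a runnable sketch, shown here in Python.  The record layout and the cb_getattr helper signature are illustrative assumptions, not part of the protocol.

```python
def handle_getattr_on_delegated_file(record, cb_getattr, current_time):
    """Construct change/time attributes for a second client's GETATTR.

    record holds per-delegation state: "sc" (change value cached at
    grant) and "modified" (False at grant).  cb_getattr(want_change)
    stands in for the callback; it returns (cc, size), with cc set to
    None when only size was requested.
    """
    if not record["modified"]:
        cc, size = cb_getattr(True)           # CB_GETATTR for change and size
        if cc != record["sc"]:
            record["modified"] = True
    else:
        _, size = cb_getattr(False)           # CB_GETATTR for size only
    if record["modified"]:
        record["sc"] += 1                     # nsc: strictly greater each time
        record["time_modify"] = record["time_metadata"] = current_time
        # here the file's metadata would be updated with sc and the times
    # The reply uses the size from CB_GETATTR, but the client's modified
    # size is not written into the file's metadata.
    return {"change": record["sc"], "size": size,
            "time_modify": record.get("time_modify"),
            "time_metadata": record.get("time_metadata")}
```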
It should be noted that the server is under no obligation to use CB_GETATTR, and therefore the server MAY simply recall the delegation to avoid its use.

9.4.4.  Recall of Open Delegation

The following events necessitate recall of an open delegation:

o  Potentially conflicting OPEN request (or READ/WRITE done with "special" stateid)

o  SETATTR issued by another client

o  REMOVE request for the file

o  RENAME request for the file as either source or target of the RENAME

Whether a RENAME of a directory in the path leading to the file results in recall of an open delegation depends on the semantics of the server file system.  If that file system denies such RENAMEs when a file is open, the recall must be performed to determine whether the file in question is, in fact, open.

In addition to the situations above, the server may choose to recall open delegations at any time if resource constraints make it advisable to do so.  Clients should always be prepared for the possibility of recall.

When a client receives a recall for an open delegation, it needs to update state on the server before returning the delegation.  These same updates must be done whenever a client chooses to return a delegation voluntarily.  The following items of state need to be dealt with:

o  If the file associated with the delegation is no longer open and no previous CLOSE operation has been sent to the server, a CLOSE operation must be sent to the server.

o  If a file has other open references at the client, then OPEN operations must be sent to the server.  The appropriate stateids will be provided by the server for subsequent use by the client since the delegation stateid will no longer be valid.  These OPEN requests are done with the claim type of CLAIM_DELEGATE_CUR.
This will allow the presentation of the delegation stateid so that the client can establish the appropriate rights to perform the OPEN.  (See the section "Operation 18: OPEN" for details.)

o  If there are granted file locks, the corresponding LOCK operations need to be performed.  This applies to the write open delegation case only.

o  For a write open delegation, if at the time of recall the file is not open for write, all modified data for the file must be flushed to the server.  If the delegation had not existed, the client would have done this data flush before the CLOSE operation.

o  For a write open delegation when a file is still open at the time of recall, any modified data for the file needs to be flushed to the server.

o  With the write open delegation in place, it is possible that the file was truncated during the duration of the delegation.  For example, the truncation could have occurred as a result of an OPEN UNCHECKED with a size attribute value of zero.  Therefore, if a truncation of the file has occurred and this operation has not been propagated to the server, the truncation must occur before any modified data is written to the server.

In the case of write open delegation, file locking imposes some additional requirements.  To precisely maintain the associated invariant, it is required to flush any modified data in any region for which a write lock was released while the write delegation was in effect.  However, because the write open delegation implies no other locking by other clients, a simpler implementation is to flush all modified data for the file (as described just above) if any write lock has been released while the write open delegation was in effect.
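The items above can be sketched as a delegation-return routine.  This is a minimal illustrative Python sketch: the DelegatedState fields are invented names for the client-held state, and send() is a stand-in for issuing an NFS request (in practice these could be operations of a single COMPOUND).

```python
from dataclasses import dataclass, field

@dataclass
class DelegatedState:
    """Client-held state covered by one write delegation (illustrative)."""
    truncated: bool = False                      # truncation not yet at server
    dirty: list = field(default_factory=list)    # (offset, data) to flush
    opens: list = field(default_factory=list)    # opens needing real stateids
    locks: list = field(default_factory=list)    # locally granted locks
    close_needed: bool = False                   # closed locally, CLOSE unsent

def return_write_delegation(st, send):
    """Issue the required requests, in a valid order, then DELEGRETURN."""
    if st.truncated:
        send("SETATTR size=0")                   # truncation precedes writes
    for off, data in st.dirty:
        send(f"WRITE {off}")                     # flush all modified data
    for o in st.opens:
        send(f"OPEN({o}) CLAIM_DELEGATE_CUR")    # obtain real stateids
    for l in st.locks:
        send(f"LOCK {l}")                        # re-establish local locks
    if st.close_needed and not st.opens:
        send("CLOSE")                            # file no longer open locally
    send("DELEGRETURN")
```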
An implementation need not wait until delegation recall (or deciding to voluntarily return a delegation) to perform any of the above actions, if implementation considerations (e.g., resource availability constraints) make that desirable.  Generally, however, the fact that the actual open state of the file may continue to change makes it not worthwhile to send information about opens and closes to the server, except as part of delegation return.  Only in the case of closing the open that resulted in obtaining the delegation would clients be likely to do this early, since, in that case, the close once done will not be undone.  Regardless of the client's choices on scheduling these actions, all must be performed before the delegation is returned, including (when applicable) the close that corresponds to the open that resulted in the delegation.  These actions can be performed either in previous requests or in previous operations in the same COMPOUND request.

9.4.5.  Clients that Fail to Honor Delegation Recalls

A client may fail to respond to a recall for various reasons, such as a failure of the callback path from the server to the client.  The client may be unaware of a failure in the callback path.  This lack of awareness could result in the client finding out long after the failure that its delegation has been revoked, and another client has modified the data for which the client had a delegation.  This is especially a problem for the client that held a write delegation.

The server also has a dilemma in that the client that fails to respond to the recall might also be sending other NFS requests, including those that renew the lease before the lease expires.  Without returning an error for those lease-renewing operations, the server leads the client to believe that the delegation it has is in force.
This difficulty is solved by the following rules:

o  When the callback path is down, the server MUST NOT revoke the delegation if one of the following occurs:

   *  The client has issued a RENEW operation and the server has returned an NFS4ERR_CB_PATH_DOWN error.  The server MUST renew the lease for any record locks and share reservations the client has that the server has known about (as opposed to those locks and share reservations the client has established but not yet sent to the server, due to the delegation).  The server SHOULD give the client a reasonable time to return its delegations to the server before revoking the client's delegations.

   *  The client has not issued a RENEW operation for some period of time after the server attempted to recall the delegation.  This period of time MUST NOT be less than the value of the lease_time attribute.

o  When the client holds a delegation, it cannot rely on operations, except for RENEW, that take a stateid, to renew delegation leases across callback path failures.  The client that wants to keep delegations in force across callback path failures must use RENEW to do so.

9.4.6.  Delegation Revocation

At the point a delegation is revoked, if there are associated opens on the client, the applications holding these opens need to be notified.  This notification usually occurs by returning errors for READ/WRITE operations or when a close is attempted for the open file.

If no opens exist for the file at the point the delegation is revoked, then notification of the revocation is unnecessary.  However, if there is modified data present at the client for the file, the user of the application should be notified.  Unfortunately, it may not be possible to notify the user since active applications may not be present at the client.
See the section "Revocation Recovery for Write Open Delegation" for additional details.

9.5.  Data Caching and Revocation

When locks and delegations are revoked, the assumptions upon which successful caching depends are no longer guaranteed.  For any locks or share reservations that have been revoked, the corresponding owner needs to be notified.  This notification includes applications with a file open that has a corresponding delegation which has been revoked.  Cached data associated with the revocation must be removed from the client.  In the case of modified data existing in the client's cache, that data must be removed from the client without it being written to the server.  As mentioned, the assumptions made by the client are no longer valid at the point when a lock or delegation has been revoked.

For example, another client may have been granted a conflicting lock after the revocation of the lock at the first client.  Therefore, the data within the lock range may have been modified by the other client.  Obviously, the first client is unable to guarantee to the application what has occurred to the file in the case of revocation.

Notification to a lock owner will in many cases consist of simply returning an error on the next and all subsequent READs/WRITEs to the open file or on the close.  Where the methods available to a client make such notification impossible because errors for certain operations may not be returned, more drastic action such as signals or process termination may be appropriate.  The justification for this is that an invariant on which an application depends may be violated.  Depending on how errors are typically treated for the client operating environment, further levels of notification including logging, console messages, and GUI pop-ups may be appropriate.

9.5.1.
Revocation Recovery for Write Open Delegation

Revocation recovery for a write open delegation poses the special issue of modified data in the client cache while the file is not open.  In this situation, any client that does not flush modified data to the server on each close must ensure that the user receives appropriate notification of the failure as a result of the revocation.  Since such situations may require human action to correct problems, notification schemes in which the appropriate user or administrator is notified may be necessary.  Logging and console messages are typical examples.

If there is modified data on the client, it must not be flushed normally to the server.  A client may attempt to provide a copy of the file data as modified during the delegation under a different name in the file system name space to ease recovery.  Note that when the client can determine that the file has not been modified by any other client, or when the client has a complete cached copy of the file in question, such a saved copy of the client's view of the file may be of particular value for recovery.  In other cases, recovery using a copy of the file based partially on the client's cached data and partially on the server's copy as modified by other clients will be anything but straightforward, so clients may avoid saving file contents in these situations or mark the results specially to warn users of possible problems.

Saving of such modified data in delegation revocation situations may be limited to files of a certain size or might be used only when sufficient disk space is available within the target file system.  Such saving may also be restricted to situations when the client has sufficient buffering resources to keep the cached copy available until it is properly stored to the target file system.

9.6.
Attribute Caching

The attributes discussed in this section do not include named attributes.  Individual named attributes are analogous to files, and caching of the data for these needs to be handled just as data caching is for ordinary files.  Similarly, LOOKUP results from an OPENATTR directory are to be cached on the same basis as any other pathnames and similarly for directory contents.

Clients may cache file attributes obtained from the server and use them to avoid subsequent GETATTR requests.  Such caching is write through in that modification to file attributes is always done by means of requests to the server and should not be done locally and cached.  The exceptions to this are modifications to attributes that are intimately connected with data caching.  Therefore, extending a file by writing data to the local data cache is reflected immediately in the size as seen on the client without this change being immediately reflected on the server.  Normally such changes are not propagated directly to the server, but when the modified data is flushed to the server, analogous attribute changes are made on the server.  When open delegation is in effect, the modified attributes may be returned to the server in the response to a CB_RECALL call.

The result of local caching of attributes is that the attribute caches maintained on individual clients will not be coherent.  Changes made in one order on the server may be seen in a different order on one client and in a third order on a different client.

The typical file system application programming interfaces do not provide means to atomically modify or interrogate attributes for multiple files at the same time.  The following rules provide an environment where the potential incoherences mentioned above can be reasonably managed.
These rules are derived from the practice of previous NFS protocols.

o  All attributes for a given file (per-fsid attributes excepted) are cached as a unit at the client so that no non-serializability can arise within the context of a single file.

o  An upper time boundary is maintained on how long a client cache entry can be kept without being refreshed from the server.

o  When operations are performed that change attributes at the server, the updated attribute set is requested as part of the containing RPC.  This includes directory operations that update attributes indirectly.  This is accomplished by following the modifying operation with a GETATTR operation and then using the results of the GETATTR to update the client's cached attributes.

Note that if the full set of attributes to be cached is requested by READDIR, the results can be cached by the client on the same basis as attributes obtained via GETATTR.

A client may validate its cached version of attributes for a file by fetching just the change and time_access attributes and assuming that if the change attribute has the same value as it did when the attributes were cached, then no attributes other than time_access have changed.  The reason why time_access is also fetched is because many servers operate in environments where the operation that updates change does not update time_access.  For example, POSIX file semantics do not update access time when a file is modified by the write system call.  Therefore, the client that wants a current time_access value should fetch it with change during the attribute cache validation processing and update its cached time_access.

The client may maintain a cache of modified attributes for those attributes intimately connected with data of modified regular files (size, time_modify, and change).
Other than those three attributes, the client MUST NOT maintain a cache of modified attributes.  Instead, attribute changes are immediately sent to the server.

In some operating environments, the equivalent to time_access is expected to be implicitly updated by each read of the content of the file object.  If an NFS client is caching the content of a file object, whether it is a regular file, directory, or symbolic link, the client SHOULD NOT update the time_access attribute (via SETATTR or a small READ or READDIR request) on the server with each read that is satisfied from cache.  The reason is that this can defeat the performance benefits of caching content, especially since an explicit SETATTR of time_access may alter the change attribute on the server.  If the change attribute changes, clients that are caching the content will think the content has changed, and will re-read unmodified data from the server.  Nor is the client encouraged to maintain a modified version of time_access in its cache, since this would mean that the client will either eventually have to write the access time to the server with bad performance effects, or it would never update the server's time_access, thereby resulting in a situation where an application that caches access time between a close and open of the same file observes the access time oscillating between the past and present.  The time_access attribute always means the time of last access to a file by a read that was satisfied by the server.  This way clients will tend to see only time_access changes that go forward in time.

9.7.  Data and Metadata Caching and Memory Mapped Files

Some operating environments include the capability for an application to map a file's content into the application's address space.
Each time the application accesses a memory location that corresponds to a block that has not been loaded into the address space, a page fault occurs and the file is read (or if the block does not exist in the file, the block is allocated and then instantiated in the application's address space).

As long as each memory mapped access to the file requires a page fault, the relevant attributes of the file that are used to detect access and modification (time_access, time_metadata, time_modify, and change) will be updated.  However, in many operating environments, when page faults are not required, these attributes will not be updated on reads or updates to the file via memory access (regardless of whether the file is a local file or is being accessed remotely).  A client or server MAY fail to update attributes of a file that is being accessed via memory mapped I/O.  This has several implications:

o  If there is an application on the server that has memory mapped a file that a client is also accessing, the client may not be able to get a consistent value of the change attribute to determine whether its cache is stale or not.  A server that knows that the file is memory mapped could always pessimistically return updated values for change so as to force the application to always get the most up to date data and metadata for the file.  However, due to the negative performance implications of this, such behavior is OPTIONAL.

o  If the memory mapped file is not being modified on the server, and instead is just being read by an application via the memory mapped interface, the client will not see an updated time_access attribute.  However, in many operating environments, neither will any process running on the server.  Thus NFS clients are at no disadvantage with respect to local processes.
o  If there is another client that is memory mapping the file, and if that client is holding a write delegation, the same set of issues as discussed in the previous two bullet items applies.  So, when a server does a CB_GETATTR to a file that the client has modified in its cache, the response from CB_GETATTR will not necessarily be accurate.  As discussed earlier, the client's obligation is to report that the file has been modified since the delegation was granted, not whether it has been modified again between successive CB_GETATTR calls, and the server MUST assume that any file the client has modified in cache has been modified again between successive CB_GETATTR calls.  Depending on the nature of the client's memory management system, meeting even this weak obligation may not be possible.  A client MAY return stale information in CB_GETATTR whenever the file is memory mapped.

o  The mixture of memory mapping and file locking on the same file is problematic.  Consider the following scenario, where the page size on each client is 8192 octets.

   *  Client A memory maps the first page (8192 octets) of file X.

   *  Client B memory maps the first page (8192 octets) of file X.

   *  Client A write locks the first 4096 octets.

   *  Client B write locks the second 4096 octets.

   *  Client A, via a STORE instruction, modifies part of its locked region.

   *  Simultaneously to client A, client B issues a STORE on part of its locked region.

Here the challenge is for each client to resynchronize to get a correct view of the first page.  In many operating environments, the virtual memory management systems on each client only know a page is modified, not that a subset of the page corresponding to the respective lock regions has been modified.  So it is not possible for each client to do the right thing, which is to only write to the server that portion of the page that is locked.
For example, if client A simply writes out the page, and then client B writes out the page, client A's data is lost.

Moreover, if mandatory locking is enabled on the file, then we have a different problem.  When clients A and B issue the STORE instructions, the resulting page faults require a record lock on the entire page.  Each client then tries to extend its locked range to the entire page, which results in a deadlock.  Communicating the NFS4ERR_DEADLOCK error to a STORE instruction is difficult at best.

If a client is locking the entire memory mapped file, there is no problem with advisory or mandatory record locking, at least until the client unlocks a region in the middle of the file.

Given the above issues, the following are permitted:

o  Clients and servers MAY deny memory mapping a file for which they know there are record locks.

o  Clients and servers MAY deny a record lock on a file they know is memory mapped.

o  A client MAY deny memory mapping a file that it knows requires mandatory locking for I/O.  If mandatory locking is enabled after the file is opened and mapped, the client MAY deny the application further access to its mapped file.

9.8.  Name Caching

The results of LOOKUP and READDIR operations may be cached to avoid the cost of subsequent LOOKUP operations.  Just as in the case of attribute caching, inconsistencies may arise among the various client caches.  To mitigate the effects of these inconsistencies and given the context of typical file system APIs, an upper time boundary is maintained on how long a client name cache entry can be kept without verifying that the entry has not been made invalid by a directory change operation performed by another client.
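The staleness-bound scheme above can be sketched as follows.  This is an illustrative Python sketch; the bound value, entry layout, and fetch_change callback (standing in for a GETATTR of the directory's change attribute) are assumptions for the example.

```python
class DirNameCache:
    """Name cache for one directory with an upper staleness bound."""

    def __init__(self, staleness_bound):
        self.bound = staleness_bound
        self.entries = {}       # name -> filehandle, from LOOKUP/READDIR
        self.change = None      # directory change attr when entries cached
        self.expires = 0.0      # cached names trusted until this time

    def valid(self, fetch_change, now):
        """Return True if the cached names may still be used."""
        if now < self.expires:
            return True                       # within the staleness bound
        current = fetch_change()              # GETATTR of the directory
        if current == self.change:
            self.expires = now + self.bound   # unchanged: extend lifetime
            return True
        self.entries.clear()                  # changed elsewhere: purge
        self.change = current
        self.expires = now + self.bound
        return False
```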
When a client is not making changes to a directory for which there exist name cache entries, the client needs to periodically fetch attributes for that directory to ensure that it is not being modified.  After determining that no modification has occurred, the expiration time for the associated name cache entries may be updated to be the current time plus the name cache staleness bound.

When a client is making changes to a given directory, it needs to determine whether there have been changes made to the directory by other clients.  It does this by using the change attribute as reported before and after the directory operation in the associated change_info4 value returned for the operation.  The server is able to communicate to the client whether the change_info4 data is provided atomically with respect to the directory operation.  If the change values are provided atomically, the client is then able to compare the pre-operation change value with the change value in the client's name cache.  If the comparison indicates that the directory was updated by another client, the name cache associated with the modified directory is purged from the client.  If the comparison indicates no modification, the name cache can be updated on the client to reflect the directory operation and the associated timeout extended.  The post-operation change value needs to be saved as the basis for future change_info4 comparisons.

As demonstrated by the scenario above, name caching requires that the client revalidate name cache data by inspecting the change attribute of a directory at the point when the name cache item was cached.  This requires that the server update the change attribute for directories when the contents of the corresponding directory are modified.
For a client to use the change_info4 information appropriately and correctly, the server must report the pre- and post-operation change attribute values atomically.  When the server is unable to report the before and after values atomically with respect to the directory operation, the server must indicate that fact in the change_info4 return value.  When the information is not atomically reported, the client should not assume that other clients have not changed the directory.

9.9.  Directory Caching

The results of READDIR operations may be used to avoid subsequent READDIR operations.  Just as in the cases of attribute and name caching, inconsistencies may arise among the various client caches.  To mitigate the effects of these inconsistencies, and given the context of typical file system APIs, the following rules should be followed:

o  Cached READDIR information for a directory which is not obtained in a single READDIR operation must always be a consistent snapshot of directory contents.  This is determined by using a GETATTR before the first READDIR and after the last READDIR that contributes to the cache.

o  An upper time boundary is maintained to indicate the length of time a directory cache entry is considered valid before the client must revalidate the cached information.

The revalidation technique parallels that discussed in the case of name caching.  When the client is not changing the directory in question, checking the change attribute of the directory with GETATTR is adequate.  The lifetime of the cache entry can be extended at these checkpoints.  When a client is modifying the directory, the client needs to use the change_info4 data to determine whether there are other clients modifying the directory.
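The change_info4 comparison used by both name and directory caching can be sketched as follows (Python; the function name is a hypothetical illustration, and the namedtuple stands in for the XDR change_info4 structure):

```python
from collections import namedtuple

# change_info4 as returned by a directory-modifying operation:
# 'atomic' says whether 'before' and 'after' were captured
# atomically with respect to the operation.
ChangeInfo4 = namedtuple("ChangeInfo4", ["atomic", "before", "after"])

def revalidate_cache(cached_change, info):
    """Return (keep_cache, new_cached_change) per the rules above.

    The cache is kept only when the atomically reported
    pre-operation value matches the cached change value, i.e. no
    other client modified the directory; the post-operation value
    then becomes the basis for future comparisons.
    """
    if not info.atomic:
        # Non-atomic report: the client should not assume other
        # clients have not changed the directory, so purge.
        return (False, None)
    if info.before != cached_change:
        return (False, info.after)   # another client intervened: purge
    return (True, info.after)        # update the cache in place
```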
If it is determined that no other client modifications are occurring, the client may update its directory cache to reflect its own changes.

As demonstrated previously, directory caching requires that the client revalidate directory cache data by inspecting the change attribute of a directory at the point when the directory was cached.  This requires that the server update the change attribute for directories when the contents of the corresponding directory are modified.  For a client to use the change_info4 information appropriately and correctly, the server must report the pre- and post-operation change attribute values atomically.  When the server is unable to report the before and after values atomically with respect to the directory operation, the server must indicate that fact in the change_info4 return value.  When the information is not atomically reported, the client should not assume that other clients have not changed the directory.

10.  Multi-Server Name Space

NFSv4.1 supports attributes that allow a namespace to extend beyond the boundaries of a single server.  Use of such multi-server namespaces is optional, and for many purposes, single-server namespaces are perfectly acceptable.  Use of multi-server namespaces can provide many advantages, however, by separating a file system's logical position in a namespace from the (possibly changing) logistical and administrative considerations that result in particular file systems being located on particular servers.

10.1.  Location attributes

NFSv4 contains recommended attributes that allow file systems on one server to be associated with one or more instances of that file system on other servers.
These attributes specify such file systems by specifying a server name (either a DNS name or an IP address) together with the path of that file system within that server's single-server namespace.

The fs_locations_info recommended attribute allows specification of one or more file system instance locations where the data corresponding to a given file system may be found.  This attribute provides to the client, in addition to information about file system instance locations, extensive information about the various file system instance choices (e.g. priority for use, writability, currency, etc.) as well as information to help the client efficiently effect as seamless a transition as possible among multiple file system instances, when and if that should be necessary.

The fs_locations recommended attribute is inherited from NFSv4.0 and only allows specification of the file system locations where the data corresponding to a given file system may be found.  Servers should make this attribute available whenever fs_locations_info is supported, but client use of fs_locations_info is to be preferred.

10.2.  File System Presence or Absence

A given location in an NFSv4 namespace (typically but not necessarily a multi-server namespace) can have a number of file system instance locations associated with it (via the fs_locations or fs_locations_info attribute).  There may also be an actual current file system at that location, accessible via normal namespace operations (e.g. LOOKUP).  In this case, the file system is said to be "present" at that position in the namespace and clients will typically use it, reserving use of additional locations specified via the location-related attributes to situations in which the principal location is no longer available.
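As a rough illustration of the shape of such a location (Python; the names and the string form are hypothetical conveniences, not the XDR encoding actually carried by these attributes):

```python
from collections import namedtuple

# One file system instance location: a server (DNS name or IP
# address) plus the path of the file system within that server's
# single-server namespace, held as a list of components as in the
# NFSv4 pathname4 type.
Location = namedtuple("Location", ["server", "path"])

def parse_location(text):
    """Parse a 'server:/a/b/c' style string into a Location.

    Purely illustrative: the real attributes carry XDR-encoded
    structures, not strings.
    """
    server, _, path = text.partition(":")
    components = [c for c in path.split("/") if c]
    return Location(server, components)
```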
When there is no actual file system at the namespace location in question, the file system is said to be "absent".  An absent file system contains no files or directories other than the root, and any reference to it, except to access a small set of attributes useful in determining alternate locations, will result in an error, NFS4ERR_MOVED.  Note that if the server ever returns NFS4ERR_MOVED (i.e. file systems may be absent), it MUST support the fs_locations attribute and SHOULD support the fs_locations_info and fs_absent attributes.

While the error name suggests that we have a case of a file system which once was present, and has only become absent later, this is only one possibility.  A position in the namespace may be permanently absent with the file system(s) designated by the location attributes the only realization.  The name NFS4ERR_MOVED reflects an earlier, more limited conception of its function, but this error will be returned whenever the referenced file system is absent, whether it has moved or not.

Except in the case of GETATTR-type operations (to be discussed later), when the current filehandle at the start of an operation is within an absent file system, that operation is not performed and the error NFS4ERR_MOVED is returned, to indicate that the file system is absent on the current server.

Because a GETFH cannot succeed if the current filehandle is within an absent file system, filehandles within an absent file system cannot be transferred to the client.  When a client does have filehandles within an absent file system, it is the result of obtaining them when the file system was present, and having the file system become absent subsequently.
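The check-at-start-of-operation semantics described here (and elaborated in the next paragraph) can be sketched as follows (Python; the dispatch framing and helper names are hypothetical, though NFS4ERR_MOVED and the operation names are from the protocol):

```python
NFS4ERR_MOVED = 10019  # NFSv4.1 error code

def run_compound(ops, fs_absent):
    """Run a sequence of (op_name, new_fh) steps; fs_absent(fh)
    says whether a filehandle lies within an absent file system.

    The absence check is made at the START of each operation, so an
    operation such as PUTFH or LOOKUP that leaves the current
    filehandle inside an absent file system does not itself fail;
    the NEXT operation does, unless it is a GETATTR (whose
    attribute-mask condition is not modelled here).
    """
    current_fh = None
    for op_name, new_fh in ops:
        if current_fh is not None and fs_absent(current_fh):
            if op_name != "GETATTR":  # simplified GETATTR exception
                return (op_name, NFS4ERR_MOVED)
        current_fh = new_fh if new_fh is not None else current_fh
    return (None, 0)
```

This is why PUTFH-GETATTR and LOOKUP-GETATTR can still fetch location attributes from an absent file system, while a PUTFH-READDIR cannot.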
It should be noted that because the check for the current filehandle being within an absent file system happens at the start of every operation, operations which change the current filehandle so that it is within an absent file system will not result in an error.  This allows such combinations as PUTFH-GETATTR and LOOKUP-GETATTR to be used to get attribute information, particularly location attribute information, as discussed below.

The recommended file system attribute fs_absent can be used to interrogate the present/absent status of a given file system.

10.3.  Getting Attributes for an Absent File System

When a file system is absent, most attributes are not available, but it is necessary to allow the client access to the small set of attributes that are available, and most particularly those that give information about the correct current locations for this file system, fs_locations and fs_locations_info.

10.3.1.  GETATTR Within an Absent File System

As mentioned above, an exception is made for GETATTR in that attributes may be obtained for a filehandle within an absent file system.  This exception only applies if the attribute mask contains at least one attribute bit that indicates the client is interested in a result regarding an absent file system: fs_locations, fs_locations_info, or fs_absent.  If none of these attributes is requested, GETATTR will result in an NFS4ERR_MOVED error.

When a GETATTR is done on an absent file system, the set of supported attributes is very limited.  Many attributes, including those that are normally mandatory, will not be available on an absent file system.
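The attribute-mask handling for GETATTR on an absent file system (including the "return the mask actually supported" behavior described below) can be sketched like this (Python; set-based masks stand in for the real bitmap4 encoding, and the function name is hypothetical):

```python
NFS4ERR_MOVED = 10019  # NFSv4.1 error code

# Attributes that justify the GETATTR exception on an absent file
# system, plus the few others such a file system supports.
LOCATION_ATTRS = {"fs_locations", "fs_locations_info", "fs_absent"}
ABSENT_FS_SUPPORTED = LOCATION_ATTRS | {"change", "fsid", "mounted_on_fileid"}

def getattr_on_absent_fs(requested):
    """Return (error, attrs_returned) for a GETATTR in an absent fs.

    If no location-related attribute is requested, the operation
    fails with NFS4ERR_MOVED.  Otherwise unsupported attributes are
    silently dropped: GETATTR does not fail, but reports the mask
    of attributes it actually returned.
    """
    if not (requested & LOCATION_ATTRS):
        return (NFS4ERR_MOVED, set())
    return (0, requested & ABSENT_FS_SUPPORTED)
```

Note that VERIFY/NVERIFY are stricter, as described below: any unsupported attribute in the mask causes NFS4ERR_MOVED rather than being dropped.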
In addition to the attributes mentioned above (fs_locations, fs_locations_info, fs_absent), the following attributes SHOULD be available on absent file systems, in the case of recommended attributes at least to the same degree that they are available on present file systems.

change:  This attribute is useful for absent file systems and can be helpful in summarizing to the client when any of the location-related attributes changes.

fsid:  This attribute should be provided so that the client can determine file system boundaries, including, in particular, the boundary between present and absent file systems.

mounted_on_fileid:  For objects at the top of an absent file system this attribute needs to be available.  Since the fileid is one which is within the present parent file system, there should be no need to reference the absent file system to provide this information.

Other attributes SHOULD NOT be made available for absent file systems, even when it is possible to provide them.  The server should not assume that more information is always better and should avoid gratuitously providing additional information.

When a GETATTR operation includes a bit mask for one of the attributes fs_locations, fs_locations_info, or fs_absent, but where the bit mask includes attributes which are not supported, GETATTR will not return an error, but will return the mask of the actual attributes supported with the results.

Handling of VERIFY/NVERIFY is similar to GETATTR in that if the attribute mask does not include fs_locations, fs_locations_info, or fs_absent, the error NFS4ERR_MOVED will result.  It differs in that any appearance in the attribute mask of an attribute not supported for an absent file system (and note that this will include some normally mandatory attributes) will also cause an NFS4ERR_MOVED result.

10.3.2.  READDIR and Absent File Systems

A READDIR performed when the current filehandle is within an absent file system will result in an NFS4ERR_MOVED error, since, unlike the case of GETATTR, no such exception is made for READDIR.

Attributes for an absent file system may be fetched via a READDIR for a directory in a present file system, when that directory contains the root directories of one or more absent file systems.  In this case, the handling is as follows:

o  If the attribute set requested includes one of the attributes fs_locations, fs_locations_info, or fs_absent, then fetching of attributes proceeds normally and no NFS4ERR_MOVED indication is returned, even when the rdattr_error attribute is requested.

o  If the attribute set requested does not include one of the attributes fs_locations, fs_locations_info, or fs_absent, then if the rdattr_error attribute is requested, each directory entry for the root of an absent file system will report NFS4ERR_MOVED as the value of the rdattr_error attribute.

o  If the attribute set requested does not include any of the attributes fs_locations, fs_locations_info, fs_absent, or rdattr_error, then the occurrence of the root of an absent file system within the directory will result in the READDIR failing with an NFS4ERR_MOVED error.

o  The unavailability of an attribute because of a file system's absence, even one that is ordinarily mandatory, does not result in any error indication.  The set of attributes returned for the root directory of the absent file system in that case is simply restricted to those actually available.

10.4.  Uses of Location Information

The location-bearing attributes (fs_locations and fs_locations_info) provide, together with the possibility of absent file systems, a number of important facilities in providing reliable, manageable, and scalable data access.
When a file system is present, these attributes can provide alternative locations, to be used to access the same data, in the event that server failures, communications problems, or other difficulties make continued access to the current file system impossible or otherwise impractical.  Under some circumstances multiple alternative locations may be used simultaneously to provide higher performance access to the file system in question.  Provision of such alternate locations is referred to as "replication" although there are cases in which replicated sets of data are not in fact present, and the replicas are instead different paths to the same data.

When a file system is present and becomes absent, clients can be given the opportunity to have continued access to their data, at an alternate location.  In this case, a continued attempt to use the data in the now-absent file system will result in an NFS4ERR_MOVED error and at that point the successor locations (typically only one but multiple choices are possible) can be fetched and used to continue access.  Transfer of the file system contents to the new location is referred to as "migration", but it should be kept in mind that there are cases in which this term can be used, like "replication", when there is no actual data migration per se.

Where a file system was not previously present, specification of file system location provides a means by which file systems located on one server can be associated with a namespace defined by another server, thus allowing a general multi-server namespace facility.  Designation of such a location, in place of an absent file system, is called a "referral".

10.4.1.  File System Replication

The fs_locations and fs_locations_info attributes provide alternative locations, to be used to access data in place of or in addition to the current file system instance.  On first access to a file system, the client should obtain the value of the set of alternate locations by interrogating the fs_locations or fs_locations_info attribute, with the latter being preferred.

In the event that server failures, communications problems, or other difficulties make continued access to the current file system impossible or otherwise impractical, the client can use the alternate locations as a way to get continued access to its data.  Depending on specific attributes of these alternate locations, as indicated within the fs_locations_info attribute, multiple locations may be used simultaneously, to provide higher performance through the exploitation of multiple paths between client and target file system.

The alternate locations may be physical replicas of the (typically read-only) file system data, or they may reflect alternate paths to the same server or provide for the use of various forms of server clustering in which multiple servers provide alternate ways of accessing the same physical file system.  How these different modes of file system transition are represented within the fs_locations and fs_locations_info attributes and how the client deals with file system transition issues will be discussed in detail below.
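A minimal client-side failover loop over such alternate locations might look like this (Python; the exception type and both function names are hypothetical placeholders for the client's RPC machinery):

```python
class AccessError(Exception):
    """Raised by try_access when a location is unreachable."""

def access_with_failover(locations, try_access):
    """Try each alternate location in order until one succeeds.

    'locations' is the list obtained from fs_locations or
    fs_locations_info (the latter can additionally order the list
    by server-assigned priority); 'try_access' performs the actual
    access and raises AccessError on failure.
    """
    last_error = None
    for loc in locations:
        try:
            return try_access(loc)
        except AccessError as e:
            last_error = e       # fall through to the next replica
    raise last_error or AccessError("no locations available")
```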
When multiple server addresses correspond to the same actual server, as shown by a common so_major_id field within the eir_server_owner field returned by EXCHANGE_ID, the client may assume that for each file system in the namespace of a given server network address, there exist file systems at corresponding namespace locations for each of the other server network addresses, even in the absence of explicit listing in fs_locations and fs_locations_info.  Such corresponding file system locations can be used as alternate locations, just as those explicitly specified via the fs_locations and fs_locations_info attributes.  Where these specific locations are designated in the fs_locations_info attribute, the conditions of use specified in this attribute (e.g. priorities, specification of simultaneous use) may limit the client's use of these alternate locations.

When multiple replicas exist and are used simultaneously or in succession by a client, they must designate the same data (with metadata being the same to the degree indicated by the fs_locations_info attribute).  Where file systems are writable, a change made on one instance must be visible on all instances, immediately upon the earlier of the return of the modifying request or the visibility of that change on any of the associated replicas.  Where a file system is not writable but represents a read-only copy (possibly periodically updated) of a writable file system, similar requirements apply to the propagation of updates.  It must be guaranteed that any change visible on the original file system instance is immediately visible on any replica before the client transitions access to that replica, to avoid any possibility that a client, in effecting a transition to a replica, will see any reversion in file system state.
The specific means by which this will be prevented varies based on the fs4_status_type reported as part of the fs_status attribute (see Section 10.11).

10.4.2.  File System Migration

When a file system is present and becomes absent, clients can be given the opportunity to have continued access to their data, at an alternate location, as specified by the fs_locations or fs_locations_info attribute.  Typically, a client will be accessing the file system in question, get an NFS4ERR_MOVED error, and then use the fs_locations or fs_locations_info attribute to determine the new location of the data.  When fs_locations_info is used, additional information will be available which will define the nature of the client's handling of the transition to a new server.

Such migration can be helpful in providing load balancing or general resource reallocation.  The protocol does not specify how the file system will be moved between servers.  It is anticipated that a number of different server-to-server transfer mechanisms might be used with the choice left to the server implementer.  The NFSv4.1 protocol specifies the method used to communicate the migration event between client and server.

The new location may be an alternate communication path to the same server, or, in the case of various forms of server clustering, another server providing access to the same physical file system.  The client's responsibilities in dealing with this transition depend on the specific nature of the new access path and how and whether data was in fact migrated.  These issues will be discussed in detail below.
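The typical client reaction to a migration event described above can be sketched as follows (Python; `nfs_call` and `fetch_locations` are hypothetical stand-ins for the client's COMPOUND machinery, though NFS4ERR_MOVED is the protocol's error code):

```python
NFS4ERR_MOVED = 10019  # NFSv4.1 error code

def call_with_migration(nfs_call, fetch_locations, op):
    """Issue an operation, following a migration event if needed.

    nfs_call(location, op) returns (status, result), with location
    None meaning the currently used server; fetch_locations(op)
    retrieves fs_locations/fs_locations_info for the file system
    (permitted even when it is absent, via the GETATTR exception).
    On NFS4ERR_MOVED the client fetches the successor locations and
    retries at the first of them.
    """
    status, result = nfs_call(None, op)
    if status != NFS4ERR_MOVED:
        return result
    locations = fetch_locations(op)   # typically a single successor
    status, result = nfs_call(locations[0], op)
    if status != 0:
        raise IOError("migration target failed with status %d" % status)
    return result
```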
When multiple server addresses correspond to the same actual server, as shown by a common value for the so_major_id field of the eir_server_owner field returned by EXCHANGE_ID, the location or locations may designate alternate server addresses in the form of specific server network addresses, when the file system in question is available at those addresses, and no longer accessible at the original address.

Although a single successor location is typical, multiple locations may be provided, together with information that allows priority among the choices to be indicated, via information in the fs_locations_info attribute.  Where suitable clustering mechanisms make it possible to provide multiple identical file systems or paths to them, this allows the client the opportunity to deal with any resource or communications issues that might limit data availability.

When an alternate location is designated as the target for migration, it must designate the same data (with metadata being the same to the degree indicated by the fs_locations_info attribute).  Where file systems are writable, a change made on the original file system must be visible on all migration targets.  Where a file system is not writable but represents a read-only copy (possibly periodically updated) of a writable file system, similar requirements apply to the propagation of updates.  Any change visible in the original file system must already be effected on all migration targets, to avoid any possibility that a client, in effecting a transition to the migration target, will see any reversion in file system state.

10.4.3.  Referrals

Referrals provide a way of placing a file system in a location essentially without respect to its physical location on a given server.
This allows a single server or a set of servers to present a multi-server namespace that encompasses file systems located on multiple servers.  Some likely uses of this include establishment of site-wide or organization-wide namespaces, or even knitting such together into a truly global namespace.

Referrals occur when a client determines, upon first referencing a position in the current namespace, that it is part of a new file system and that that file system is absent.  When this occurs, typically by receiving the error NFS4ERR_MOVED, the actual location or locations of the file system can be determined by fetching the fs_locations or fs_locations_info attribute.

The location-related attribute may designate a single file system location or multiple file system locations, to be selected based on the needs of the client.  The server, in the fs_locations_info attribute, may specify priorities to be associated with various file system location choices.  The server may assign different priorities to different locations as reported to individual clients, in order to adapt to client physical location or to effect load balancing.  When both read-only and read-write file systems are present, some of the read-only locations may not be absolutely up-to-date (as they would have to be in the case of replication and migration).  Servers may also specify file system locations that include client-substituted variables so that different clients are referred to different file systems (with different data contents) based on client attributes such as CPU architecture.

Use of multi-server namespaces is enabled by NFSv4 but is not required.  The use of multi-server namespaces and their scope will depend on the applications used and system administration preferences.
Multi-server namespaces can be established by a single server providing a large set of referrals to all of the included file systems.  Alternatively, a single multi-server namespace may be administratively segmented with separate referral file systems (on separate servers) for each separately-administered section of the namespace.  Any segment or the top-level referral file system may use replicated referral file systems for higher availability.

Generally, multi-server namespaces are for the most part uniform, in that the same data made available to one client at a given location in the namespace is made available to all clients at that location.  There are, however, facilities provided which allow different clients to be directed to different sets of data, so as to adapt to such client characteristics as CPU architecture.

10.5.  Additional Client-side Considerations

When clients make use of servers that implement referrals, replication, and migration, care should be taken so that a user who mounts a given file system that includes a referral or a relocated file system continues to see a coherent picture of that user-side file system despite the fact that it contains a number of server-side file systems which may be on different servers.

One important issue is upward navigation from the root of a server-side file system to its parent (specified as ".." in UNIX).  The client needs to determine when it hits an fsid root going up the file tree.  When at such a point, and needing to ascend to the parent, it must do so locally instead of sending a LOOKUPP call to the server.  The LOOKUPP would normally return the ancestor of the target file system on the target server, which may not be part of the space that the client mounted.

A related issue is upward navigation from named attribute directories.
The named attribute directories are essentially detached from the namespace and this property should be safely represented in the client operating environment.  LOOKUPP on a named attribute directory may return the filehandle of the associated file, and conveying this to applications might be unsafe as many applications expect the parent of a directory to itself be a directory.  Therefore the client may want to hide the parent of named attribute directories (represented as ".." in UNIX) or represent the named attribute directory as its own parent (as typically done for the file system root directory in UNIX).

Another issue concerns refresh of referral locations.  When referrals are used extensively, they may change as server configurations change.  It is expected that clients will cache information related to traversing referrals so that future client-side requests are resolved locally without server communication.  This is usually rooted in client-side name lookup caching.  Clients should periodically purge this data for referral points in order to detect changes in location information.  When the change attribute changes for directories that hold referral entries or for the referral entries themselves, clients should consider any associated cached referral information to be out of date.

10.6.  Effecting File System Transitions

Transitions between file system instances, whether due to switching between replicas upon server unavailability, or in response to server-initiated migration events, are best dealt with together.
Even though the prototypical use cases of replication and migration contain distinctive sets of features, when all possibilities for these operations are considered, the underlying unity of these operations, from the client's point of view, is clear, even though for the server pragmatic considerations will normally force different implementation strategies for planned and unplanned transitions.

A number of methods are possible for servers to replicate data and to track client state in order to allow clients to transition between file system instances with a minimum of disruption.  Such methods vary between those that use inter-server clustering techniques to limit the changes seen by the client, to those that are less aggressive, use more standard methods of replicating data, and impose a greater burden on the client to adapt to the transition.

The NFSv4.1 protocol does not impose choices on clients and servers with regard to that spectrum of transition methods.  In fact, there are many valid choices, depending on client and application requirements and their interaction with server implementation choices.  The NFSv4.1 protocol does define the specific choices that can be made, how these choices are communicated to the client, and how the client is to deal with any discontinuities.

In the sections below, references will be made to various possible server implementation choices as a way of illustrating the transition scenarios that clients may deal with.  The intent here is not to define or limit server implementations but rather to illustrate the range of issues that clients may face.

In the discussion below, references will be made to a file system having a particular property or of two file systems (typically the source and destination) belonging to a common class of any of several types.
Two file systems that belong to such a class share some important aspect of file system behavior that clients may depend upon, when present, to easily effect a seamless transition between file system instances.  Conversely, where the file systems do not belong to such a common class, the client has to deal with various sorts of implementation discontinuities which may cause performance or other issues in effecting a transition.

Where the fs_locations_info attribute is available, such file system classification data will be made directly available to the client.  See Section 10.10 for details.  When only fs_locations is available, default assumptions with regard to such classifications have to be inferred.  See Section 10.9 for details.

In cases in which one server is expected to accept opaque values from the client that originated from another server, it is a wise implementation practice for the servers to encode the "opaque" values in big-endian octet order.  If this is done, servers acting as replicas or immigrating file systems will be able to parse values like stateids, directory cookies, filehandles, etc. even if their native octet order is different from that of other servers cooperating in the replication and migration of the file system.

10.6.1.  File System Transitions and Simultaneous Access

When a single file system may be accessed at multiple locations, whether this is because of an indication of file system identity as reported by the fs_locations or fs_locations_info attributes or because two file system instances have corresponding locations on server addresses which connect to the same server, as indicated by a common so_major_id field in the eir_server_owner field returned by EXCHANGE_ID, the client will, depending on specific circumstances as discussed below, either:

o  Access multiple instances simultaneously, as representing alternate paths to the same data and metadata.

o  Access one instance (or set of instances) and then transition to an alternative instance (or set of instances) as a result of network issues, server unresponsiveness, or server-directed migration.  The transition may involve changes in filehandles, fileids, the change attribute, and/or locking state, depending on the attributes of the source and destination file system instances, as specified in the fs_locations_info attribute.

Which of these choices is possible, and how a transition is effected, is governed by equivalence classes of file system instances as reported by the fs_locations_info attribute, and, for file system instances in the same location within multiple single-server namespaces, by the so_major_id field in the eir_server_owner field returned by EXCHANGE_ID.

10.6.2.  Simultaneous Use and Transparent Transitions

When two file system instances have the same location within their respective single-server namespaces and those two server IP addresses return the same so_major_id value in the eir_server_owner value returned in response to EXCHANGE_ID, those file system instances can be treated as the same, and either used together simultaneously or serially with no transition activity required on the part of the client.

Whether simultaneous use of the two file system instances is valid is controlled by whether the fs_locations_info attribute shows the two instances as having the same _simultaneous-use_ class.

Note that for two such file systems, any information within the fs_locations_info attribute that indicates the need for special transition activity, i.e. the appearance of the two file system instances with different _handle_, _fileid_, _verifier_, or _change_ classes, MUST be ignored by the client.  The server SHOULD NOT indicate that these instances belong to different _handle_, _fileid_, _verifier_, or _change_ classes, whether the two instances are shown as belonging to the same _simultaneous-use_ class or not.

Where these conditions do not apply, a non-transparent file system instance transition is required, with the details depending on the respective _handle_, _fileid_, _verifier_, and _change_ classes of the two file system instances and whether the two servers in question have the same eir_server_scope value as reported by EXCHANGE_ID.

10.6.2.1.  Simultaneous Use of File System Instances

When the conditions above hold, in either of the following two cases, the client may use the two file system instances simultaneously.

o  The fs_locations_info attribute does not contain separate per-IP-address entries for file system instances at the distinct IP addresses.
This includes the case in which the fs_locations_info 8926 attribute is unavailable. 8928 o The fs_locations_info attribute indicates that two file system 8929 instances belong to the same _simultaneous-use_ class. 8931 In this case, the client may use both file system instances 8932 simultaneously, as representations of the same file system, whether 8933 that happens because the two IP addresses connect to the same 8934 physical server or because different servers connect to clustered 8935 file systems and export their data in common. When simultaneous use 8936 is in effect, any change made to one file system instance must be 8937 immediately reflected in the other file system instance(s). Locks 8938 are treated as part of a common lease, associated with a common 8939 client ID. Depending on the details of the eir_server_owner returned 8940 by EXCHANGE_ID, the two server instances may be accessed by different 8941 sessions or a single session in common. 8943 10.6.2.2. Transparent File System Transitions 8945 When the conditions above hold and the fs_locations_info attribute 8946 explicitly shows the file system instances for these distinct IP 8947 addresses as belonging to different _simultaneous-use_ classes, the 8948 file system instances should not be used by the client 8949 simultaneously, but rather serially with one being used unless and 8950 until communication difficulties, lack of responsiveness, or an 8951 explicit migration event causes another file system instance (or set 8952 of file system instances sharing a common _simultaneous-use_ class to 8953 be used. 8955 When a change in file system instance is to be done, the client will 8956 use the same client ID already in effect. If it already has 8957 connections to the new server address, these will be used. Otherwise 8958 new connections to existing sessions or new sessions associated with 8959 the existing client ID are established as indicated by the 8960 eir_server_owner returned by EXCHANGE_ID. 
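The simultaneous-use decision described above can be sketched as follows. This is an illustrative client-side check, not protocol-defined code; the record layout and names (FsInstance, simultaneous_use_class) are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FsInstance:
    address: str
    # None models the absence of a separate per-IP-address entry
    # (including the case where fs_locations_info is unavailable).
    simultaneous_use_class: Optional[int]

def may_use_simultaneously(a: FsInstance, b: FsInstance) -> bool:
    # No separate per-address entries: treat the instances as the
    # same file system and allow simultaneous use.
    if a.simultaneous_use_class is None or b.simultaneous_use_class is None:
        return True
    # Otherwise both entries must share a simultaneous-use class.
    return a.simultaneous_use_class == b.simultaneous_use_class
```

When this returns False, the instances are used serially, with a transparent transition between them as described in the following section.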
In all such transparent transition cases, the following apply:

o Filehandles stay the same if persistent, and if volatile, are only subject to expiration if they would have been in the absence of the file system transition.

o Fileid values do not change across the transition.

o The file system will have the same fsid in both the old and new locations.

o Change attribute values are consistent across the transition and do not have to be refetched. When change attributes indicate that a cached object is still valid, it can remain cached.

o Client and state identifiers retain their validity across the transition, except where their staleness is recognized and reported by the new server. Except where such staleness requires it, no lock reclamation is needed.

o Write verifiers are presumed to retain their validity and can be presented to COMMIT, with the expectation that if COMMIT on the new server accepts them as valid, then that server has all of the data unstably written to the original server and has committed it to stable storage as requested.

10.6.3. Filehandles and File System Transitions

There are a number of ways in which filehandles can be handled across a file system transition. These can be divided into two broad classes depending upon whether the two file systems across which the transition happens share sufficient state to effect some sort of continuity of file system handling.

When there is no such cooperation in filehandle assignment, the two file systems are reported as being in different _handle_ classes. In this case, all filehandles are assumed to expire as part of the file system transition. Note that this behavior does not depend on the fh_expire_type attribute and supersedes the specification of the FH4_VOL_MIGRATION bit, which only affects behavior when fs_locations_info is not available.
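The handle-class rules of this section, together with the same-_handle_-class case treated next, can be summarized in a small sketch. The function and parameter names are illustrative, not from the protocol.

```python
def filehandle_survives(same_handle_class: bool,
                        persistent: bool,
                        volatile_only_due_to_migration: bool) -> bool:
    # Different handle classes: every filehandle is assumed to expire
    # as part of the file system transition.
    if not same_handle_class:
        return False
    # Same handle class: persistent filehandles remain valid, and so do
    # filehandles that are volatile only because of FH4_VOL_MIGRATION;
    # other volatile filehandles are subject to expiration on the target.
    return persistent or volatile_only_due_to_migration
```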
When there is cooperation in filehandle assignment, the two file systems are reported as being in the same _handle_ class. In this case, persistent filehandles remain valid after the file system transition, while volatile filehandles (excluding those which are only volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration on the target server.

10.6.4. Fileids and File System Transitions

In NFSv4.0, the issue of continuity of fileids in the event of a file system transition was not addressed. The general expectation had been that in situations in which the two file system instances are created by a single vendor using some sort of file system image copy, fileids will be consistent across the transition, while in the analogous multi-vendor transitions they will not. This poses difficulties, especially for a client without special knowledge of the transition mechanisms adopted by the server.

It is important to note that while clients themselves may have no trouble with a fileid changing as a result of a file system transition event, applications do typically have access to the fileid (e.g., via stat), and the result is that an application may work perfectly well if there is no file system instance transition, or if any such transition is among instances created by a single vendor, yet be unable to deal with the situation in which a multi-vendor transition occurs at the wrong time.

Providing the same fileids in a multi-vendor (multiple server vendors) environment has generally been held to be quite difficult.

While there is work to be done, it needs to be pointed out that this difficulty is partly self-imposed. Servers have typically identified fileid with inode number, i.e., with a quantity used to find the file in question. This identification poses special difficulties for migration of a file system between vendors, where assigning the same index to a given file may not be possible. Note that a fileid is not required to be useful for finding the file in question, only to be unique within the given file system. Servers prepared to accept a fileid as a single piece of metadata and store it apart from the value used to index the file information can relatively easily maintain a fileid value across a migration event, allowing a truly transparent migration event.

In any case, where servers can provide continuity of fileids, they should, and the client should be able to find out that such continuity is available and take appropriate action. Information about the continuity (or lack thereof) of fileids across a file system transition is represented by specifying whether the file systems in question are of the same _fileid_ class.

10.6.5. Fsids and File System Transitions

Since fsids are only unique on a per-server basis, it is to be expected that they will change during a file system transition. Clients should not make the fsids received from the server visible to applications, since they may not be globally unique and may change during a file system transition event. Applications are best served if they are isolated from such transitions to the extent possible.

When a file system transition is made and the fs_locations_info indicates that the file system in question may be split into multiple file systems (via the FSLI4F_MULTI_FS flag), the client should do GETATTRs on all known objects within the file system undergoing transition to determine the new file system boundaries. Clients may maintain the fsids passed to existing applications by mapping all of the fsids for the descendent file systems to the common fsid used for the original file system.

10.6.6. The Change Attribute and File System Transitions

Since the change attribute is defined as a server-specific one, change attributes fetched from one server are normally presumed to be invalid on another server. Such a presumption is troublesome since it would invalidate all cached change attributes, requiring refetching. Even more disruptive, the absence of any assured continuity for the change attribute means that even if the same value is obtained on refetch, no conclusions can be drawn as to whether the object in question has changed. The identical change attribute could be merely an artifact of a modified file with a different change attribute construction algorithm, with that new algorithm just happening to result in an identical change value.

When the two file systems have consistent change attribute formats, and this fact is communicated to the client by reporting them as being in the same _change_ class, the client may assume a continuity of change attribute construction and handle this situation just as it would be handled without any file system transition.

10.6.7. Lock State and File System Transitions

In a file system transition, the client needs to handle cases in which the two servers have cooperated in state management and those in which they have not. Cooperation by two servers in state management requires coordination of client IDs. Before the client attempts to use a client ID associated with one server in a request to the server of the other file system, it must eliminate the possibility that two non-cooperating servers have assigned the same client ID by accident.
The client needs to compare the eir_server_scope values returned by each server. If the scope values do not match, then the servers have not cooperated in state management. If the scope values match, then this indicates that the servers have cooperated in assigning client IDs to the point that they will reject client IDs that refer to state they do not know about.

In the case of migration, the servers involved in the migration of a file system SHOULD transfer all server state from the original to the new server. When this is done, it must be done in a way that is transparent to the client. With replication, such a degree of common state is typically not the case. Clients, however, should use the information provided by the eir_server_scope returned by EXCHANGE_ID to determine whether such sharing may be in effect, rather than making assumptions based on the reason for the transition.

This state transfer will reduce disruption to the client when a file system transition occurs. If the servers are successful in transferring all state, the client can attempt to establish sessions associated with the client ID used for the source file system instance. If the server accepts that as a valid client ID, then the client may use the existing stateids associated with that client ID for the old file system instance in connection with that same client ID on the new file system instance.

When the two servers belong to the same server scope, it does not necessarily mean that, when dealing with the transition, the client will not have to reclaim state. However, it does mean that the client may proceed using its current client ID when establishing communication with the new server, and that the new server will either recognize that client ID as valid or reject it, in which case locks must be reclaimed by the client.
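The client's choices above can be condensed into a small decision table. This is an illustrative sketch; the action names are assumptions for the example, not protocol terms.

```python
def transition_action(same_server_scope: bool, clientid_accepted: bool) -> str:
    """Decide how to handle lock state across a file system transition."""
    if not same_server_scope:
        # No cooperation in state management: establish a new client ID
        # on the destination and do not present old stateids there.
        return "new-clientid-and-reclaim"
    if clientid_accepted:
        # Same scope and the client ID was accepted: existing stateids
        # remain usable against the new file system instance.
        return "continue-with-existing-state"
    # Same scope but the client ID was rejected: reclaim locks.
    return "reclaim-locks"
```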
File systems cooperating in state management may actually share state or simply divide the ID space so as to recognize (and reject as stale) each other's state and client IDs. Servers which do share state may not do so under all conditions or at all times. The requirement for the server is that if it cannot be sure in accepting a client ID that it reflects the locks the client was given, it must treat all associated state as stale and report it as such to the client.

When the two file system instances are on servers that do not share a server scope value, the client must establish a new client ID on the destination, if it does not have one already, and reclaim locks if possible. In this case, old stateids and client IDs should not be presented to the new server since there is no assurance that they will not conflict with IDs valid on that server.

In either case, when actual locks are not known to be maintained, the destination server may establish a grace period specific to the given file system, with non-reclaim locks being rejected for that file system, even though normal locks are being granted for other file systems. Clients should not infer the absence of a grace period for file systems being transitioned to a server from responses to requests for other file systems.

In the case of lock reclamation for a given file system after a file system transition, edge conditions can arise similar to those for reclaim after server reboot (although in the case of the planned state transfer associated with migration, these can be avoided by securely recording lock state as part of state migration). Where the destination server cannot guarantee that locks will not be incorrectly granted, the destination server should not establish a file-system-specific grace period.
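A file-system-specific grace period on the destination server, as described above, could be modeled as below. This is a minimal sketch under assumed names (PerFsGrace, allows_non_reclaim_lock); a real server would tie this into its lock-granting path.

```python
class PerFsGrace:
    """Track a grace period per file system on the destination server.

    While a file system's grace period runs, non-reclaim lock requests for
    that file system are refused; other file systems are unaffected."""

    def __init__(self):
        self._deadline = {}  # fsid -> time at which grace ends

    def start(self, fsid: str, grace_seconds: float, now: float) -> None:
        # Begin a grace period for one transitioned file system.
        self._deadline[fsid] = now + grace_seconds

    def allows_non_reclaim_lock(self, fsid: str, now: float) -> bool:
        # File systems with no recorded grace period grant locks normally.
        return now >= self._deadline.get(fsid, 0.0)
```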
In place of a file-system-specific version of RECLAIM_COMPLETE, servers may assume that an attempt to obtain a new lock, other than by reclaim, indicates the end of the client's attempt to reclaim locks for that file system. [NOTE: The alternative would be to adapt RECLAIM_COMPLETE to this task.]

Information about client identity may be propagated between servers in the form of client_owner4 and associated verifiers, under the assumption that the client presents the same values to all the servers with which it deals.

Servers are encouraged to provide facilities to allow locks to be reclaimed on the new server after a file system transition. Often, however, in cases in which the two servers do not share a server scope value, such facilities may not be available, and the client should be prepared to re-obtain locks, even though it is possible that the client may have its LOCK or OPEN request denied due to a conflicting lock. In some environments, such as the transition between read-only file systems, such denial of locks should not pose large difficulties in practice. When an attempt to re-establish a lock on a new server is denied, the client should treat the situation as if its original lock had been revoked. In all cases in which the lock is granted, the client cannot assume that no conflicting lock could have been granted in the interim. Where change attribute continuity is present, the client may check the change attribute to check for unwanted file modifications. Where even this is not available, and the file system is not read-only, a client may reasonably treat all pending locks as having been revoked.

10.6.7.1. Leases and File System Transitions

In the case of lease renewal, the client may not be submitting requests for a file system that has been transferred to another server. This can occur because of the lease renewal mechanism. The client renews leases for all file systems when submitting a request on an associated session, regardless of the specific file system being referenced.

In order for the client to schedule renewal of leases that may have been relocated to the new server, the client must find out about lease relocation before those leases expire. To accomplish this, the SEQUENCE operation will return the status bit SEQ4_STATUS_LEASE_MOVED if responsibility for any of the leases to be renewed has been transferred to a new server. This condition will continue until the client receives an NFS4ERR_MOVED error and the server receives the subsequent GETATTR for the fs_locations or fs_locations_info attribute for an access to each file system for which a lease has been moved to a new server.

When a client receives a SEQ4_STATUS_LEASE_MOVED indication, it should perform an operation on each file system associated with the server in question. When the client receives an NFS4ERR_MOVED error, the client can follow the normal process to obtain the new server information (through the fs_locations and fs_locations_info attributes) and perform renewal of those leases on the new server, unless information in the fs_locations_info attribute shows that no state could have been transferred. If the server has not had state transferred to it transparently, the client will receive NFS4ERR_STALE_CLIENTID from the new server, as described above, and the client can then reclaim locks as is done in the event of server failure.

10.6.7.2. Transitions and the Lease_time Attribute

In order that the client may appropriately manage its leases in the case of a file system transition, the destination server must establish proper values for the lease_time attribute.
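The client-side probing triggered by SEQ4_STATUS_LEASE_MOVED, described in the section above, can be sketched as follows. The `probe` callback stands in for a simple per-file-system operation such as a GETATTR; the names here are assumptions for the example.

```python
def handle_lease_moved(filesystems, probe):
    """After a SEQUENCE reply carrying SEQ4_STATUS_LEASE_MOVED, perform an
    operation on each file system associated with the server and collect
    those that report NFS4ERR_MOVED, so their new locations can be fetched
    and leases renewed there."""
    moved = []
    for fs in filesystems:
        if probe(fs) == "NFS4ERR_MOVED":
            moved.append(fs)
    return moved
```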
When state is transferred transparently, that state should include the correct value of the lease_time attribute. The lease_time attribute on the destination server must never be less than that on the source, since this would result in premature expiration of leases granted by the source server. Upon transitions in which state is transferred transparently, the client is under no obligation to refetch the lease_time attribute and may continue to use the value previously fetched (on the source server).

If state has not been transferred transparently, either because the associated servers are shown as having different eir_server_scope strings or because the client ID is rejected when presented to the new server, the client should fetch the value of lease_time on the new (i.e., destination) server and use it for subsequent locking requests. However, the server must respect a grace period at least as long as the lease_time on the source server, in order to ensure that clients have ample time to reclaim their locks before potentially conflicting non-reclaimed locks are granted.

10.6.8. Write Verifiers and File System Transitions

In a file system transition, the two file systems may be clustered in the handling of unstably written data. When this is the case, and the two file systems belong to the same _verifier_ class, valid verifiers from one system may be recognized by the other and superfluous writes avoided. There is no requirement that all valid verifiers be recognized, but it cannot be the case that a verifier is recognized as valid when it is not. [NOTE: We need to resolve the issue of proper verifier scope.]
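The verifier-class rule above, together with the different-class case that follows, amounts to a simple client decision about whether unstably written data must be rewritten. A minimal sketch, with assumed names:

```python
def rewrite_needed(same_verifier_class: bool,
                   verifier_from_source: bytes,
                   recognized_verifiers: set) -> bool:
    """Must the client re-send unstable WRITEs after a transition?"""
    # Different verifier classes: all unstable writes must be presumed
    # lost and the data rewritten.
    if not same_verifier_class:
        return True
    # Same class: a verifier the destination recognizes as valid means
    # the unstably written data reached it and was committed; since not
    # all valid verifiers need be recognized, an unrecognized verifier
    # still forces a rewrite.
    return verifier_from_source not in recognized_verifiers
```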
When two file systems belong to different _verifier_ classes, the client must assume that all unstable writes in existence at the time of the file system transition have been lost, since there is no way the old verifier can be recognized as valid (or not) on the target server.

10.7. Effecting File System Referrals

Referrals are effected when an absent file system is encountered and one or more alternate locations are made available by the fs_locations or fs_locations_info attributes. The client will typically get an NFS4ERR_MOVED error, fetch the appropriate location information, and proceed to access the file system on a different server, even though it retains its logical position within the original namespace.

The examples given in the sections below are somewhat artificial in that an actual client will not typically do a multi-component lookup, but will have cached information regarding the upper levels of the name hierarchy. However, these examples are chosen to make the required behavior clear and easy to put within the scope of a small number of requests, without getting unduly into details of how specific clients might choose to cache things.

10.7.1. Referral Example (LOOKUP)

Let us suppose that the following COMPOUND is issued in an environment in which /this/is/the/path is absent from the target server. This may be for a number of reasons. It may be the case that the file system has moved, or it may be the case that the target server is functioning mainly, or solely, to refer clients to the servers on which various file systems are located.

o PUTROOTFH

o LOOKUP "this"

o LOOKUP "is"

o LOOKUP "the"

o LOOKUP "path"

o GETFH

o GETATTR fsid,fileid,size,ctime

Under the given circumstances, the following will be the result.

o PUTROOTFH --> NFS_OK. The current fh is now the root of the pseudo-fs.

o LOOKUP "this" --> NFS_OK. The current fh is for /this and is within the pseudo-fs.

o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is within the pseudo-fs.

o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and is within the pseudo-fs.

o LOOKUP "path" --> NFS_OK. The current fh is for /this/is/the/path and is within a new, absent fs, but ... the client will never see the value of that fh.

o GETFH --> NFS4ERR_MOVED. Fails because the current fh is in an absent fs at the start of the operation and the spec makes no exception for GETFH.

o GETATTR fsid,fileid,size,ctime. Not executed because the failure of the GETFH stops processing of the COMPOUND.

Given the failure of the GETFH, the client has the job of determining the root of the absent file system and where to find that file system, i.e., the server and the path relative to that server's root fh. Note that in this example, the client did not obtain filehandles and attribute information (e.g., fsid) for the intermediate directories, so it cannot be sure where the absent file system starts. It could be the case, for example, that /this/is/the is the root of the moved file system and that the reason the lookup of "path" succeeded is that the file system was not absent on that op but was moved between the last LOOKUP and the GETFH (since COMPOUND is not atomic). Even if we had the fsids for all of the intermediate directories, we would have no way of knowing that /this/is/the/path was the root of a new fs, since we don't yet have its fsid.

In order to get the necessary information, let us re-issue the chain of LOOKUPs with GETFHs and GETATTRs to at least get the fsids so we can be sure where the appropriate fs boundaries are.
The client could choose to get fs_locations_info at the same time, but in most cases the client will have a good guess as to where the fs boundaries are (because of where NFS4ERR_MOVED was gotten and where not), making fetching of fs_locations_info unnecessary.

OP01: PUTROOTFH --> NFS_OK

- Current fh is root of pseudo-fs.

OP02: GETATTR(fsid) --> NFS_OK

- Just for completeness. Normally, clients will know the fsid of the pseudo-fs as soon as they establish communication with a server.

OP03: LOOKUP "this" --> NFS_OK

OP04: GETATTR(fsid) --> NFS_OK

- Get current fsid to see where fs boundaries are. The fsid will be that for the pseudo-fs in this example, so no boundary.

OP05: GETFH --> NFS_OK

- Current fh is for /this and is within pseudo-fs.

OP06: LOOKUP "is" --> NFS_OK

- Current fh is for /this/is and is within pseudo-fs.

OP07: GETATTR(fsid) --> NFS_OK

- Get current fsid to see where fs boundaries are. The fsid will be that for the pseudo-fs in this example, so no boundary.

OP08: GETFH --> NFS_OK

- Current fh is for /this/is and is within pseudo-fs.

OP09: LOOKUP "the" --> NFS_OK

- Current fh is for /this/is/the and is within pseudo-fs.

OP10: GETATTR(fsid) --> NFS_OK

- Get current fsid to see where fs boundaries are. The fsid will be that for the pseudo-fs in this example, so no boundary.

OP11: GETFH --> NFS_OK

- Current fh is for /this/is/the and is within pseudo-fs.

OP12: LOOKUP "path" --> NFS_OK

- Current fh is for /this/is/the/path and is within a new, absent fs, but ...

- The client will never see the value of that fh.

OP13: GETATTR(fsid, fs_locations_info) --> NFS_OK

- We are getting the fsid to know where the fs boundaries are. Note that the fsid we are given will not necessarily be preserved at the new location. That fsid might be different, and in fact the fsid we have for this fs might be a valid fsid of a different fs on that new server.

- In this particular case, we are pretty sure anyway that what has moved is /this/is/the/path rather than /this/is/the, since we have the fsid of the latter and it is that of the pseudo-fs, which presumably cannot move. However, in other examples, we might not have this kind of information to rely on (e.g., /this/is/the might be a non-pseudo file system separate from /this/is/the/path), so we need to have another reliable source of information on the boundary of the fs which is moved. If, for example, the file system "/this/is" had moved, we would have a case of migration rather than referral, and once the boundaries of the migrated file system were clear, we could fetch fs_locations_info.

- We are fetching fs_locations_info because the fact that we got an NFS4ERR_MOVED at this point means that it is most likely that this is a referral, and we need the destination. Even if it is the case that "/this/is/the" is a file system which has migrated, we will still need the location information for that file system.

OP14: GETFH --> NFS4ERR_MOVED

- Fails because the current fh is in an absent fs at the start of the operation and the spec makes no exception for GETFH. Note that this has the happy consequence that we don't have to worry about the volatility or lack thereof of the fh. If the root of the fs at the new location is a persistent fh, then we can assume that this fh, which we never saw, is a persistent fh, which, if we could see it, would exactly match the new fh. At least, there is no evidence to disprove that. On the other hand, if we find a volatile root at the new location, then the filehandle which we never saw must have been volatile, or at least nobody can prove otherwise.
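The boundary-finding logic of the OP01-OP14 chain above can be sketched as a walk down the path that compares each prefix's fsid with its parent's. The `fsid_at` callback stands in for GETATTR(fsid) on a path prefix; the function name is an assumption for the example.

```python
def find_fs_root(path_components, fsid_at):
    """Return the path at which the fsid first changes along a LOOKUP
    chain from the root; that component is the root of a different
    (possibly absent) file system.  Returns None if no boundary is seen."""
    prev = fsid_at(())  # fsid of the root (the pseudo-fs)
    for i in range(len(path_components)):
        prefix = tuple(path_components[:i + 1])
        cur = fsid_at(prefix)
        if cur != prev:
            # fs boundary: this prefix is the root of a new fs.
            return "/" + "/".join(prefix)
        prev = cur
    return None
```

In the example above, the fsid stays that of the pseudo-fs through /this/is/the and changes at /this/is/the/path, identifying the latter as the root of the absent file system.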
Given the above, the client knows where the root of the absent file system is, by noting where the change of fsid occurred. The fs_locations_info attribute also gives the client the actual location of the absent file system, so that the referral can proceed. The server gives the client the bare minimum of information about the absent file system so that there will be very little scope for problems of conflict between information sent by the referring server and information at the file system's home. No filehandles and very few attributes are present on the referring server, and the client can treat those it receives as basically transient information with the function of enabling the referral.

10.7.2. Referral Example (READDIR)

Another context in which a client may encounter referrals is when it does a READDIR on a directory in which some of the sub-directories are the roots of absent file systems.

Suppose such a directory is read as follows:

o PUTROOTFH

o LOOKUP "this"

o LOOKUP "is"

o LOOKUP "the"

o READDIR (fsid, size, ctime, mounted_on_fileid)

In this case, because rdattr_error is not requested, fs_locations_info is not requested, and some of the attributes cannot be provided, the result will be an NFS4ERR_MOVED error on the READDIR, with the detailed results as follows:

o PUTROOTFH --> NFS_OK. The current fh is at the root of the pseudo-fs.

o LOOKUP "this" --> NFS_OK. The current fh is for /this and is within the pseudo-fs.

o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is within the pseudo-fs.

o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and is within the pseudo-fs.

o READDIR (fsid, size, ctime, mounted_on_fileid) --> NFS4ERR_MOVED. Note that the same error would have been returned if /this/is/the had migrated, but in fact it is because the directory contains the root of an absent fs.

So now suppose that we reissue with rdattr_error:

o PUTROOTFH

o LOOKUP "this"

o LOOKUP "is"

o LOOKUP "the"

o READDIR (rdattr_error, fsid, size, ctime, mounted_on_fileid)

The results will be:

o PUTROOTFH --> NFS_OK. The current fh is at the root of the pseudo-fs.

o LOOKUP "this" --> NFS_OK. The current fh is for /this and is within the pseudo-fs.

o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is within the pseudo-fs.

o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and is within the pseudo-fs.

o READDIR (rdattr_error, fsid, size, ctime, mounted_on_fileid) --> NFS_OK. The attributes for "path" will only contain rdattr_error, whose value will be NFS4ERR_MOVED, together with an fsid value and a value for mounted_on_fileid.

So suppose we do another READDIR to get fs_locations_info, although we could have used a GETATTR directly, as in the previous section.

o PUTROOTFH

o LOOKUP "this"

o LOOKUP "is"

o LOOKUP "the"

o READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, size, ctime)

The results would be:

o PUTROOTFH --> NFS_OK. The current fh is at the root of the pseudo-fs.

o LOOKUP "this" --> NFS_OK. The current fh is for /this and is within the pseudo-fs.

o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is within the pseudo-fs.

o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and is within the pseudo-fs.

o READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, size, ctime) --> NFS_OK. The attributes will be as shown below.
The attributes for "path" will only contain:

o  rdattr_error (value: NFS4ERR_MOVED)

o  fs_locations_info

o  mounted_on_fileid (value: unique fileid within referring fs)

o  fsid (value: unique value within referring server)

The attribute entry for "latest" will not contain size or ctime.

10.8.  The Attribute fs_absent

In order to provide the client with information about whether the
current file system is present or absent, the fs_absent attribute may
be interrogated.

As noted above, this attribute, when supported, may be requested of
absent file systems without causing NFS4ERR_MOVED to be returned, and
it should always be available.  Servers are strongly urged to support
this attribute on all file systems if they support it on any file
system.

10.9.  The Attribute fs_locations

The fs_locations attribute is structured in the following way:

struct fs_location {
        utf8str_cis        server<>;
        pathname4          rootpath;
};

struct fs_locations {
        pathname4          fs_root;
        fs_location        locations<>;
};

The fs_location struct is used to represent the location of a file
system by providing a server name and the path to the root of the
file system within that server's namespace.  When a set of servers
have corresponding file systems at the same path within their
namespaces, an array of server names may be provided.  An entry in
the server array is a UTF-8 string and represents one of a
traditional DNS host name, IPv4 address, or IPv6 address.  It is not
a requirement that all servers that share the same rootpath be listed
in one fs_location struct.  The array of server names is provided for
convenience.  Servers that share the same rootpath may also be listed
in separate fs_location entries in the fs_locations attribute.

The fs_locations struct and attribute contain an array of such
locations.
Since the namespace of each server may be constructed differently,
the "fs_root" field is provided.  The path represented by fs_root
represents the location of the file system in the current server's
namespace, i.e., that of the server from which the fs_locations
attribute was obtained.  The fs_root path is meant to aid the client
by clearly referencing the root of the file system whose locations
are being reported, no matter what object within the current file
system the current filehandle designates.

As an example, suppose there is a replicated file system located at
two servers (servA and servB).  At servA, the file system is located
at path "/a/b/c".  At servB, the file system is located at path
"/x/y/z".  If the client were to obtain the fs_locations value for
the directory at "/a/b/c/d", it might not necessarily know that the
file system's root is located in servA's namespace at "/a/b/c".
When the client switches to servB, it will need to determine that the
directory it first referenced at servA is now represented by the path
"/x/y/z/d" on servB.  To facilitate this, the fs_locations attribute
provided by servA would have an fs_root value of "/a/b/c" and two
entries in fs_locations.  One entry in fs_locations will be for
itself (servA) and the other will be for servB with a path of
"/x/y/z".  With this information, the client is able to substitute
"/x/y/z" for the "/a/b/c" at the beginning of its access path and
construct "/x/y/z/d" to use for the new server.

Since the fs_locations attribute lacks information defining various
attributes of the file system choices presented, it should only be
interrogated and used when fs_locations_info is not available.  When
fs_locations is used, information about the specific locations should
be assumed based on the following rules.
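The pathname substitution in the servA/servB example above can be
sketched as follows.  This is an illustrative sketch, not protocol
text; the function name and the list-of-components representation of
pathname4 are assumptions made for the example.

```python
def translate_path(path, fs_root, new_rootpath):
    """Given a path on the current server, the fs_root reported in
    fs_locations, and the rootpath of a replica, construct the
    corresponding path on the replica server.

    Pathnames are represented as lists of components, mirroring the
    component arrays used by pathname4.
    """
    if path[:len(fs_root)] != fs_root:
        raise ValueError("path is not within the reported file system")
    # Replace the fs_root prefix with the replica's rootpath and keep
    # the remainder of the path unchanged.
    return new_rootpath + path[len(fs_root):]

# The example from the text: fs_root "/a/b/c" on servA, replica at
# "/x/y/z" on servB; the directory "/a/b/c/d" becomes "/x/y/z/d".
print(translate_path(["a", "b", "c", "d"], ["a", "b", "c"], ["x", "y", "z"]))
```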
The following rules are general and apply irrespective of the
context.

o  All listed file system instances should be considered as of the
   same _handle_ class if and only if the current fh_expire_type
   attribute does not include the FH4_VOL_MIGRATION bit.  Note that
   in the case of referral, filehandle issues do not apply since
   there can be no filehandles known within the current file system,
   nor is there any access to the fh_expire_type attribute on the
   referring (absent) file system.

o  All listed file system instances should be considered as of the
   same _fileid_ class if and only if the fh_expire_type attribute
   indicates persistent filehandles and does not include the
   FH4_VOL_MIGRATION bit.  Note that in the case of referral, fileid
   issues do not apply since there can be no fileids known within the
   referring (absent) file system, nor is there any access to the
   fh_expire_type attribute.

o  All listed file system instances should be considered as of
   different _change_ classes.

For other class assignments, handling of file system transitions
depends on the reason for the transition:

o  When the transition is due to migration, the target should be
   treated as being of the same _verifier_ class as the source.

o  When the transition is due to failover to another replica, the
   target should be treated as being of a different _verifier_ class
   from the source.

The specific choices reflect typical implementation patterns for
controlled migration and failover, respectively.  Since other choices
are possible and useful, this information is better obtained by using
fs_locations_info.

See the section "Security Considerations" for a discussion of the
recommendations for the security flavor to be used by any GETATTR
operation that requests the "fs_locations" attribute.

10.10.
The Attribute fs_locations_info

The fs_locations_info attribute is intended as a more functional
replacement for fs_locations, which will continue to exist and be
supported.  Clients can use it to get a more complete set of
information about alternative file system locations.  When the server
does not support fs_locations_info, fs_locations can be used to get a
subset of the information.  A server that supports fs_locations_info
MUST support fs_locations as well.

There is additional information present in fs_locations_info that is
not available in fs_locations:

o  Attribute continuity information, to allow a client to select a
   location which meets the transparency requirements of the
   applications accessing the data and to take advantage of
   optimizations that server guarantees as to attribute continuity
   may provide (e.g., the change attribute).

o  File system identity information, which indicates when multiple
   replicas, from the client's point of view, correspond to the same
   target file system, allowing them to be used interchangeably,
   without disruption, as multiple paths to the same thing.

o  Information which will bear on the suitability of various
   replicas, depending on the use that the client intends.  For
   example, many applications need an absolutely up-to-date copy
   (e.g., those that write), while others may only need access to the
   most up-to-date copy reasonably available.

o  Server-derived preference information for replicas, which can be
   used to implement load-balancing while giving the client the
   entire fs list to be used in case the primary fails.

The fs_locations_info attribute consists of a root pathname (just
like fs_locations), together with an array of fs_locations_item4
structures.
struct fs_locations_server4 {
        int32_t         fls_currency;
        opaque          fls_info<>;
        utf8str_cis     fls_server;
};

const FSLI4BX_GFLAGS            = 0;
const FSLI4BX_TFLAGS            = 1;

const FSLI4BX_CLSIMUL           = 2;
const FSLI4BX_CLHANDLE          = 3;
const FSLI4BX_CLFILEID          = 4;
const FSLI4BX_CLVERIFIER        = 5;
const FSLI4BX_CLCHANGE          = 6;

const FSLI4BX_READRANK          = 7;
const FSLI4BX_WRITERANK         = 8;
const FSLI4BX_READORDER         = 9;
const FSLI4BX_WRITEORDER        = 10;

const FSLI4GF_WRITABLE          = 0x01;
const FSLI4GF_CUR_REQ           = 0x02;
const FSLI4GF_ABSENT            = 0x04;
const FSLI4GF_GOING             = 0x08;
const FSLI4GF_SPLIT             = 0x10;

const FSLI4TF_RDMA              = 0x01;

struct fs_locations_item4 {
        fs_locations_server4    fli_entries<>;
        pathname4               fli_rootpath;
};

struct fs_locations_info4 {
        uint32_t                fli_flags;
        int32_t                 fli_valid_for;
        pathname4               fli_fs_root;
        fs_locations_item4      fli_items<>;
};

const FSLI4IF_VAR_SUB = 0x00000001;

typedef fs_locations_info4 fattr4_fs_locations_info;

The fs_locations_info attribute is structured similarly to the
fs_locations attribute.  A top-level structure (fs_locations_info4)
contains the entire attribute, including the root pathname of the fs
and an array of lower-level structures that define replicas that
share a common root path on their respective servers.  Each lower-
level structure (fs_locations_item4) in turn contains a specific
pathname and information on one or more individual server replicas.
At the lowest level, each fs_locations_server4 structure contains
per-server-replica information in addition to the server name.
As noted above, the fs_locations_info attribute, when supported, may
be requested of absent file systems without causing NFS4ERR_MOVED to
be returned.  It is generally expected that it will be available for
both present and absent file systems, even if only a single
fs_locations_server4 entry is present, designating the current
(present) file system, or two fs_locations_server4 entries are
present, designating the current (and now previous) location of an
absent file system and its successor location.  Servers are strongly
urged to support this attribute on all file systems if they support
it on any file system.

10.10.1.  The fs_locations_server4 Structure

The fs_locations_server4 structure consists of the following items:

o  An indication of file system up-to-date-ness (fls_currency) in
   terms of approximate seconds before the present.  A negative value
   indicates that the server is unable to give any reasonably useful
   value here.  A zero indicates that the file system is the actual
   writable data or a reliably coherent and fully up-to-date copy.
   Positive values indicate how out-of-date this copy can normally be
   before it is considered for update.  Such a value is not a
   guarantee that such updates will always be performed on the
   required schedule but instead serves as a hint about how far
   behind the most up-to-date copy of the data this copy would
   normally be expected to be.

o  A counted array of one-octet values (fls_info) containing
   information about the particular file system instance.  This data
   includes general flags, transport capability flags, file system
   equivalence class information, and selection priority information.
   The encoding will be discussed below.

o  The server string (fls_server).
For the case of the replica currently being accessed (via GETATTR),
a null string may be used to indicate the current address being used
for the RPC call.

Data within the fls_info array is in the form of 8-bit data items
with constants giving the offsets within the array of various values
describing this particular file system instance.  This style of
definition was chosen, in preference to explicit XDR structure
definitions for these values, for a number of reasons:

o  The kinds of data in the fls_info array (flags, file system
   classes, and priorities among a set of file systems representing
   the same data) are such that eight bits provide a quite acceptable
   range of values.  Even where there might be more than 256 such
   file system instances, having more than 256 distinct classes or
   priorities is unlikely.

o  Explicit definition of the various specific data items within XDR
   would limit expandability in that any extension within a
   subsequent minor version would require yet another attribute,
   leading to specification and implementation clumsiness.

o  Such explicit definitions would also make it impossible to propose
   standards-track extensions apart from a full minor version.

This encoding scheme can be adapted to the specification of multi-
octet numeric values, even though none are currently defined.  If
extensions are made via standards-track RFCs, multi-octet quantities
will be encoded as a range of octets with consecutive indices, with
the octets interpreted in big-endian order.

The set of fls_info data is subject to expansion in a future minor
version, or in a standards-track RFC, within the context of a single
minor version.  The server SHOULD NOT send, and the client MUST NOT
use, indices within the fls_info array that are not defined in
standards-track RFCs.
The fls_info array contains within it:

o  Two 8-bit flag fields, one devoted to general file-system
   characteristics and a second reserved for transport-related
   capabilities.

o  Four 8-bit class values which define various file system
   equivalence classes as explained below.

o  Four 8-bit priority values which govern file system selection as
   explained below.

The general file system characteristics flag field (at octet index
FSLI4BX_GFLAGS) has the following bits defined within it:

o  FSLI4GF_WRITABLE indicates that this fs target is writable,
   allowing it to be selected by clients which may need to write on
   this file system.  When the current file system instance is
   writable, then any other file system to which the client might
   switch must incorporate within its data any committed write made
   on the current file system instance.  See the section on verifier
   class for issues related to uncommitted writes.  While there is no
   harm in not setting this flag for a file system that turns out to
   be writable, turning the flag on for a read-only file system can
   cause problems for clients that select a migration or replication
   target based on it and then find themselves unable to write.

o  FSLI4GF_CUR_REQ indicates that this replica is the one on which
   the request is being made.  Only a single server entry may have
   this flag set and, in the case of a referral, no entry will have
   it.

o  FSLI4GF_ABSENT indicates that this entry corresponds to an absent
   file system replica.  It can only be set if FSLI4GF_CUR_REQ is
   set.  When both such bits are set, it indicates that a file system
   instance is not usable but that the information in the entry can
   be used to determine the sorts of continuity available when
   switching from this replica to other possible replicas.
Since this bit can only be true if FSLI4GF_CUR_REQ is true, the
value could be determined using the fs_absent attribute, but the
information is also made available here for the convenience of the
client.  An entry with this bit set, since it represents a true
file system (albeit absent), does not appear in the event of a
referral, but only where a file system has been accessed at this
location and subsequently been migrated.

o  FSLI4GF_GOING indicates that a replica, while still available,
   should not be used further.  The client, if using it, should make
   an orderly transfer to another file system instance as
   expeditiously as possible.  It is expected that file systems going
   out of service will be announced as FSLI4GF_GOING some time before
   the actual loss of service and that the valid_for value will be
   sufficiently small to allow clients to detect and act on scheduled
   events, while large enough that the cost of the requests to fetch
   the fs_locations_info values will not be excessive.  Values on the
   order of ten minutes seem reasonable.

o  FSLI4GF_SPLIT indicates that when a transition occurs from the
   current file system instance to this one, the replacement may
   consist of multiple file systems.  In this case, the client has to
   be prepared for the possibility that objects on the same fs before
   migration will be on different ones after.  Note that
   FSLI4GF_SPLIT is not incompatible with the file systems belonging
   to the same _fileid_ class since, if one has a set of fileids that
   are unique within an fs, each subset assigned to a smaller fs
   after migration would not have any conflicts internal to that fs.

   A client, in the case of a split file system, will interrogate
   existing files with which it has a continuing connection (it is
   free simply to forget cached filehandles).
If the client remembers the directory filehandle associated with
each open file, it may proceed upward using LOOKUPP to find the new
fs boundaries.

Once the client recognizes that one file system has been split into
two, it could maintain applications running without disruption by
presenting the two file systems as a single one until a convenient
point at which to recognize the transition, such as a reboot.  This
would require a mapping from the server's fsids to fsids as seen by
the client, but this is already necessary for other reasons.  As
noted above, existing fileids within the two descendant fs's will
not conflict.  Creation of new files in the two descendant fs's may
require some amount of fileid mapping, which can be performed very
simply in many important cases.

The transport-flag field (at octet index FSLI4BX_TFLAGS) contains the
following bits related to the transport capabilities of the specific
file system.

o  FSLI4TF_RDMA indicates that this file system provides NFSv4.1 file
   system access using an RDMA-capable transport.

Attribute continuity and file system identity information are
expressed by defining equivalence relations on the sets of file
systems presented to the client.  Each such relation is expressed as
a set of file system equivalence classes.  For each relation, a file
system has an 8-bit class number.  Two file systems belong to the
same class if both have identical non-zero class numbers.  Zero is
treated as non-matching.  Most often, the relevant question for the
client will be whether a given replica is identical to, or
continuous with, the current one in a given respect, but the
information should also be available as to whether two other replicas
match in that respect.
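The octet-indexed layout of fls_info and the non-zero class-matching
rule can be sketched as follows.  This is a non-normative
illustration; the helper name and the example fls_info contents are
invented for the sketch.

```python
# Octet indices within fls_info, as defined by the FSLI4BX_* constants.
FSLI4BX_GFLAGS, FSLI4BX_TFLAGS = 0, 1
FSLI4BX_CLSIMUL, FSLI4BX_CLHANDLE = 2, 3
FSLI4BX_CLFILEID, FSLI4BX_CLVERIFIER, FSLI4BX_CLCHANGE = 4, 5, 6
FSLI4BX_READRANK, FSLI4BX_WRITERANK = 7, 8
FSLI4BX_READORDER, FSLI4BX_WRITEORDER = 9, 10

FSLI4GF_WRITABLE = 0x01

def same_class(info_a, info_b, index):
    """Two file systems are in the same equivalence class for a given
    relation if and only if both class numbers are non-zero and
    identical; a class number of zero is treated as non-matching."""
    a, b = info_a[index], info_b[index]
    return a != 0 and a == b

# Hypothetical fls_info arrays for two replicas.
replica1 = bytes([FSLI4GF_WRITABLE, 0, 7, 3, 3, 1, 2, 0, 0, 1, 1])
replica2 = bytes([FSLI4GF_WRITABLE, 0, 0, 3, 3, 1, 2, 1, 1, 0, 0])

print(same_class(replica1, replica2, FSLI4BX_CLHANDLE))  # True: 3 == 3
print(same_class(replica1, replica2, FSLI4BX_CLSIMUL))   # False: zero never matches
```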
The following fields specify the file system's class numbers for the
equivalence relations used in determining the nature of file system
transitions.  See Section 10.6 for details about how this information
is to be used.

o  The field with octet index FSLI4BX_CLSIMUL defines the
   simultaneous-use class for the file system.

o  The field with octet index FSLI4BX_CLHANDLE defines the handle
   class for the file system.

o  The field with octet index FSLI4BX_CLFILEID defines the fileid
   class for the file system.

o  The field with octet index FSLI4BX_CLVERIFIER defines the verifier
   class for the file system.

o  The field with octet index FSLI4BX_CLCHANGE defines the change
   class for the file system.

Server-specified preference information is also provided via 8-bit
values within the fls_info array.  The values provide a rank and an
order (see below), with separate values specifiable for the cases of
read-only and writable file systems.  These values are compared for
different file systems to establish the server-specified preference,
with lower values indicating "more preferred".

Rank is used to express a strict server-imposed ordering on clients,
with lower values indicating "more preferred".  Clients should
attempt to use all replicas with a given rank before they use one
with a higher rank.  Only if all of those file systems are
unavailable should the client proceed to those of the next higher
rank.

Within a rank, the order value is used to specify the server's
preference to guide the client's selection when the client's own
preferences are not controlling, with lower values of order
indicating "more preferred".  If replicas are approximately equal in
all respects, clients should defer to the order specified by the
server.
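The rank-then-order preference rules above can be sketched as a
simple sort.  The (rank, order, name) tuple representation of a
replica is an assumption of this sketch, not protocol data.

```python
def order_replicas(replicas):
    """Sort replicas by server-specified preference: strict rank
    first, then order within a rank, lower values being more
    preferred.  Each replica is a (rank, order, name) tuple."""
    return sorted(replicas, key=lambda r: (r[0], r[1]))

replicas = [(2, 1, "servC"), (1, 2, "servB"), (1, 1, "servA")]
# All rank-1 replicas come before any rank-2 replica; within rank 1,
# the lower order value wins.
print([name for _, _, name in order_replicas(replicas)])
```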
When clients look at server latency as part of their selection, they
are free to use this criterion, but it is suggested that when latency
differences are not significant, the server-specified order should
guide selection.

o  The field at octet index FSLI4BX_READRANK gives the rank value to
   be used for read-only access.

o  The field at octet index FSLI4BX_READORDER gives the order value
   to be used for read-only access.

o  The field at octet index FSLI4BX_WRITERANK gives the rank value to
   be used for writable access.

o  The field at octet index FSLI4BX_WRITEORDER gives the order value
   to be used for writable access.

Depending on the potential need for write access by a given client,
one of the pairs of rank and order values is used.  The read rank and
order should only be used if the client knows that only reading will
ever be done or if it is prepared to switch to a different replica in
the event that any write access capability is required in the future.

10.10.2.  The fs_locations_info4 Structure

The fs_locations_info4 structure, encoding the fs_locations_info
attribute, contains the following:

o  The fli_flags field, which contains general flags that affect the
   interpretation of this fs_locations_info4 structure and all
   fs_locations_item4 structures within it.  The only flag currently
   defined is FSLI4IF_VAR_SUB.  All bits in the fli_flags field which
   are not defined should always be returned as zero.

o  The fli_fs_root field, which contains the pathname of the root of
   the current file system on the current server, just as it does in
   the fs_locations structure.

o  An array called fli_items of fs_locations_item4 structures, which
   contain information about replicas of the current file system.
   Where the current file system is actually present, or has been
   present, i.e.,
this is not a referral situation, one of the fs_locations_item4
structures will contain an fs_locations_server4 entry for the
current server.  This entry will have FSLI4GF_ABSENT set if the
current file system is absent, i.e., normal access to it will
return NFS4ERR_MOVED.

o  The fli_valid_for field specifies a time in seconds for which it
   is reasonable for a client to use the fs_locations_info attribute
   without refetch.  The fli_valid_for value does not provide a
   guarantee of validity since servers can unexpectedly go out of
   service or become inaccessible for any number of reasons.  Clients
   are well advised to refetch this information for actively accessed
   file systems every fli_valid_for seconds.  This is particularly
   important when file system replicas may go out of service in a
   controlled way using the FSLI4GF_GOING flag to communicate an
   ongoing change.  The server should set fli_valid_for to a value
   which allows well-behaved clients to notice the FSLI4GF_GOING flag
   and make an orderly switch before the loss of service becomes
   effective.  If this value is zero, then no refetch interval is
   appropriate and the client need not refetch this data on any
   particular schedule.  In the event of a transition to a new file
   system instance, a new value of the fs_locations_info attribute
   will be fetched at the destination, and it is to be expected that
   this may have a different valid_for value, which the client should
   then use in the same fashion as the previous value.

The FSLI4IF_VAR_SUB flag within fli_flags controls whether variable
substitution is to be enabled.

10.10.3.
The fs_locations_item4 Structure

The fs_locations_item4 structure contains a pathname (in the field
fli_rootpath) which encodes the path of the target file system
replicas on the set of servers designated by the included
fs_locations_server4 entries.  The precise manner in which this
target location is specified depends on the value of the
FSLI4IF_VAR_SUB flag within the associated fs_locations_info4
structure.

If this flag is not set, then fli_rootpath simply designates the
location of the target file system within each server's single-server
namespace, just as the rootpath does within the fs_location
structure.  When this bit is set, however, component entries of a
certain form are subject to client-specific variable substitution, so
as to allow a degree of namespace non-uniformity in order to
accommodate the selection of client-specific file system targets to
adapt to different client architectures or other characteristics.

When such substitution is in effect, a variable beginning with the
string "${", ending with the string "}", and containing a colon is to
be replaced by the client-specific value associated with that
variable.  The string "unknown" should be used by the client when it
has no value for such a variable.  The pathname resulting from such
substitutions is used to designate the target file system, so that
different clients may have different file systems corresponding to
that location in the multi-server namespace.

As mentioned above, such substituted pathname variables contain a
colon.  The part before the colon is to be a DNS domain name, with
the part after being a case-insensitive alphanumeric string.

Where the domain is "ietf.org", only variable names defined in this
document or subsequent standards-track RFCs are subject to such
substitution.
Organizations are free to use their domain names to create their own
sets of client-specific variables, to be subject to such
substitution.  In cases where such variables are intended to be used
more broadly than within a single organization, publication of an
informational RFC defining such variables is recommended.

The variable ${ietf.org:CPU_ARCH} is used to denote the CPU
architecture for which object files are compiled.  This specification
does not limit the acceptable values (except that they must be valid
UTF-8 strings), but such values as "x86", "x86_64", and "sparc" would
be expected to be used in line with industry practice.

The variable ${ietf.org:OS_TYPE} is used to denote the operating
system and thus the kernel and library APIs for which code might be
compiled.  This specification does not limit the acceptable values
(except that they must be valid UTF-8 strings), but such values as
"linux" and "freebsd" would be expected to be used in line with
industry practice.

The variable ${ietf.org:OS_VERSION} is used to denote the operating
system version and thus the specific details of versioned interfaces
for which code might be compiled.  This specification does not limit
the acceptable values (except that they must be valid UTF-8 strings),
but combinations of numbers and letters with interspersed dots would
be expected to be used in line with industry practice, with the
details of the version format depending on the specific value of the
variable ${ietf.org:OS_TYPE} with which it is used.

Use of these variables could result in the direction of different
clients to different file systems on the same server, as appropriate
to particular clients.
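The substitution rule can be sketched as follows: variables of the
form ${domain:NAME} within a pathname component are replaced by the
client-specific value, with "unknown" used when the client has no
value.  The regular expression and the dictionary interface for
client values are assumptions of this sketch.

```python
import re

# Matches ${domain:NAME}: a DNS domain name before the colon and a
# case-insensitive alphanumeric string after it.
VAR_RE = re.compile(r"\$\{([^:}]+):([^}]+)\}")

def substitute(component, client_values):
    """Replace each ${domain:NAME} variable in a pathname component
    with the client's value for it, or "unknown" when no value is
    known.  Variable names are case-insensitive, so keys are stored
    uppercased."""
    def repl(match):
        key = (match.group(1), match.group(2).upper())
        return client_values.get(key, "unknown")
    return VAR_RE.sub(repl, component)

values = {("ietf.org", "CPU_ARCH"): "x86_64",
          ("ietf.org", "OS_TYPE"): "linux"}
print(substitute("${ietf.org:OS_TYPE}", values))     # linux
print(substitute("${ietf.org:OS_VERSION}", values))  # unknown
```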
In cases in which the target file systems are located on different
servers, a single server could serve as a referral point so that each
valid combination of variable values would designate a referral
hosted on a single server, with the targets of those referrals on a
number of different servers.

Although variable substitution is most suitable for use in the
context of referrals, it may be used in the contexts of replication
and migration.  If it is used in these contexts, the server must
ensure that no matter what values the client presents for the
substituted variables, the result is always a valid successor file
system instance to that from which a transition is occurring, i.e.,
that the data is identical or represents a later image of a writable
file system.

Note that when fli_rootpath is a null pathname (that is, one with
zero components), the file system designated is at the root of the
specified server, whether or not the FSLI4IF_VAR_SUB flag within the
associated fs_locations_info4 structure is set.

10.11.  The Attribute fs_status

In an environment in which multiple copies of the same basic set of
data are available, information regarding the particular source of
such data and the relationships among different copies can be very
helpful in providing consistent data to applications.

enum fs4_status_type {
        STATUS4_FIXED = 1,
        STATUS4_VERSIONED = 2,
        STATUS4_UPDATED = 3,
        STATUS4_WRITABLE = 4,
        STATUS4_ABSENT = 5
};

struct fs4_status {
        fs4_status_type         fsstat_type;
        utf8str_cs              fsstat_source;
        utf8str_cs              fsstat_current;
        int32_t                 fsstat_age;
        nfstime4                fsstat_version;
};

The type value indicates the kind of file system image represented.
This is of particular importance when using the version values to
determine the appropriate succession of file system images.  Five
types are distinguished:

o  STATUS4_FIXED, which indicates a read-only image in the sense that
   it will never change.  The possibility is allowed that, as a
   result of migration or a switch to a different image, changed data
   can be accessed, but within the confines of this instance no
   change is allowed.  The client can use this fact to cache
   aggressively.

o  STATUS4_VERSIONED, which indicates that the image, like the
   STATUS4_UPDATED case, is updated exogenously, but it provides a
   guarantee that the server will carefully update an associated
   version value so that the client can protect itself from a
   situation in which it reads data from one version of the file
   system and then later reads data from an earlier version of the
   same file system.  See below for a discussion of how this can be
   done.

o  STATUS4_UPDATED, which indicates an image that cannot be updated
   by the user writing to it but may be changed exogenously,
   typically because it is a periodically updated copy of another
   writable file system somewhere else.  In this case, version
   information is not provided, and the client does not have the
   responsibility of making sure that this version only advances upon
   a file system instance transition.  In this case, it is the
   responsibility of the server to make sure that the data presented
   after a file system instance transition is a proper successor
   image and includes all changes seen by the client and any change
   made before all such changes.

o  STATUS4_WRITABLE, which indicates that the file system is an
   actual writable one.
   The client need not, of course, actually write to the file system,
   but once it does, it should not accept a transition to anything
   other than a writable instance of that same file system.

o  STATUS4_ABSENT, which indicates that the information is the last
   valid for a file system which is no longer present.

The opaque strings 'source' and 'current' provide a way of presenting
information about the source of the file system image being
presented.  It is not intended that the client do anything with this
information other than make it available to administrative tools.  It
is intended that this information be helpful when researching
possible problems with a file system image that might arise when it
is unclear whether the correct image is being accessed and, if not,
how that image came to be made.  This kind of debugging information
will be helpful if, as seems likely, copies of file systems are made
in many different ways (e.g., simple user-level copies, file-system-
level point-in-time copies, cloning of the underlying storage), under
a variety of administrative arrangements.  In such environments,
determining how a given set of data was constructed can be very
helpful in resolving problems.

The opaque string 'source' is used to indicate the source of a given
file system with the expectation that tools capable of creating a
file system image propagate this information, when that is possible.
It is understood that this may not always be possible, since a user-
level copy may be thought of as creating a new data set and the tools
used may have no mechanism to propagate this data.  When a file
system is initially created, information regarding how the file
system was created, where it was created, by whom, etc., can be put
in this attribute in a human-readable string form so that it will be
available when propagated to subsequent copies of this data.

The opaque string 'current' should provide whatever information is
available about the source of the current copy: for example, the tool
that created it, any relevant parameters to that tool, the time at
which the copy was done, the user making the change, the server on
which the change was made, etc.  All such information should be in a
human-readable string form.

The age provides an indication of how out-of-date the file system
currently is with respect to its ultimate data source (in the case of
cascading data updates).  This complements the fls_currency field of
fs_locations_server4 (see Section 10.10) in the following way: the
information in fls_currency gives a bound for how out of date the
data in a file system might typically get, while the age gives a
bound on how out of date that data actually is.  Negative values
imply that no information is available.  A zero means that this data
is known to be current.  A positive value means that this data is
known to be no older than that number of seconds with respect to the
ultimate data source.

The version field provides a version identification, in the form of a
time value, such that successive versions always have later time
values.  When the file system type is anything other than
STATUS4_VERSIONED, the server may provide such a value, but there is
no guarantee as to its validity, and clients will not use it except
to provide additional information to add to 'source' and 'current'.

When the type is STATUS4_VERSIONED, servers should provide a value of
version which progresses monotonically whenever any new version of
the data is established.
This allows the client, if reliable image progression is important to
it, to fetch this attribute as part of each COMPOUND where data or
metadata from the file system is used.

When it is important to the client to make sure that only valid
successor images are accepted, it must make sure that it does not
read data or metadata from the file system without updating its sense
of the current state of the image.  This is to avoid the possibility
that the fs_status which the client holds will be one for an earlier
image, causing the client to accept a new file system instance which
is later than that but still earlier than the updated data read by
the client.

In order to do this reliably, the client must do a GETATTR of
fs_status that follows any interrogation of data or metadata within
the file system in question.  Often this is most conveniently done by
appending such a GETATTR after all other operations that reference a
given file system.  When errors occur between reading file system
data and performing such a GETATTR, care must be exercised to make
sure that the data in question is not used before obtaining the
proper fs_status value.  In this connection, when an OPEN is done
within such a versioned file system and the associated GETATTR of
fs_status is not successfully completed, the open file in question
must not be accessed until that fs_status is fetched.

The procedure above will ensure that, before using any data from the
file system, the client has in hand a newly fetched current version
of the file system image.  Multiple values from multiple requests in
flight can be resolved by assembling them into the required partial
order (the elements should form a total order within it) and using
the last.
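The version-tracking discipline described above can be sketched as a
small client-side state machine.  The following is a minimal,
hypothetical sketch (the FsStatusTracker name and its methods are not
part of the protocol), assuming nfstime4 values are modeled as
comparable (seconds, nseconds) pairs:

```python
# Hypothetical client-side sketch of fs_status version tracking for a
# STATUS4_VERSIONED file system.  Illustrative only.

STATUS4_VERSIONED = 2

class FsStatusTracker:
    """Tracks the latest fs_status version seen for one file system."""

    def __init__(self):
        self.latest_version = None  # nfstime4 modeled as (sec, nsec)

    def record(self, fsstat_type, version):
        # Resolve multiple in-flight responses by keeping the maximum:
        # within one VERSIONED file system the versions form a total
        # order, so "assemble and use the last" reduces to max().
        if fsstat_type != STATUS4_VERSIONED:
            return
        if self.latest_version is None or version > self.latest_version:
            self.latest_version = version

    def acceptable_successor(self, fsstat_type, version):
        # Decline any instance that is not VERSIONED or whose version
        # is earlier than the last one obtained from the predecessor.
        if fsstat_type != STATUS4_VERSIONED:
            return False
        if self.latest_version is not None and version < self.latest_version:
            return False
        return True
```

A client following this sketch would call record() with the fs_status
returned by the trailing GETATTR of each COMPOUND, and consult
acceptable_successor() when switching among file system instances.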
The client may then, when switching among file system instances,
decline to use an instance which is not of type STATUS4_VERSIONED or
whose version field is earlier than the last one obtained from the
predecessor file system instance.

11.  Directory Delegations

11.1.  Introduction to Directory Delegations

Directory caching for the NFSv4.1 protocol is similar to that of
previous versions.  Clients typically cache directory information for
a duration determined by the client.  At the end of a predefined
timeout, the client will query the server to see if the directory has
been updated.  By caching attributes, clients reduce the number of
GETATTR calls made to the server to validate attributes.
Furthermore, frequently accessed files and directories, such as the
current working directory, have their attributes cached on the client
so that some NFS operations can be performed without having to make
an RPC call.  By caching name and inode information about most
recently looked up entries in the Directory Name Lookup Cache (DNLC),
clients do not need to send LOOKUP calls to the server every time
these files are accessed.

This caching approach works reasonably well at reducing network
traffic in many environments.  However, it does not address
environments where there are numerous queries for files that do not
exist.  In these cases of "misses", the client must make RPC calls to
the server in order to provide reasonable application semantics and
promptly detect the creation of new directory entries.  An example of
high miss activity is compilation in software development
environments.  The current behavior of NFS limits its potential
scalability and wide-area sharing effectiveness in these types of
environments.
Other distributed stateful file system architectures such as AFS and
DFS have proven that adding state around directory contents can
greatly reduce network traffic in high-miss environments.

Delegation of directory contents is a RECOMMENDED feature of NFSv4.1.
Directory delegations provide traffic reduction benefits similar to
those of file delegations.  By allowing clients to cache directory
contents (in a read-only fashion) while being notified of changes,
the client can avoid making frequent requests to interrogate the
contents of slowly-changing directories, reducing network traffic and
improving client performance.

Directory delegations allow improved namespace cache consistency to
be achieved through delegations and synchronous recalls alone,
without asking for notifications.  In addition, if time-based
consistency is sufficient, asynchronous notifications can provide
performance benefits for the client, and possibly the server, under
some common operating conditions such as slowly-changing and/or very
large directories.

11.2.  Directory Delegation Design

NFSv4.1 introduces the GET_DIR_DELEGATION (Section 17.39) operation
to allow the client to ask for a directory delegation.  The
delegation covers directory attributes and all entries in the
directory.  If either of these changes, the delegation will be
recalled synchronously.  The operation causing the recall will have
to wait until the recall is complete.  Changes to the attributes of
individual directory entries will not cause the delegation to be
recalled.

In addition to asking for delegations, a client can also ask for
notifications for certain events.  These events include changes to
directory attributes and/or its contents.  If a client asks for
notification for a certain event, the server will notify the client
when that event occurs.
This will not result in the delegation being recalled for that
client.  The notifications are asynchronous and provide a way of
avoiding recalls in situations where a directory is changing enough
that the pure recall model may not be effective, while still allowing
the client to get substantial benefit.  In the absence of
notifications, once the delegation is recalled the client has to
refresh its directory cache, which might not be very efficient for
very large directories.

The delegation is read-only, and the client may not make changes to
the directory other than by performing NFSv4 operations that modify
the directory or the associated file attributes, so that the server
has knowledge of these changes.  In order to keep the client's
namespace synchronized with the server's, the server will notify the
client holding the delegation of the changes made as a result.  This
is to avoid subsequent GETATTR or READDIR calls to the server.  If a
single client is holding the delegation and that client makes any
changes to the directory, the delegation will not be recalled.
Multiple clients may hold a delegation on the same directory, but if
any such client modifies the directory, the server MUST recall the
delegation from the other clients.

Delegations can be recalled by the server at any time.  Normally, the
server will recall the delegation when the directory changes in a way
that is not covered by the notification, or when the directory
changes and notifications have not been requested.

Also, if the server notices that handing out a delegation for a
directory is causing too many notifications or recalls to be sent
out, it may decide not to hand out delegations for that directory or
to recall existing delegations.
If another client removes the directory for which a delegation has
been granted, the server will recall the delegation.

11.3.  Attributes in Support of Directory Notifications

See Section 5.12 for a description of the attributes associated with
directory notifications.

11.4.  Delegation Recall

The server will recall the directory delegation by sending a callback
to the client, using the same callback procedure as used for
recalling file delegations.  The server will recall the delegation
when the directory changes in a way that is not covered by the
notification.  However, the server will not recall the delegation if
attributes of an entry within the directory change.  Also, if the
server notices that handing out a delegation for a directory is
causing too many notifications to be sent out, it may decide not to
hand out a delegation for that directory.  If another client tries to
remove the directory for which a delegation has been granted, the
server will recall the delegation.

The server will recall the delegation by sending a CB_RECALL callback
to the client.  If the recall is done because of a directory-changing
event, the request making that change will need to wait while the
client returns the delegation.

11.5.  Directory Delegation Recovery

Crash recovery for state on regular files has two main goals:
avoiding the necessity of breaking application guarantees with
respect to locked files, and delivery of updates cached at the
client.  Neither of these applies to directories protected by read
delegations and notifications.  Thus, the client is required to
establish a new delegation on a server or client reboot.
[[Comment.15: we have special reclaim types that allow clients to
recover delegations through client reboot.
Do we really want EXCHANGE_ID/CREATE_SESSION to destroy directory
delegation state?]]

12.  Parallel NFS (pNFS)

12.1.  Introduction

pNFS is a set of OPTIONAL features of NFSv4.1 which allow direct
client access to the storage devices containing the file data.  When
file data for a single NFSv4 server is stored on multiple and/or
higher-throughput storage devices (by comparison to the server's
throughput capability), the result can be significantly better file
access performance.  The relationship among multiple clients, a
single server, and multiple storage devices for pNFS (server and
clients have access to all storage devices) is shown in this diagram:

    +-----------+
    |+-----------+                                 +-----------+
    ||+-----------+                                |           |
    |||           |        NFSv4 + pNFS            |           |
    +||  Clients  |<------------------------------>|   Server  |
     +|           |                                |           |
      +-----------+                                |           |
           |||                                     +-----------+
           |||                                           |
           |||                                           |
           ||| Storage        +-----------+              |
           ||| Protocol       |+-----------+             |
           ||+----------------||+-----------+  Control   |
           |+-----------------|||           |  Protocol  |
           +------------------+||  Storage  |------------+
                               +|  Devices  |
                                +-----------+

                               Figure 62

In this structure, the responsibility for coordination of file access
by multiple clients is shared among the server, clients, and storage
devices.  This is in contrast to NFSv4 without pNFS, in which this is
primarily the server's responsibility, some of which can be delegated
to clients under strictly specified conditions.

pNFS takes the form of OPTIONAL operations that manage data location
information called a layout.  Layouts are managed in a fashion
similar to NFSv4 data delegations (e.g., they are recallable and
revocable).  However, they are distinct abstractions and are
manipulated with new operations.
When a client holds a layout, it has rights to access the data
directly using the location information in the layout.

This document specifies the use of NFSv4.1 as a storage protocol.
pNFS allows other storage protocols, and these protocols are
deliberately not specified here.  These might include:

o  Block/volume protocols such as iSCSI ([29]) and FCP ([30]).  The
   block/volume protocol support can be independent of the addressing
   structure of the block/volume protocol used, allowing more than
   one protocol to access the same file data and enabling
   extensibility to other block/volume protocols.

o  Object protocols such as OSD over iSCSI or Fibre Channel [31].

o  Other storage protocols, including PVFS and other file systems
   that are in use in HPC environments.

With some storage protocols, the storage devices cannot perform fine-
grained access checks to ensure that clients are only performing
accesses within the bounds permitted to them by the pNFS operations
with the server (e.g., the checks may only be possible at file system
granularity rather than file granularity).  In situations where this
added responsibility placed on clients creates unacceptable security
risks, pNFS configurations in which storage devices cannot perform
fine-grained access checks SHOULD NOT be used.  All pNFS server
implementations MUST support NFSv4.1 access to any file accessible
via pNFS in order to provide an interoperable means of file access in
such situations.  See Section 12.9 on Security for further
discussion.

There are issues about how layouts interact with the existing NFSv4
abstractions of data delegations and record locking.  Delegation
issues are discussed in Section 12.5.4.  Byte-range locking issues
are discussed in Section 12.2.10 and Section 12.5.1.

12.2.  pNFS Definitions

pNFS partitions the NFSv4.1 file system protocol into two parts, the
metadata path and the data path.  The metadata path is implemented by
a metadata server that supports pNFS and the operations described in
this document (Section 17).  The data path is implemented by a
storage device that supports the storage protocol.  A subset (defined
in Section 13.7) of NFSv4.1 is one such storage protocol.  This leads
to new terms used to describe the protocol extension and some
clarifications of existing terms.

12.2.1.  Metadata

This is information about a file, such as its name, its owner, where
it is stored, and so forth.  Metadata also includes lower-level
information like block addresses and indirect block pointers.

12.2.2.  Metadata Server

A pNFS metadata server is an NFSv4.1 server which supports pNFS
operations and features.  When supporting pNFS, the metadata server
might hold only the metadata associated with a file, while the data
can be stored on the storage devices.  However, data may also be
written through the metadata server, which in turn ensures that the
data is written to the storage devices.

12.2.3.  Client

A pNFS client is an NFSv4.1 client, as defined by this document,
which supports pNFS operations and features and supports at least one
storage protocol for performing I/O directly to storage devices.

12.2.4.  Storage Device

A storage device controls a regular file's data, but leaves other
metadata management up to the metadata server.  A storage device
could be another NFSv4.1 server, an object storage device (OSD), a
block device accessed over a SAN (e.g., either a Fibre Channel or
iSCSI SAN), or some other entity.

12.2.5.  Data Server

A data server is a storage device that is implemented by a server of
a higher-level storage access protocol, such as NFSv4.1.

12.2.6.  Storage Protocol or Data Protocol

A storage protocol or data protocol is the protocol used between the
pNFS client and the storage device to access the file data.  Three
layout types have been described: file protocols (i.e., NFSv4.1),
object protocols (e.g., OSD), and block/volume protocols (e.g., based
on SCSI block commands).  These protocols are in turn realizable over
a variety of transport stacks.

Depending on the storage protocol, block-level metadata may or may
not be managed by the metadata server; it may instead be managed by
object storage devices or other servers acting as storage devices.

12.2.7.  Control Protocol

The control protocol is used by the exported file system between the
metadata server and the storage devices.  Specification of such
protocols is outside the scope of this document.  Such control
protocols would be used to control such activities as the allocation
and deallocation of storage and the management of state required by
the data servers to perform client access control.

While pNFS allows for any control protocol, in practice the control
protocol is closely related to the storage protocol.  For example, if
the data servers are NFSv4.1 servers, then the protocol between the
metadata server and the data servers is likely to involve NFSv4.1
operations.  Similarly, when object storage devices are used, the
pNFS metadata server will likely use iSCSI/OSD commands to manipulate
storage.

Regardless, this document does not mandate any particular control
protocol.  Instead, it just describes the requirements on the control
protocol for maintaining attributes like modify time, the change
attribute, and the end-of-file (EOF) position.

12.2.8.  Layout

A layout defines how a file's data is organized on one or more
storage devices.  There are many possible layout types.
They vary in the storage protocol used to access the data and in the
aggregation scheme that lays out the file data on the underlying
storage devices.  A layout is more precisely identified by the
following tuple: <client ID, filehandle, layout type>, where the
filehandle refers to the filehandle of the file on the metadata
server.  Layouts describe a file, not an octet-range of a file;
Section 12.2.11 describes layout segments, which do pertain to a
range.

12.2.9.  Layout Types

A layout describes the mapping of a file's data to the storage
devices that hold the data.  A layout is said to belong to a specific
layout type (data type layouttype4, see Section 3.2.15).  The layout
type allows for variants to handle different storage protocols, such
as the block/volume [24], object [23], and file (Section 13) layout
types.  A metadata server, along with its control protocol, MUST
support at least one layout type.  A private sub-range of the layout
type name space is also defined.  Values from the private layout type
range can be used for internal testing or experimentation.

As an example, a file layout type could be an array of tuples (e.g.,
<deviceID, file_handle>), along with a definition of how the data is
stored across the devices (e.g., striping).  A block/volume layout
might be an array of tuples that store <deviceID, block_number, block
count>, along with information about block size and the file offset
of the first block.  An object layout might be an array of tuples
<deviceID, objectID> and an additional structure (i.e., the
aggregation map) that defines how the logical octet sequence of the
file data is serialized into the different objects.  Note that the
actual layouts are more complex than these simple expository
examples.

12.2.10.  Layout Iomode

The layout iomode (data type layoutiomode4, see Section 3.2.23)
indicates to the metadata server the client's intent to perform
either just READ operations (Section 17.22) or a mixture of I/O
possibly containing WRITE (Section 17.32) and READ operations.  For
certain layout types, it is useful for a client to specify this
intent at LAYOUTGET (Section 17.43) time.  For example, for block/
volume-based protocols, block allocation could occur when a READ/
WRITE iomode is specified.  A special LAYOUTIOMODE4_ANY iomode is
defined; it can only be used for LAYOUTRETURN and LAYOUTRECALL, not
for LAYOUTGET.  It specifies that layouts pertaining to both READ and
READ/WRITE iomodes are being returned or recalled, respectively.

A storage device may validate I/O with regard to the iomode; this is
dependent upon storage device implementation and layout type.  Thus,
if the client's layout iomode differs from the I/O being performed,
the storage device may reject the client's I/O with an error
indicating that a new layout with the correct iomode should be
fetched.  For example, if a client gets a layout with a READ iomode
and performs a WRITE to a storage device, the storage device is
allowed to reject that WRITE.

The iomode does not conflict with OPEN share modes or lock requests;
open mode checks and lock enforcement are always enforced and are
logically separate from the pNFS layout level.  As well, open modes
and locks are the preferred method for restricting user access to
data files.  For example, an OPEN of read, deny-write does not
conflict with a LAYOUTGET containing an iomode of READ/WRITE
performed by another client.  Applications that depend on writing
into the same file concurrently may use record locking to serialize
their accesses.

12.2.11.  Layout Segment

Since a layout that describes an entire file may be very large, there
is a desire to manage layouts in smaller chunks that correspond to
octet-ranges of the file.  For example, the entire layout need not be
returned, recalled, or committed.  These chunks are called layout
segments and are further identified by the octet-range and iomode
they represent, yielding a layout segment identifier consisting of
<client ID, filehandle, layout type, iomode, octet range>.  The
concepts of a layout and its layout segments allow clients and
metadata servers to aggregate the results of layout operations into a
singly maintained layout.

It is important to define when layout segments overlap and/or
conflict with each other.  For two layout segments with overlapping
octet ranges to actually overlap each other, both segments must be of
the same layout type, correspond to the same filehandle, and have the
same iomode.  Layout segments conflict when they overlap and differ
in the content of the layout (i.e., the storage device/file mapping
parameters differ).  Note that differing iomodes do not lead to
conflicting layouts.  It is permissible for layout segments with
different iomodes, pertaining to the same octet range, to be held by
the same client.

12.2.12.  Device IDs

The device ID (data type deviceid4, see Section 3.2.16) names a
storage device.  In practice, a significant amount of information may
be required to fully address a storage device.  Instead of embedding
all that information in a layout, layouts embed device IDs.  The
NFSv4.1 operation GETDEVICEINFO (Section 17.40) is used to retrieve
the complete address information about the storage device according
to its layout type.  For example, the address of an NFSv4.1 data
server or of an object storage device could be an IP address and
port.  The address of a block storage device could be a volume label.
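As a minimal illustration of how a client might cache the results of
GETDEVICEINFO, the following hypothetical sketch keys each cached
address by layout type, FSID, and device ID; the class and method
names are assumptions for exposition, not protocol elements:

```python
# Illustrative client-side cache of device-ID-to-address mappings.
# A device ID is meaningful only together with its layout type and
# FSID, so the cache key combines all three.

class DeviceCache:
    def __init__(self):
        self._map = {}

    def store(self, layout_type, fsid, device_id, address):
        # Qualifying the key by layout type and FSID lets independent
        # layout drivers reuse numeric device IDs without coordination.
        self._map[(layout_type, fsid, device_id)] = address

    def lookup(self, layout_type, fsid, device_id):
        # A miss means the client must issue GETDEVICEINFO.
        return self._map.get((layout_type, fsid, device_id))

    def invalidate_all(self):
        # Mappings cannot be assumed to persist across metadata server
        # restart; drop everything and re-fetch as needed.
        self._map.clear()
```

Note that identical device IDs under different layout types or FSIDs
deliberately resolve to different cache entries.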
The device ID is qualified by the layout type and is unique per file
system identifier (FSID, see Section 3.2.5).  This allows different
layout drivers to generate device IDs without the need for
coordination.

Clients cannot expect the mapping between device ID and storage
device address to persist across metadata server restart.  See
Section 12.7.4 for a description of how recovery works in that
situation.

12.3.  pNFS Operations

NFSv4.1 has several operations that are needed for pNFS servers,
regardless of layout type or storage protocol.  These operations are
all issued to a metadata server and are summarized here.

GETDEVICEINFO.  As noted previously (Section 12.2.12), GETDEVICEINFO
   (Section 17.40) returns the mapping of device ID to storage device
   address.

GETDEVICELIST (Section 17.41) allows clients to fetch all of the
   device-ID-to-storage-device-address mappings of a particular file
   system.

LAYOUTGET (Section 17.43) is used by a client to get a layout
   segment for a file.

LAYOUTCOMMIT (Section 17.42) is used to inform the metadata server
   that the client wants to commit data it wrote to the storage
   device (as indicated in the layout segment returned by LAYOUTGET).

LAYOUTRETURN (Section 17.44) is used to return a layout segment, or
   all layouts belonging to a file system, to a metadata server.

The following pNFS-related operations are callback operations a
metadata server might issue to a pNFS client.

CB_LAYOUTRECALL (Section 19.3) recalls a layout segment, all layouts
   belonging to a file system, or all layouts belonging to a client
   ID.

CB_RECALL_ANY (Section 19.6) tells a client that it needs to return
   some number of recallable objects, including layouts, to the
   metadata server.
CB_RECALLABLE_OBJ_AVAIL (Section 19.7) tells a client that a
   recallable object that it was denied (in the case of pNFS, a
   layout denied by LAYOUTGET) due to resource exhaustion is now
   available.

12.4.  pNFS Attributes

A number of attributes specific to pNFS are listed and described in
Section 5.13.

12.5.  Layout Semantics

12.5.1.  Guarantees Provided by Layouts

Layouts delegate to the client the ability to access data out of
band.  The layout guarantees the holder that the layout will be
recalled when the state encapsulated by the layout becomes invalid
(e.g., through some operation that directly or indirectly modifies
the layout) or, possibly, when a conflicting layout is requested, as
determined by the layout's iomode.  When a layout is recalled and
then returned by the client, the client retains the ability to access
file data with normal NFSv4.1 I/O operations through the metadata
server.  Only the right to do I/O out-of-band is affected.

Holding a layout does not guarantee that a user of the layout has the
rights to access the data represented by the layout.  All user access
rights MUST be obtained through the appropriate open, lock, and
access operations (i.e., those that would be used in the absence of
pNFS).  However, if a valid layout for a file is not held by the
client, the storage device should reject all I/Os to that file's
octet range that originate from that client.  In summary, layouts and
ordinary file access controls are independent.  The act of modifying
a file for which a layout is held does not necessarily conflict with
the holding of the layout that describes the file being modified.
However, with certain layout types (e.g., block/volume layouts), the
layout's iomode must agree with the type of I/O being performed.
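The storage-device-side check implied above can be sketched as
follows.  This is a hypothetical illustration (the function name and
segment-tuple shape are assumptions, and real enforcement is layout-
type dependent), not a definitive implementation: I/O is honored only
if the client holds a valid layout segment covering the range, and a
WRITE additionally requires a READ/WRITE iomode.

```python
# Iomode values as defined for layoutiomode4.
LAYOUTIOMODE4_READ = 1
LAYOUTIOMODE4_RW = 2

def io_permitted(held_segments, offset, length, is_write):
    """Decide whether a client's I/O should be honored.

    held_segments: list of (seg_offset, seg_length, iomode) tuples
    describing the layout segments this client holds for the file.
    """
    end = offset + length
    for seg_off, seg_len, iomode in held_segments:
        if seg_off <= offset and end <= seg_off + seg_len:
            if is_write and iomode != LAYOUTIOMODE4_RW:
                # A READ-iomode segment cannot authorize a WRITE; the
                # storage device is allowed to reject it.
                continue
            return True
    # No valid covering layout: reject, forcing the client back to
    # normal I/O through the metadata server (or a fresh LAYOUTGET).
    return False
```

The check is deliberately separate from open-mode and lock
enforcement, mirroring the text's point that layouts and ordinary
file access controls are independent.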
Depending upon the layout type and storage protocol in use, storage
device access permissions may be granted by LAYOUTGET and may be
encoded within the type-specific layout.  If access permissions are
encoded within the layout, the metadata server must recall the layout
when those permissions become invalid for any reason; for example,
when a file becomes unwritable or inaccessible to a client.  Note
that clients are still required to perform the appropriate access
operations as described above (e.g., open and lock operations).  The
degree to which it is possible for the client to circumvent these
access operations must be clearly addressed by the individual layout
type documents, as well as the consequences of doing so.  In
addition, these documents must be clear about the requirements and
non-requirements for the checking performed by the server.

If the pNFS metadata server supports mandatory record locks, then
record locks must behave as specified by the NFSv4.1 protocol, as
observed by users of files.  If a storage device is unable to
restrict access by a pNFS client which does not hold a required
mandatory record lock, then the metadata server must not grant
layouts, for that storage device, that permit any access that
conflicts with a mandatory record lock held by another client.  In
this scenario, it is also necessary for the metadata server to ensure
that record locks are not granted to a client if any other client
holds a conflicting layout (a layout that overlaps the range and has
an iomode that conflicts with the lock type); in this case, all
conflicting layouts must be recalled and returned before the lock
request can be granted.  This requires the metadata server to
understand the capabilities of its storage devices.

12.5.2.  Getting a Layout

A client obtains a layout through a new operation, LAYOUTGET.  The
metadata server will give out layouts of a particular type (e.g.,
block/volume, object, or file) and aggregation as requested by the
client.  The client selects an appropriate layout type which the
server supports and the client is prepared to use.  The layout
returned to the client may not line up exactly with the requested
octet range.  A field within the LAYOUTGET request, loga_minlength,
specifies the minimum overlap that MUST exist between the requested
layout and the layout returned by the metadata server.  The
loga_minlength field should be at least one.  A metadata server may
give out multiple overlapping, non-conflicting layout segments to the
same client in response to a LAYOUTGET.

There is no implied ordering between getting a layout and performing
a file OPEN.  For example, a layout may first be retrieved by placing
a LAYOUTGET operation in the same COMPOUND as the initial file OPEN.
Once the layout has been retrieved, it can be held across multiple
OPEN and CLOSE sequences.

The storage protocol used by the client to access the data on the
storage device is determined by the layout's type.  The client needs
to select a layout driver that understands how to interpret and use
that layout.  The method for layout driver selection used by the
client is outside the scope of the pNFS extension.

Although the metadata server is in control of the layout for a file,
the pNFS client can provide hints to the server when a file is opened
or created about the preferred layout type and aggregation schemes.
pNFS introduces a layout_hint (Section 5.13.4) attribute that the
client can set at file creation time to provide a hint to the server
for new files.
Setting this attribute separately, after the file has been created, could make it difficult or impossible for the server implementation to comply. This in turn further complicates exclusive file creation via OPEN, which, when done via the EXCLUSIVE4 createmode, does not allow the setting of attributes at file creation time. However, as noted in Section 17.16.4, if the server supports a persistent reply cache, the EXCLUSIVE4 createmode is not needed. Therefore, a metadata server that supports the layout_hint attribute MUST support a persistent session reply cache, and a pNFS client that wants to set layout_hint at file creation (OPEN) time MUST NOT use the EXCLUSIVE4 createmode, and instead MUST use GUARDED for an exclusive regular file creation.

12.5.3. Committing a Layout

Due to the nature of the protocol, the file attributes and data location mapping (e.g., which offsets store data versus store holes; see Section 13.5) information that exists on the metadata server may become inconsistent in relation to the data stored on the storage devices; e.g., when WRITEs occur before a layout has been committed (e.g., between a LAYOUTGET and a LAYOUTCOMMIT). Thus, it is necessary to occasionally re-synchronize this state and make it visible to other clients through the metadata server.

The LAYOUTCOMMIT operation is responsible for committing a modified layout segment to the metadata server. Note: the data should be written and committed to the appropriate storage devices before the LAYOUTCOMMIT occurs. Note, if the data is being written asynchronously (i.e., if using NFSv4.1 as the storage protocol, the field committed in WRITE4resok is UNSTABLE4) through the metadata server, a COMMIT to the metadata server is required to synchronize the data and make it visible on the storage devices (see Section 12.5.5 for more details). The scope of this operation depends on the storage protocol in use. For block/volume-based layouts, it may require updating the block list that comprises the file and committing this layout to stable storage. For file-based layouts, it requires some synchronization of attributes between the metadata and storage devices (i.e., mainly the size attribute: EOF). It is important to note that the level of synchronization is from the point of view of the client that issued the LAYOUTCOMMIT. The updated state on the metadata server need only reflect the state as of the client's last operation previous to the LAYOUTCOMMIT; it need not reflect a globally synchronized state (e.g., other clients may be performing, or may have performed, I/O since the client's last operation and the LAYOUTCOMMIT).

The control protocol is free to synchronize the attributes before it receives a LAYOUTCOMMIT; however, upon successful completion of a LAYOUTCOMMIT, state that exists on the metadata server that describes the file MUST be in sync with the state existing on the storage devices that comprise that file as of the issuing client's last operation. Thus, a client that queries the size of a file between a WRITE to a storage device and the LAYOUTCOMMIT may observe a size that does not reflect the actual data written.

12.5.3.1.
LAYOUTCOMMIT and mtime/atime/change

The change attribute and the modify/access times may be updated, by the server, at LAYOUTCOMMIT time, since for some layout types the change attribute and atime/mtime cannot be updated by the appropriate I/O operation performed at a storage device. The arguments to LAYOUTCOMMIT allow the client to provide suggested access and modify time values to the server. Again, depending upon the layout type, these client-provided values may or may not be used. The server should sanity-check the client-provided values before they are used. For example, the server should ensure that time does not flow backwards. According to the NFSv4 specification, the client always has the option to set these attributes through an explicit SETATTR operation.

As mentioned, for some layout protocols the change attribute and mtime/atime may be updated at or after the time the I/O occurred (e.g., if the storage device is able to communicate these attributes to the metadata server). If, upon receiving a LAYOUTCOMMIT, the server implementation is able to determine that the file did not change since the last time the change attribute was updated (e.g., no WRITEs or over-writes occurred), the implementation need not update the change attribute; file-based protocols may have enough state to make this determination or may update the change attribute upon each file modification. This also applies to mtime and atime; if the server implementation is able to determine that the file has not been modified since the last mtime update, the server need not update mtime at LAYOUTCOMMIT time. Once LAYOUTCOMMIT completes, the new change attribute and mtime/atime should be visible if that file was modified since the latest previous LAYOUTCOMMIT or LAYOUTGET.

12.5.3.2.
LAYOUTCOMMIT and size

The file's size may be updated at LAYOUTCOMMIT time as well. The LAYOUTCOMMIT argument contains a field, loca_last_write_offset, that indicates the highest octet offset written but not yet committed via LAYOUTCOMMIT. Note: this argument is switched on a boolean value (field no_newoffset) indicating whether or not a previous write occurred. If no_newoffset is FALSE, no loca_last_write_offset is given. A loca_last_write_offset specifying an offset of 0 means that octet 0 was the highest octet written.

The metadata server may do one of the following:

1. It may update the file's size based on the last write offset. However, to the extent possible, the metadata server should sanity-check any value to which the file's size is going to be set. E.g., it must not truncate the file based on the client presenting a smaller last write offset than the file's current size.

2. If it has sufficient other knowledge of file size (e.g., by querying the storage devices through the control protocol), it may ignore the client-provided argument and use the query-derived value.

3. It may use the last write offset as a hint, subject to correction when other information is available as above.

The method chosen to update the file's size will depend on the storage device's and/or the control protocol's implementation. For example, if the storage devices are block devices with no knowledge of file size, the metadata server must rely on the client to set the size appropriately. A new size flag and length are also returned in the results of a LAYOUTCOMMIT. This union indicates whether a new size was set, and to what length it was set. If a new size is set as a result of LAYOUTCOMMIT, then the metadata server must reply with the new size.
As well, if the size is updated, the metadata server in conjunction with the control protocol SHOULD ensure that the new size is reflected by the storage devices immediately upon return of the LAYOUTCOMMIT operation; e.g., a READ up to the new file size should succeed on the storage devices (assuming no intervening truncations). Again, if the client wants to explicitly zero-extend or truncate a file, SETATTR must be used; it need not be used when simply writing past EOF via WRITE.

12.5.3.3. LAYOUTCOMMIT and layoutupdate

The LAYOUTCOMMIT argument contains a loca_layoutupdate field (Section 17.42.2) of data type layoutupdate4 (Section 3.2.21). This argument is a layout type-specific structure. The structure can be used to pass arbitrary layout type-specific information from the client to the metadata server at LAYOUTCOMMIT time. For example, if using a block/volume layout, the client can indicate to the metadata server which reserved or allocated blocks the client used and did not use. The content of loca_layoutupdate (field lou_body) need not be the same as the layout type-specific content returned by LAYOUTGET (Section 17.43.3) in the loc_body field of the lo_content field of the logr_layout field. The content of loca_layoutupdate is defined by the layout type specification and is opaque to LAYOUTCOMMIT.

12.5.4. Recalling a Layout

Since a layout protects a client's access to a file via a direct client-storage-device path, a layout need only be recalled when it is semantically unable to serve this function. Typically, this occurs when the layout no longer encapsulates the true location of the file over the octet range it represents. Any operation or action (e.g., server-driven restriping or load balancing) that changes the layout will result in a recall of the layout.
A layout is recalled by the CB_LAYOUTRECALL callback operation (see Section 19.3). This callback can recall a layout segment identified by an octet range, all the layouts associated with a file system (FSID), or all layouts. Recalling all layouts or all the layouts associated with a file system also invalidates the client's device cache for the affected file systems. Multiple layout segments may be returned in a single compound operation. Section 12.5.4.2 discusses sequencing issues surrounding the getting, returning, and recalling of layouts.

The iomode is also specified when recalling a layout or layout segment. Generally, the iomode in the recall request must match the layout, or segment, being returned; e.g., a recall with an iomode of LAYOUTIOMODE4_RW should cause the client to only return LAYOUTIOMODE4_RW layout segments (not LAYOUTIOMODE4_READ segments). However, a special LAYOUTIOMODE4_ANY enumeration is defined to enable recalling a layout of any type (i.e., the client must return both read-only and read/write layouts).

A REMOVE operation may cause the metadata server to recall the layout to prevent the client from accessing a non-existent file and to reclaim state stored on the client. Since a REMOVE may be delayed until the last close of the file has occurred, the recall may also be delayed until this time. As well, once the file has been removed, after the last reference, the client SHOULD no longer be able to perform I/O using the layout (e.g., with file-based layouts, an error such as ESTALE could be returned).

Although pNFS does not alter the caching capabilities of clients, or their semantics, it recognizes that some clients may perform more aggressive write-behind caching to optimize the benefits provided by pNFS. However, write-behind caching may impact the latency in returning a layout in response to a CB_LAYOUTRECALL, just as caching impacts DELEGRETURN with regard to data delegations. Client implementations should limit the amount of unwritten data they have outstanding at any one time. Server implementations may fence clients from performing direct I/O to the storage devices if they perceive that the client is taking too long to return a layout once recalled. A server may be able to monitor client progress by watching client I/Os or by observing LAYOUTRETURNs of sub-portions of the recalled layout. The server can also limit the amount of dirty data to be flushed to storage devices by limiting the octet ranges covered in the layouts it gives out.

Once a layout has been returned, the client MUST NOT issue I/Os to the storage devices for the file, octet range, and iomode represented by the returned layout. If a client does issue an I/O to a storage device for which it does not hold a layout, the storage device SHOULD reject the I/O.

12.5.4.1. Recall Callback Robustness

It has been assumed thus far that pNFS client state for a file exactly matches the pNFS server state for that file and client regarding layout ranges and permissions. This assumption leads to the implication that any callback results in a LAYOUTRETURN or set of LAYOUTRETURNs that exactly match the range in the callback, since both client and server agree about the state being maintained. However, it can be useful if this assumption does not always hold. For example:

o It may be useful for clients to be able to discard layout information without calling LAYOUTRETURN.
If conflicts that require callbacks are very rare, and a server can use a multi-file callback to recover per-client resources (e.g., via an FSID recall, or a multi-file recall within a single compound), the result may be significantly less client-server pNFS traffic.

o It may be similarly useful for servers to maintain information about what ranges are held by a client on a coarse-grained basis, leading to the server's layout ranges being beyond those actually held by the client. In the extreme, a server could manage conflicts on a per-file basis, only issuing whole-file callbacks even though clients may request and be granted sub-file ranges.

o In order to avoid errors, it is vital that a client not assign itself layout permissions beyond what the server has granted and that the server not forget layout permissions that have been granted. On the other hand, if a server believes that a client holds a layout segment that the client does not know about, it is useful for the client to cleanly indicate completion of the requested recall either by issuing a LAYOUTRETURN for the entire requested range or by returning an NFS4ERR_NOMATCHING_LAYOUT error to the layout recall callback.

Thus, in light of the above, it is useful for a server to be able to issue callbacks for layout ranges it has not granted to a client, and for a client to return ranges it does not hold. A pNFS client must always return layout segments that comprise the full range specified by the recall. Note, the full recalled layout range need not be returned as part of a single operation, but may be returned in segments. This allows the client to stage the flushing of dirty data, layout commits, and returns. Also, it indicates to the metadata server that the client is making progress.
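A client's staged response to a recall can be sketched as follows. This is an illustrative model, not a protocol definition; the function name and segment representation are invented, and the staging policy (one partial return per held piece) is just one possible choice.

```python
# Hypothetical sketch of a client planning its response to a
# CB_LAYOUTRECALL over [r_off, r_off + r_len).  Partial ranges may be
# returned in stages, but the final LAYOUTRETURN echoes the entire
# recalled range; if nothing held overlaps the range, the client
# answers NFS4ERR_NOMATCHING_LAYOUT.

def plan_recall_response(held, r_off, r_len):
    """held: list of (offset, length) layout segments the client holds.
    Returns (status, staged_returns)."""
    r_end = r_off + r_len
    pieces = sorted((max(o, r_off), min(o + l, r_end))
                    for (o, l) in held if o < r_end and o + l > r_off)
    if not pieces:
        return 'NFS4ERR_NOMATCHING_LAYOUT', []
    # Stage a partial return per held piece (letting dirty data be
    # flushed and committed incrementally), then echo the full range.
    staged = [('LAYOUTRETURN', start, end) for (start, end) in pieces[:-1]]
    staged.append(('LAYOUTRETURN', r_off, r_end))
    return 'NFS4_OK', staged
```

The key invariant, per the text above, is that the last staged return always covers the whole recalled range, even when earlier returns covered only sub-ranges.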
It is possible that write requests may be presented to a storage device that is no longer allowed to perform them. This behavior is limited by requiring that a client MUST wait for completion of all writes covered by a layout range before returning a layout that covers that range. Since the server has no control as to when the client will return the layout, the server may later decide to unilaterally revoke the client's access provided by the layout in question. Upon doing so, the server must deal with the possibility of lingering writes: outstanding writes still in flight to data servers identified by the revoked layout. Each layout specification MUST define whether unilateral layout revocation by the metadata server is supported, and if so, the specification must also outline how lingering writes are to be dealt with; e.g., storage devices identified by the revoked layout in question could be fenced off from the appropriate client. If unilateral revocation is not supported, there MUST be no possibility that the client has outstanding write requests when a layout is returned.

In order to ensure client/server convergence on the layout state, the final LAYOUTRETURN operation in a sequence of LAYOUTRETURN operations for a particular recall MUST specify the entire range being recalled, echoing the recalled layout type, iomode, recall/return type (FILE, FSID, or ALL), and octet range, even if layout segments pertaining to partial ranges were previously returned. In addition, if the client holds no layout segment that overlaps the range being recalled, the client should return the NFS4ERR_NOMATCHING_LAYOUT error code. This allows the server to update its view of the client's layout state.

12.5.4.2.
Serialization of Layout Operations

As with other stateful operations, pNFS requires the correct sequencing of layout operations. pNFS uses the sessions feature of NFSv4.1 to provide the correct sequencing between regular operations and callbacks. It is the server's responsibility to avoid inconsistencies regarding the layouts it hands out, and the client's responsibility to properly serialize its layout requests and layout returns.

12.5.4.2.1. Get/Return Serialization

The protocol allows the client to send concurrent LAYOUTGET and LAYOUTRETURN operations to the server. However, the protocol does not provide any means for the server to process the requests in the same order in which they were created, nor does it provide a way for the client to determine the order in which parallel outstanding operations were processed by the server. Thus, when a layout segment retrieved by an outstanding LAYOUTGET operation intersects with a layout segment returned by an outstanding LAYOUTRETURN, the order in which the two conflicting operations are processed determines the final state of the overlapping segment. To disambiguate between the two cases, the client MUST serialize LAYOUTGET operations and voluntary LAYOUTRETURN operations for the same file.

It is permissible for the client to send in parallel multiple LAYOUTGET operations for the same file or multiple LAYOUTRETURN operations for the same file, but never a mix of both. It is also permissible for the client to combine LAYOUTRETURN and LAYOUTGET operations for the same file in the same COMPOUND request, as the server MUST process these in order.
If a client does issue such requests, it MUST NOT have more than one outstanding for the same file at the same time, and MUST NOT have other LAYOUTGET or LAYOUTRETURN operations outstanding at the same time for that same file.

12.5.4.2.2. Recall/Return Sequencing

One critical issue with operation sequencing concerns callbacks. The protocol must defend against races between the reply to a LAYOUTGET operation and a subsequent CB_LAYOUTRECALL. A client MUST NOT process a CB_LAYOUTRECALL that identifies an outstanding LAYOUTGET operation to which the client has not yet received a reply. Conflicting LAYOUTGET operations are identified in the CB_SEQUENCE preceding the CB_LAYOUTRECALL.

The callback races section (Section 2.10.4.3) describes the sessions mechanism for allowing the client to detect such situations in order to not process such a CB_LAYOUTRECALL. The server MUST reference all conflicting LAYOUTGET operations in the CB_SEQUENCE that precedes the CB_LAYOUTRECALL. A zero-length array of referenced operations is used by the server to tell the client that the server does not know of any LAYOUTGET operations that conflict with the recall.

12.5.4.2.2.1. Client Side Considerations

Consider a pNFS client that has issued a LAYOUTGET and then receives an overlapping recall callback for the same file. There are two possibilities, which the client would be unable to distinguish without additional information provided by the sessions implementation.

1. The server processed the LAYOUTGET before issuing the recall, so the LAYOUTGET response is in flight, and must be waited for because it may be carrying layout info that will need to be returned to deal with the recall callback.

2. The server issued the callback before receiving the LAYOUTGET.
The server will not respond to the LAYOUTGET until the recall callback is processed.

These possibilities could cause deadlock, as the client must wait for the LAYOUTGET response before processing the recall in the first case, but that response will not arrive until after the recall is processed in the second case. Via the CB_SEQUENCE operation, the server provides the client with the { slotid, sequenceid } of any earlier LAYOUTGET operations that remain unconfirmed at the server by the session slot usage rules. This allows the client to disambiguate between the two cases: in case 1, the server will provide the operation reference(s), whereas in case 2 it will not (because there are no dependent client operations). Therefore, the action at the client will only require waiting in the case that the client has not yet seen the server's earlier responses to the LAYOUTGET operation(s).

This deadlock is avoided by adhering to the following requirements:

o A LAYOUTGET MUST be rejected with the error NFS4ERR_RECALLCONFLICT if there is an overlapping outstanding recall callback to the same client.

o When processing a recall, the client MUST wait for a response to all conflicting outstanding LAYOUTGETs that are referenced in the CB_SEQUENCE for the recall before performing any RETURN that could be affected by any such response.

o The client SHOULD wait for responses to all operations required to complete a recall before sending any LAYOUTGETs that would conflict with the recall, because the server is likely to return errors for them.

o Before sending a new LAYOUTGET for a range covered by a layout recall, the client SHOULD wait for responses to any outstanding LAYOUTGET that overlaps any portion of the new LAYOUTGET's range.
This is because it is possible (although unlikely) that the prior operation may have arrived at the server after the recall completed and hence will succeed.

o The recall process can be considered done by the client when the final LAYOUTRETURN operation for the recalled range is issued.

12.5.4.2.2.2. Server Side Considerations

Consider a related situation from the metadata server's point of view. The metadata server has issued a recall layout callback and receives an overlapping LAYOUTGET for the same file before the LAYOUTRETURN(s) that respond to the recall callback. Again, there are two cases:

1. The client issued the LAYOUTGET before processing the recall callback.

2. The client issued the LAYOUTGET after processing the recall callback, but it arrived before the LAYOUTRETURN that completed that processing.

The metadata server MUST reject the overlapping LAYOUTGET. The client has two ways to avoid this result: it can issue the LAYOUTGET as a subsequent element of a COMPOUND containing the LAYOUTRETURN that completes the recall callback, or it can wait for the response to that LAYOUTRETURN.

There is little the session sequence logic can do to disambiguate between these two cases, because both operations are independent of one another. They are simply asynchronous events that crossed. The situation can even occur if the session is configured to use a single connection for both operations and callbacks.

12.5.5. Metadata Server Write Propagation

Asynchronous writes written through the metadata server may be propagated lazily to the storage devices. For data written asynchronously through the metadata server, a client performing a read at the appropriate storage device is not guaranteed to see the newly written data until a COMMIT occurs at the metadata server.
While the write is pending, reads to the storage device can give out either the old data, the new data, or a mixture thereof. After either a synchronous write completes, or a COMMIT is received (for asynchronously written data), the metadata server must ensure that storage devices give out the new data and that the data has been written to stable storage. If the server implements its storage in any way such that it cannot obey these constraints, then it must recall the layouts to prevent reads being done that cannot be handled correctly.

12.6. PNFS Mechanics

This section describes the operations flow taken by a pNFS client to a metadata server and storage device.

When a pNFS client encounters a new FSID, it issues a GETATTR to the NFSv4.1 server for the fs_layout_type (Section 5.13.1) attribute. If the attribute returns at least one layout type, and the layout type(s) returned is(are) among the set supported by the client, the client knows that pNFS is a possibility for the filesystem. If, from the server that returned the new FSID, the client does not have a client ID that came from an EXCHANGE_ID result that returned EXCHGID4_FLAG_USE_PNFS_MDS, it must send an EXCHANGE_ID to the server with the EXCHGID4_FLAG_USE_PNFS_MDS bit set. If the server's response does not have EXCHGID4_FLAG_USE_PNFS_MDS, then contrary to what the fs_layout_type attribute said, the server does not support pNFS, and the client will not be able to use pNFS to that server.

Once the client has a client ID that supports pNFS, it creates a session over the client ID, requesting a persistent session reply cache.
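The discovery decision above can be sketched as a small predicate. This is an illustrative sketch; the function name is invented, and the flag value shown should be checked against the draft's XDR definitions rather than taken as authoritative.

```python
# Sketch of the client-side decision in the flow above: pNFS is usable
# for a new FSID only if (a) the fs_layout_type attribute lists a layout
# type the client supports and (b) the EXCHANGE_ID result carries
# EXCHGID4_FLAG_USE_PNFS_MDS.  The flag value here is illustrative.

EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000

def pnfs_usable(fs_layout_types, client_layout_types, eir_flags):
    """fs_layout_types: from GETATTR of fs_layout_type on the new FSID.
    eir_flags: eir_flags from the EXCHANGE_ID result."""
    if not set(fs_layout_types) & set(client_layout_types):
        return False                 # no mutually supported layout type
    return bool(eir_flags & EXCHGID4_FLAG_USE_PNFS_MDS)
```

If the predicate is false because the server's EXCHANGE_ID reply lacks the flag, the client falls back to ordinary NFSv4.1 I/O through the metadata server, regardless of what fs_layout_type advertised.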
If the client wants to create a file on the file system identified by the FSID that supports pNFS, it issues an OPEN with a create type of GUARDED4 (if it wants an exclusive create) or UNCHECKED4 (if it does not want an exclusive create). Among the various attributes it sets in createattrs, it includes layout_hint and fills it with information pertinent to the layout type it wants to use. The COMPOUND procedure that the OPEN is sent with should include a GETATTR operation (on the filehandle OPEN sets) that retrieves the layout_type attribute. This is so the client can determine what layout type the server will in fact support, and thus what storage protocol the client must use.

If the client wants to open an existing file, then it also includes a GETATTR to determine what layout type the file supports.

The GETATTR in either the file creation or plain file open case can also include the layout_blksize and layout_alignment attributes so that the client can determine optimal offsets and lengths for I/O on the file.

Assuming the client supports the layout type returned by GETATTR, it then issues LAYOUTGET using the filehandle returned by OPEN, specifying the range it wants to do I/O on. The response is a layout segment, which may be a subset of the range the client asked for. It also includes device IDs and a description of how data is organized (or, in the case of writing, how data is to be organized) across the devices. The device IDs and data description are encoded in a format that is specific to the layout type, but that the client is expected to understand.

When the client wants to issue an I/O, it determines which device ID it needs to send the I/O command to by examining the data description in the layout. It then issues a GETDEVICEINFO to find the device address of the device ID.
The client then sends the I/O command to the device address, using the storage protocol defined for the layout type.

If the I/O was an input request, then at some point the client may want to commit the access time to the metadata server. It uses the LAYOUTCOMMIT operation. If the I/O was an output request, then at some point the client may want to commit the modification time and the new size of the file (if it believes it lengthened the file) to the metadata server, and the modified data to the filesystem. Again, it uses LAYOUTCOMMIT.

12.7. Recovery

Recovery is complicated due to the distributed nature of the pNFS protocol. In general, crash recovery for layouts is similar to crash recovery for delegations in the base NFSv4 protocol. However, the client's ability to perform I/O without contacting the metadata server, and the fact that, unlike delegations, layouts are not bound to stateids, introduce subtleties that must be handled correctly if file system corruption is to be avoided.

12.7.1. Client Recovery

Client recovery for layouts is similar to client recovery for other lock/delegation state. When a pNFS client reboots, it will lose all information about the layouts that it previously owned. There are two methods by which the server can reclaim these resources and allow otherwise conflicting layouts to be provided to other clients.

The first is through the expiry of the client's lease. If the client recovery time is longer than the lease period, the client's lease will expire and the server will know that state may be released. For layouts, the server may release the state immediately upon lease expiry, or it may allow the layout to persist, awaiting possible lease revival, as long as there are no conflicting requests.
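The second method, described in the next paragraph, relies on the server noticing a reused co_ownerid with a fresh verifier. A hedged server-side sketch, with invented data structures and names, might look like this:

```python
# Hypothetical server-side sketch: on EXCHANGE_ID, a co_ownerid that
# matches a previous client invocation but carries a different verifier
# signals a client restart, so layout state from the previous
# invocation is released.  Data written but not committed via
# LAYOUTCOMMIT by the old invocation may be lost at this point.

def handle_exchange_id(clients, co_ownerid, co_verifier):
    """clients: dict mapping co_ownerid -> {'verifier': ..., 'layouts': set()}.
    Returns True if prior layout state was released (client restarted)."""
    prev = clients.get(co_ownerid)
    restarted = prev is not None and prev['verifier'] != co_verifier
    if restarted:
        prev['layouts'].clear()      # release the old invocation's layouts
    clients[co_ownerid] = {'verifier': co_verifier, 'layouts': set()}
    return restarted
```

A real server would also release opens, locks, and delegations tied to the old invocation; only the layout aspect is modeled here.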
On the other hand, the client may restart in less time than it takes for the lease period to expire.  In such a case, the client will contact the server through the standard EXCHANGE_ID protocol.  The server will find that the client's co_ownerid matches the co_ownerid of the previous client invocation, but that the verifier is different.  The server uses this as a signal to release all layout state associated with the client's previous invocation.  It is possible that all data written by the client to storage devices but not committed via LAYOUTCOMMIT is lost.

12.7.2.  Dealing with Lease Expiration on the Client

The mappings between device IDs and device addresses are what allow a pNFS client to safely write data to and read data from a storage device.  These mappings are leased (just like locking state) from the metadata server, and as long as the lease is valid, the client has a right to issue I/O to the storage devices.  The lease on device ID to device address mappings is renewed when the metadata server receives a SEQUENCE operation from the pNFS client.  The same is not specified to be true for the data server receiving a SEQUENCE operation, and the client MUST NOT assume that a SEQUENCE sent to a data server will renew its lease.

The loss of the lease leads to the loss of the device ID to device address mappings.  If a mapping is used for I/O after lease expiration, the consequences could be data corruption.  To avoid losing its lease, the client should start its lease timer based on the time that it issued the operation to the metadata server rather than based on the time the response was received.  It is also necessary to take propagation delay into account as described in Section 8.12.
Thus, the client must be aware of the one-way propagation delay and should issue renewals well in advance of lease expiration.

If a client believes its lease has expired, it MUST NOT issue I/O to the storage device until it has validated its lease.  The client can issue a SEQUENCE operation to the metadata server.  If the SEQUENCE operation is successful, but sr_status_flags has SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, or SEQ4_STATUS_ADMIN_STATE_REVOKED set, the client must recover by deleting all its records of layouts and device ID to device address mappings, then writing any modified but uncommitted data in its memory directly to the metadata server with the stable argument to WRITE set to FILE_SYNC4, and finally reacquiring any layouts it needs via LAYOUTGET.

If sr_status_flags from the metadata server has SEQ4_STATUS_RESTART_RECLAIM_NEEDED set (or SEQUENCE returns NFS4ERR_STALE_CLIENTID, or SEQUENCE returns NFS4ERR_BAD_SESSION and CREATE_SESSION returns NFS4ERR_STALE_CLIENTID), then the metadata server has restarted, and the client must recover using the methods described in Section 12.7.4.

If sr_status_flags from the metadata server has SEQ4_STATUS_LEASE_MOVED set, then the client recovers by following the procedure described in Section 10.6.7.1.  After that, the client may get an indication that the layout state was not moved with the file system.  The client is then required to recover per the other applicable situations discussed in Paragraph 3 or Paragraph 4 of this section.

If sr_status_flags reports no loss of state, then the lease the client has with the metadata server is valid and renewed, and the client can re-commence I/O to the storage devices.
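The decision procedure above can be sketched as follows.  This is an illustrative sketch only, not part of the protocol: the flag constants below are placeholder values (the assigned on-the-wire values are defined with the SEQUENCE operation), and the returned strings merely name the recovery paths described in the preceding paragraphs.

```python
# Placeholder bit values for illustration; not the assigned constants.
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED  = 0x01
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x02
SEQ4_STATUS_ADMIN_STATE_REVOKED        = 0x04
SEQ4_STATUS_RESTART_RECLAIM_NEEDED     = 0x08
SEQ4_STATUS_LEASE_MOVED                = 0x10

def layout_recovery_action(sr_status_flags):
    """Map SEQUENCE result flags to the pNFS client's recovery step."""
    if sr_status_flags & (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED |
                          SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED |
                          SEQ4_STATUS_ADMIN_STATE_REVOKED):
        # Delete layouts and device mappings, write dirty data through
        # the metadata server with FILE_SYNC4, then reacquire layouts.
        return "flush-through-mds-and-reacquire"
    if sr_status_flags & SEQ4_STATUS_RESTART_RECLAIM_NEEDED:
        # Metadata server restarted; recover per Section 12.7.4.
        return "mds-restart-recovery"
    if sr_status_flags & SEQ4_STATUS_LEASE_MOVED:
        # Follow Section 10.6.7.1, then re-check for layout state loss.
        return "lease-moved-recovery"
    # No loss of state: the lease is valid and renewed; resume I/O.
    return "resume-io"
```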
While clients should not issue I/Os to storage devices that may extend past the lease expiration time period, this is not always possible (e.g. an extended network partition that starts after the I/O is sent and does not heal until the I/O request is received by the data server).  Thus the metadata server and/or storage device are responsible for protecting the pNFS server from I/Os that are sent before the lease expires, but arrive after the lease expires.  See Section 12.7.3.

12.7.3.  Dealing with Loss of Layout State on the Metadata Server

This section describes recovery from the situation where all of the following are true: the metadata server has not restarted; a pNFS client's device ID to device address mappings and/or layouts have been discarded (usually because the client's lease expired) and are invalid; and an I/O from the pNFS client arrives at the storage device.  The metadata server and its storage devices may solve this by fencing the client, i.e. preventing the execution of I/O operations from the client to the storage devices after layout state loss.  The details of how fencing is done are specific to the layout type.  The solution for NFSv4.1 file-based layouts is described in this document (Section 13.13), and for other layout types in their respective external specification documents.

12.7.4.  Recovery from Metadata Server Restart

The pNFS client will discover that the metadata server has restarted (e.g. rebooted) via the methods described in Section 8.6.2, and discussed in a pNFS-specific context in Paragraph 4 of Section 12.7.2.  The client MUST stop using, and delete, the device ID to device address mappings it previously received from the metadata server.
Having done that, if the client wrote data to the storage device without committing the layout segment(s) via LAYOUTCOMMIT, then the client has additional work to do in order to get the client, metadata server, and storage device(s) all synchronized on the state of the data.

o  If the client has data still modified and unwritten in the client's memory, the client has only two choices.

   1.  The client can obtain a layout segment via LAYOUTGET after the server's grace period and write the data to the storage devices.

   2.  The client can write that data through the metadata server using the WRITE (Section 17.32) operation, and then obtain layout segments as needed.

   As noted in Paragraph 2 of Section 8.6.2.1, and in Section 17.43.4, LAYOUTGET and WRITE may not be allowed until the grace period expires.  Under some conditions, as described in Section 12.7.5, LAYOUTGET and/or WRITE may be permitted during the metadata server's grace period.

o  If the client synchronously wrote data to the storage device, but still has a copy of that data in its memory, then it has available to it the recovery options listed above in the previous bullet point.  If the metadata server is also in its grace period, the client has available to it the options below in the next bullet item.

o  The client does not have a copy of the data in its memory and the metadata server is still in its grace period.  The client cannot use LAYOUTGET (within or outside the grace period) to reclaim a layout segment, because the contents of the response from LAYOUTGET may not match what it had previously.  The range might be different, or it might get the same range but the content of the layout might be different.
Even if the content of the layout appears to be the same, the device IDs may map to different device addresses, and even if the device addresses are the same, the device addresses could have been assigned to a different storage device.  The option of retrieving the data from the storage device and writing it to the metadata server per the recovery scenario described above in the previous two bullets is not available because, again, the mappings of range to device ID, device ID to device address, and device address to physical device are stale, and new mappings via a new LAYOUTGET do not solve the problem.

The only recovery option for this scenario is to issue a LAYOUTCOMMIT in reclaim mode, which the metadata server will accept as long as it is in its grace period.  The use of LAYOUTCOMMIT in reclaim mode informs the metadata server that the layout segment has changed.  It is critical that the metadata server receive this information before its grace period ends, and thus before it starts allowing updates to the file system.

To issue LAYOUTCOMMIT in reclaim mode, the client sets the loca_reclaim field of the operation's arguments (Section 17.42.2) to TRUE.  During the metadata server's recovery grace period (and only during the recovery grace period) the metadata server is prepared to accept LAYOUTCOMMIT requests with the loca_reclaim field set to TRUE.

When loca_reclaim is TRUE, the client is attempting to commit changes to the layout segment that occurred prior to the restart of the metadata server.  The metadata server applies some consistency checks on the loca_layoutupdate field of the arguments to determine whether the client can commit the data written to the data server to the file system.
The loca_layoutupdate field is of data type layoutupdate4, and contains layout type-specific content (in the lou_body field of loca_layoutupdate).  The layout type-specific information that loca_layoutupdate might have is discussed in Section 12.5.3.3.  If the metadata server's consistency checks on loca_layoutupdate succeed, then the metadata server MUST commit the data (as described by the loca_offset, loca_length, and loca_layoutupdate fields of the arguments) that was written to the storage device.  If the metadata server's consistency checks on loca_layoutupdate fail, the metadata server rejects the LAYOUTCOMMIT operation and makes no changes to the file system.  However, any time LAYOUTCOMMIT with loca_reclaim TRUE fails, the pNFS client has lost all the data in the range defined by <loca_offset, loca_length>.  A client can defend against this risk by caching all data, whether written synchronously or asynchronously, in its memory, and by not releasing the cached data until a successful LAYOUTCOMMIT.

o  The client does not have a copy of the data in its memory and the metadata server is no longer in its grace period; i.e. the metadata server returns NFS4ERR_NO_GRACE.  As with the scenario in the above bullet item, the failure of LAYOUTCOMMIT means the data in the range <loca_offset, loca_length> is lost.  The defense against the risk is the same: cache all written data on the client until a successful LAYOUTCOMMIT.

12.7.5.  Operations During Metadata Server Grace Period

Some of the recovery scenarios thus far noted that some operations, namely WRITE and LAYOUTGET, might be permitted during the metadata server's grace period.
The metadata server may allow these operations during its grace period if it can reliably determine that servicing the request will not conflict with an impending LAYOUTCOMMIT reclaim request (or, in the case of WRITE, with an impending reclaim of an OPEN, or of a LOCK on a file with mandatory record locking enabled).  As mentioned previously, WRITE and LAYOUTGET are likely to be rejected during the metadata server's grace period, because the easiest way to provide simple, valid handling during the grace period is to reject all non-reclaim pNFS requests and WRITE operations by returning the NFS4ERR_GRACE error.  However, depending on the storage protocol (which is specific to the layout type) and the metadata server implementation, the metadata server may be able to determine that a particular request is safe.  For example, a metadata server may save provisional allocation mappings for each file to stable storage, as well as information about potentially conflicting OPEN share modes and mandatory record locks that might have been in effect at the time of restart, and use this information during the recovery grace period to determine that a WRITE request is safe.

12.7.6.  Storage Device Recovery

Recovery from storage device restart is mostly dependent upon the layout type in use.  However, there are a few general techniques a client can use if it discovers a storage device has crashed while the client holds modified, uncommitted data that was asynchronously written.  First and foremost, it is important to realize that the client is the only one that has the information necessary to recover non-committed data, since it holds the modified data and most probably nobody else does.
Second, the best solution is for the client to err on the side of caution and attempt to re-write the modified data through another path.

The client should immediately write the data to the metadata server, with the stable field in the WRITE4args set to FILE_SYNC4.  Once it does this, there is no need to wait for the original storage device.

12.8.  Metadata and Storage Device Roles

If the same physical hardware is used to implement both a metadata server and storage device, then the same hardware entity is to be understood to be implementing two distinct roles, and it is important that it be clearly understood on behalf of which role the hardware is executing at any given time.

Various sub-cases can be distinguished.

1.  The storage device uses NFSv4.1 as the storage protocol.  The same physical hardware is used to implement both a metadata and data server.  If an EXCHANGE_ID operation issued to the metadata server has EXCHGID4_FLAG_USE_PNFS_MDS set and EXCHGID4_FLAG_USE_PNFS_DS not set, the role of all sessions derived from the client ID is metadata server-only.  If an EXCHANGE_ID operation issued to the data server has EXCHGID4_FLAG_USE_PNFS_DS set and EXCHGID4_FLAG_USE_PNFS_MDS not set, the role of all sessions derived from the client ID is data server-only.  These assertions are true regardless of whether the network addresses of the metadata server and data server are the same or not.

    The client will use the same client owner for both the metadata server EXCHANGE_ID and the data server EXCHANGE_ID.  Since the client issues one with EXCHGID4_FLAG_USE_PNFS_MDS set, and the other with EXCHGID4_FLAG_USE_PNFS_DS set, the server will need to return unique client IDs, as well as server_owners, which will eliminate ambiguity about the dual roles the same physical entity serves.

2.
The metadata and data server each return EXCHANGE_ID results with EXCHGID4_FLAG_USE_PNFS_DS and EXCHGID4_FLAG_USE_PNFS_MDS both set, the server_owner and server_scope results are the same, the client IDs are the same, and, if RPCSEC_GSS is used, the server principals are the same.  As noted in Section 2.10.3.4.1, the two servers are the same, whether they have the same network address or not.  If the pNFS server is ambiguous in its EXCHANGE_ID results as to what role a client ID may be used for, yet still requires the NFSv4.1 request to be directed in a manner specific to a role (e.g. a READ request for a particular offset directed to the metadata server role might use a different offset if the READ was intended for the data server role, if the file is using STRIPE4_DENSE packing; see Section 13.5), the pNFS server may mark the metadata filehandle differently from the data filehandle so that operations addressed to the metadata server can be distinguished from those directed to the data servers.  Marking the metadata and data server filehandles differently (and this is RECOMMENDED) is possible because the former are derived from OPEN operations, and the latter are derived from LAYOUTGET operations.

    Note that it may be the case that while the metadata server and the storage device are distinct from one client's point of view, the roles may be reversed according to another client's point of view.  For example, in the cluster file system model a metadata server to one client may be a data server to another client.  If NFSv4.1 is being used as the storage protocol, then pNFS servers need to mark filehandles according to their specific roles.

    If a current filehandle is set that is inconsistent with the role to which it is directed, then the error NFS4ERR_BADHANDLE should result.
For example, if a request is directed at the data server, because the first current filehandle is from a layout, any attempt to set the current filehandle to a value not from a layout should be rejected.  Similarly, if the first current filehandle was for a value not from a layout, a subsequent attempt to set the current filehandle to a value obtained from a layout should be rejected.

3.  The storage device does not use NFSv4.1 as the storage protocol, and the same physical hardware is used to implement both a metadata server and storage device.  Whether distinct network addresses are used to access the metadata server and storage device is immaterial, because it is always clear to the pNFS client and server, from the upper-layer protocol being used (NFSv4.1 or non-NFSv4.1), to which role the request to the common server network address is directed.

12.9.  Security Considerations

pNFS has a metadata path and a data path (i.e., storage protocol).  The metadata path includes the pNFS-specific operations (listed in Section 12.3); all existing NFSv4.1 conventional (non-pNFS) security mechanisms and features apply to the metadata path.  The combination of components in a pNFS system (see Figure 62) is required to preserve the security properties of NFSv4.1 with respect to an entity accessing a storage device from a client, including security countermeasures to defend against threats that NFSv4.1 provides defenses for in environments where these threats are considered significant.

In some cases, the security countermeasures for connections to storage devices may take the form of physical isolation or a recommendation not to use pNFS in an environment.
For example, it may be impractical to provide confidentiality protection for some storage protocols to protect against eavesdropping; in environments where eavesdropping on such protocols is of sufficient concern to require countermeasures, physical isolation of the communication channel (e.g., via direct connection from client(s) to storage device(s)) and/or a decision to forego use of pNFS (e.g., and fall back to conventional NFSv4.1) may be appropriate courses of action.

Where communication with storage devices is subject to the same threats as client to metadata server communication, the protocols used for that communication need to provide security mechanisms comparable to those available via RPCSEC_GSS for NFSv4.1.  Many situations in which pNFS is likely to be used will not be subject to the overall threat profile for which NFSv4.1 is required to provide countermeasures.

pNFS implementations MUST NOT remove NFSv4's access controls.  The combination of clients, storage devices, and the metadata server are responsible for ensuring that all client to storage device file data access respects NFSv4.1's ACLs and file open modes.  This entails performing both of these checks on every access in the client, the storage device, or both (as applicable; when the storage device is an NFSv4.1 server, the storage device is ultimately responsible for controlling access).  If a pNFS configuration performs these checks only in the client, the risk of a misbehaving client obtaining unauthorized access is an important consideration in determining when it is appropriate to use such a pNFS configuration.  Such configurations SHOULD NOT be used when client-only access checks do not provide sufficient assurance that NFSv4.1 access control is being applied correctly.

13.
pNFS: NFSv4.1 File Layout Type

This section describes the semantics and format of NFSv4.1 file-based layouts for pNFS.  NFSv4.1 file-based layouts use the LAYOUT4_NFSV4_1_FILES layout type.  The LAYOUT4_NFSV4_1_FILES type defines striping of data across multiple NFSv4.1 data servers.

13.1.  Session Considerations

Sessions are a mandatory feature of NFSv4.1, and this extends to both the metadata server and file-based (NFSv4.1-based) data servers.  If data is served by both the metadata server and an NFSv4.1-based data server, the metadata and data server MUST have separate client IDs (unless the EXCHANGE_ID results indicate the server will allow the client ID to support both metadata and data pNFS operations).

When creating a client ID to access a pNFS metadata server, the pNFS metadata client sends an EXCHANGE_ID operation that has EXCHGID4_FLAG_USE_PNFS_MDS set (EXCHGID4_FLAG_USE_NON_PNFS and EXCHGID4_FLAG_USE_PNFS_DS MAY be set as well).  If the server's EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_MDS set, then the client may use the client ID to create sessions that will exchange pNFS metadata operations.

If the pNFS metadata client gets a layout that refers it to an NFSv4.1 data server, it needs a client ID on that data server.  If it does not yet have a client ID from that server with the EXCHGID4_FLAG_USE_PNFS_DS flag set in the EXCHANGE_ID results, then the client must send an EXCHANGE_ID to the data server, using the same co_ownerid as it sent to the metadata server, with the EXCHGID4_FLAG_USE_PNFS_DS flag set in the arguments.  If the server's EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_DS set, then the client may use the client ID to create sessions that will exchange pNFS data operations.
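The flag checks just described can be sketched as follows.  This is an illustrative sketch, not normative text: the flag values below are placeholders (the assigned values appear with the EXCHANGE_ID definition), and only the decision logic follows the paragraphs above.

```python
# Placeholder bit values for illustration; not the assigned constants.
EXCHGID4_FLAG_USE_NON_PNFS = 0x1
EXCHGID4_FLAG_USE_PNFS_MDS = 0x2
EXCHGID4_FLAG_USE_PNFS_DS  = 0x4

def usable_for_metadata_sessions(eir_flags):
    # The client may create pNFS metadata sessions only if the server's
    # EXCHANGE_ID results echoed EXCHGID4_FLAG_USE_PNFS_MDS.
    return bool(eir_flags & EXCHGID4_FLAG_USE_PNFS_MDS)

def needs_data_server_exchange_id(ds_eir_flags):
    # A layout referral requires a client ID on the data server; a new
    # EXCHANGE_ID (same co_ownerid, USE_PNFS_DS set) is needed unless
    # the client already holds results with USE_PNFS_DS from that server.
    return ds_eir_flags is None or \
        not (ds_eir_flags & EXCHGID4_FLAG_USE_PNFS_DS)
```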
The client ID returned by a metadata server has no required association to the client ID returned by a data server that the metadata server's layouts referred the client to, although a server implementation is free to construct such an association (e.g. via a private data server/metadata server protocol and client ID table).  Similarly, the EXCHANGE_ID/CREATE_SESSION sequenceid state used by the pNFS metadata client and server has no association with the EXCHANGE_ID/CREATE_SESSION sequenceid state used by the data client/server (and the pNFS server and the pNFS client MUST NOT make this association).  By decoupling the client IDs of metadata and data servers from each other, implementation of the session on pNFS servers is potentially simpler.

In a non-pNFS server or in a metadata server, the sessionid in the SEQUENCE operation implies the client ID, which in turn might be used by the server to map the stateid to the right client/server pair.  However, when a data server is presented with a READ or WRITE operation with a stateid, because the stateid is associated with a client ID on a metadata server, and because the sessionid in the preceding SEQUENCE operation is tied to the potentially unrelated data server client ID, the data server has no obvious way to determine the metadata server from the COMPOUND procedure, and thus has no way to validate the stateid.  One recommended approach is for pNFS servers to encode metadata server routing and/or identity information in the data server filehandles as returned in the layout.  If the metadata server identity or location changes, requiring the data server filehandles to become invalid (stale), the metadata server must first recall the layouts.

Invalidating data server filehandles does not render the pNFS data cache invalid.
If the metadata server filehandle of a file is persistent, the client can map the metadata server filehandle to cached data, and when granted data server filehandles, map the data server filehandles to their metadata server filehandle.

13.2.  File Layout Definitions

The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout type, and may be applicable to other layout types.

Unit.  A unit is a set of data written to a data server.

Pattern.  A pattern is a method of distributing fixed-size units across a set of data servers.  A pattern is iterated one or more times.  A pattern has one or more units.  Each unit in each iteration of a pattern MUST be the same size.

Stripe.  A stripe is a set of data distributed across a set of data servers in a pattern before that pattern repeats.

Stripe Width.  A stripe width is the size of a stripe in octets.

Hereafter, this document will refer to a unit that is written in a pattern as a "stripe unit".

A pattern may have more stripe units than data servers.  If so, some data servers will have more than one stripe unit per stripe.  A data server that has multiple stripe units per stripe MAY store each unit in a different data file.

13.3.  File Layout Data Types

The high level NFSv4.1 layout types are nfsv4_1_file_layout_ds_addr4, nfsv4_1_file_layouthint4, and nfsv4_1_file_layout4.

When LAYOUTGET returns a LAYOUT4_NFSV4_1_FILES layout (indicated in the loc_type field of the lo_content field), the loc_body field of the lo_content field contains a value of data type nfsv4_1_file_layout4.  Among other content, nfsv4_1_file_layout4 has storage device IDs (within the nfl_ds_fh_list array) of data type deviceid4.

The GETDEVICEINFO operation maps a device ID to a storage device address (type device_addr4).
When GETDEVICEINFO returns a device address with a layout type of LAYOUT4_NFSV4_1_FILES (the da_layout_type field), the da_addr_body field contains a value of data type nfsv4_1_file_layout_ds_addr4.

The SETATTR operation supports a layout hint attribute (Section 5.13.4).  When the client sets a layout hint (data type layouthint4) with a layout type of LAYOUT4_NFSV4_1_FILES (the loh_type field), the loh_body field contains a value of data type nfsv4_1_file_layouthint4.

The top level and lower level NFSv4.1 layout data types have the following XDR descriptions.

   enum file_layout_ds_type4 {
           FILEDS4_SIMPLE  = 1,
           FILEDS4_COMPLEX = 2
   };

   %/* Encoded in the da_addr_body field of type device_addr4: */
   union nfsv4_1_file_layout_ds_addr4
    switch (file_layout_ds_type4 nflda_type) {
    case FILEDS4_SIMPLE:
           netaddr4        nflda_simp_ds_list<>;
    case FILEDS4_COMPLEX:
           deviceid4       nflda_comp_ds_list<>;
    default:
           void;
   };

   enum stripetype4 {
           STRIPE4_SPARSE = 1,
           STRIPE4_DENSE  = 2
   };

   %/* Encoded in the loh_body field of type layouthint4: */
   struct nfsv4_1_file_layouthint4 {
           stripetype4     nflh_stripe_type;
           length4         nflh_stripe_unit_size;
           uint32_t        nflh_stripe_width;
   };

   struct nfsv4_1_file_layout_ds_fh4 {
           deviceid4       nfldf_ds_id;
           uint32_t        nfldf_ds_index;
           nfs_fh4         nfldf_fh;
   };

   %/* Encoded in the loc_body field of type layout_content4: */
   struct nfsv4_1_file_layout4 {
           stripetype4     nfl_stripe_type;
           bool            nfl_commit_through_mds;
           length4         nfl_stripe_unit_size;
           length4         nfl_file_size;
           uint32_t        nfl_stripe_indices<>;
           nfsv4_1_file_layout_ds_fh4 nfl_ds_fh_list<>;
   };

   %/*
    % * Encoded in the lou_body field of type layoutupdate4:
    % * Nothing.  lou_body is a zero length array of octets.
    % */

The nfsv4_1_file_layout_ds_addr4 data server address is composed of a FILEDS4_SIMPLE or a FILEDS4_COMPLEX data server address.  A FILEDS4_SIMPLE data server address is composed of an array of network addresses (data type netaddr4).  All data servers in a FILEDS4_SIMPLE list (field nflda_simp_ds_list) must be equivalent and are used for data server multipathing; see Section 13.6 for more details on equivalent data servers.  FILEDS4_SIMPLE data servers always refer to actual data servers.  On the other hand, a FILEDS4_COMPLEX data server address is constructed of a list of device IDs (field nflda_comp_ds_list).  Each device ID in nflda_comp_ds_list corresponds to the device ID of a data server address of type FILEDS4_SIMPLE.  A FILEDS4_COMPLEX data server list MUST NOT contain device IDs of other FILEDS4_COMPLEX data servers; only device IDs of FILEDS4_SIMPLE data servers are to be referenced.  This enables multiple equivalent data servers to be identified through a single device ID and provides a space efficient mechanism by which to identify multiple data servers within a layout.  FILEDS4_COMPLEX and FILEDS4_SIMPLE data servers share the same device ID space and should be cached similarly by the client.

The nfsv4_1_file_layout4 data type specifies an ordered array of <device ID, data server index, filehandle> tuples, as well as the stripe unit size, the type of stripe layout (discussed later in this section and in Section 13.4), and the file's current size as of LAYOUTGET (Section 17.43) time.

The nfl_ds_fh_list array within the nfsv4_1_file_layout4 data type contains a list of nfsv4_1_file_layout_ds_fh4 structures.  Each of these structures describes one or more FILEDS4_SIMPLE or FILEDS4_COMPLEX data servers that contribute to a stripe of the file.
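To make the striping arithmetic from Section 13.2 concrete, the following sketch (an illustration under assumptions, not part of the protocol) maps a file offset to the entry of a flattened data server list that serves it, given the stripe unit size from nfl_stripe_unit_size.  The flat list and the example values are hypothetical; a simple round-robin pattern is assumed.

```python
def stripe_target(offset, stripe_unit_size, flat_list):
    """Pick the <device ID, filehandle> pair serving a file offset.
    flat_list is a flattened striping list (as constructed in
    Section 13.4); the stripe width is implicitly
    stripe_unit_size * len(flat_list)."""
    unit_number = offset // stripe_unit_size        # which stripe unit
    return flat_list[unit_number % len(flat_list)]  # slot in pattern

# Hypothetical example: 4 data servers, 64 KiB stripe units.
flat = [(1, 0x10), (2, 0x11), (3, 0x12), (4, 0x13)]
```

With these assumed values, offset 0 falls on device 1, offset 65536 on device 2, and the pattern wraps so that offset 262144 falls back on device 1.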
The nfl_stripe_indices array contains a list of indices into the nfl_ds_fh_list array; an index of zero specifies the first entry in nfl_ds_fh_list.  Each successive index selects the nfl_ds_fh_list entry to be used next in sequence for the stripe.  This allows an arbitrary sequencing through the possible data servers to be encoded compactly.  The value of every element in nfl_stripe_indices must be less than the number of elements in the nfl_ds_fh_list array.

When the nfl_stripe_indices array is of zero length, the elements of the nfl_ds_fh_list array are simply used in order, so that the portion of the stripe held by the corresponding entry is determined by its position within the data server list.

If the nfl_stripe_indices array is of non-zero length, there is no requirement that the nfl_stripe_indices and nfl_ds_fh_list arrays have the same number of entries.  If the nfl_stripe_indices array has fewer entries than the nfl_ds_fh_list array, this simply means not all entries of nfl_ds_fh_list are in the striping pattern.

Even if nfl_stripe_indices has the same number of entries as the nfl_ds_fh_list array, this does not necessarily mean all entries of nfl_ds_fh_list are used, because nothing prevents an index value from appearing in multiple entries of nfl_stripe_indices.

If the nfl_stripe_indices array has more entries than the nfl_ds_fh_list array, then this simply means index values in nfl_stripe_indices appear more than once.

Each nfl_ds_fh_list entry contains a device ID, a data server index, and a filehandle.  The device ID (field nfldf_ds_id) identifies the data server.  The GETDEVICEINFO operation is used to map nfldf_ds_id to a data server address, which will be either a FILEDS4_COMPLEX or FILEDS4_SIMPLE data server address.
When the device ID maps to a FILEDS4_COMPLEX data server address, the data server index (field nfldf_ds_index) indicates the starting element to use from the list of device IDs (nflda_comp_ds_list) of the FILEDS4_COMPLEX address. (As discussed in Section 13.4, the nfldf_ds_index field plays a critical role in the flattening of a FILEDS4_COMPLEX device.) If the nfldf_ds_id field maps to a FILEDS4_SIMPLE device, the nfldf_ds_index field has no meaning and should be zero. The filehandle, nfldf_fh, identifies the file on the data server identified by the device ID.

The generic layout hint structure is described in Section 3.2.22. The client uses the layout hint in the layout_hint (Section 5.13.4) attribute to specify the type of layout to be used for a newly created file. The LAYOUT4_NFSV4_1_FILES layout type-specific content for the layout hint is composed of the preferred stripe packing type (field nflh_stripe_type, discussed in Section 13.5), the size of the stripe unit (field nflh_stripe_unit_size), and the width of the stripe (field nflh_stripe_width).

13.4. Interpreting the File Layout

The client is expected to construct a flat list of <data server, filehandle> pairs over which the file is striped. A flat data server list contains no FILEDS4_COMPLEX data servers, and is constructed by concatenating each data server encountered while traversing nfl_stripe_indices (or nfl_ds_fh_list in the case of a zero-length nfl_stripe_indices array), while expanding each FILEDS4_COMPLEX data server address. The client must expand the FILEDS4_COMPLEX data server address's device ID list by starting at the entry of the nflda_comp_ds_list array indexed by nfldf_ds_index, and ending with the device ID prior to nfldf_ds_index (or with the last entry of the nflda_comp_ds_list array if nfldf_ds_index is zero).
All device IDs in the nflda_comp_ds_list must be consumed; this may require wrapping around the end of the array if nfldf_ds_index is non-zero. The stripe width is determined by the stripe unit size multiplied by the number of data server entries within the flattened stripe.

Consider the following example:

Given a set of data servers with the following device IDs:

   1->{simple}; 2->{complex, ds_list=<3, 4>}; 3->{simple};
   4->{simple}; 5->{simple}; 6->{complex, ds_list=<4, 1, 5>};

Device IDs 1, 3, 4, and 5 identify FILEDS4_SIMPLE data servers. Device ID 2 is a FILEDS4_COMPLEX data server constructed of FILEDS4_SIMPLE data servers 3 and 4. Device ID 6 is a FILEDS4_COMPLEX data server constructed of FILEDS4_SIMPLE data servers 4, 1, and 5.

Within an instance of nfsv4_1_file_layout4, imagine an nfl_ds_fh_list constructed of <device ID, data server index, filehandle> tuples:

   ds_fh_list = [<6, 1, 0x17>, <1, 0, 0x12>, <5, 0, 0x22>,
                 <2, 0, 0x13>, <3, 0, 0x14>, <4, 0, 0x15>]

And an nfl_stripe_indices array containing the following indices:

   nfl_stripe_indices = [5, 2, 3, 0, 1, 2]

Using nfl_stripe_indices as indices into the nfl_ds_fh_list, we get the following re-ordered list of nfsv4_1_file_layout_devfh4 values:

   [<4, 0, 0x15>, <5, 0, 0x22>, <2, 0, 0x13>,
    <6, 1, 0x17>, <1, 0, 0x12>, <5, 0, 0x22>]

Converting the FILEDS4_COMPLEX devices to FILEDS4_SIMPLE devices gives us the following list of 9 FILEDS4_SIMPLE <device ID, filehandle> tuples:

   [<4, 0x15>, <5, 0x22>, <3, 0x13>, <4, 0x13>,
    <1, 0x17>, <5, 0x17>, <4, 0x17>, <1, 0x12>,
    <5, 0x22>]

The above list of tuples fully describes the striping pattern. We observe several things. First, the tuples are not 3-tuples; they do not have an index value because FILEDS4_SIMPLE devices do not use the index. Second, each tuple in the sequence represents a destination for each stripe unit in the pattern.
Third, device 2 is a FILEDS4_COMPLEX device that gets replaced with devices 3 and 4. Fourth, device 6 is a FILEDS4_COMPLEX device that gets replaced with devices 1, 5, 4 (and not in the order 4, 1, 5, because the nfl_ds_fh_list entry for device 6 has a non-zero index value of 1; we start with the second simple device that device 6 maps to and wrap around to the first simple device after processing the third simple device that device 6 maps to). Fifth, when converting from FILEDS4_COMPLEX to FILEDS4_SIMPLE, the filehandle in the FILEDS4_SIMPLE entries that replace a FILEDS4_COMPLEX entry comes from the replaced FILEDS4_COMPLEX entry. As a result, the striping pattern can have the same device ID appear multiple times, and with different filehandles.

The flattened data server list specifies the pattern of data servers over which the file is striped and to which data is written (in increments of the stripe unit size). It also specifies the filehandle to be used for each stripe unit of the pattern. A data server that stores more than one stripe unit of a pattern may store those stripe units in different files, but to do so it needs unique filehandles in the data server list, as the previous example showed. While data servers may be repeated multiple times within the flattened data server list, if the STRIPE4_DENSE stripe type is used (see Section 13.5), the same filehandle MUST NOT be used on the same data server for different stripe units of the same file.

A data file stored on a data server MUST map to a single file as defined by the metadata server; i.e., data from two files as viewed by the metadata server MUST NOT be stored within the same data file on any data server.

13.5.
Sparse and Dense Stripe Unit Packing

The nfl_stripe_type field specifies how the data is packed within the data file on a data server. It allows for two different data packings: STRIPE4_SPARSE and STRIPE4_DENSE. The stripe type determines the calculation that must be made to map the client-visible file offset to the offset within the data file located on the data server.

STRIPE4_SPARSE means that the logical offsets of the file as viewed by a client issuing READs and WRITEs directly to the metadata server are the same offsets each data server uses when storing a stripe unit. The effect then, for striping patterns consisting of at least two stripe units, is for each data server file to be sparse, or "holey". For example, suppose a pattern of 3 stripe units, where the stripe unit size is a 4-kilobyte block and there are 3 data servers in the pattern. Then the file on data server 1 will have blocks 0, 3, 6, 9, ... filled, data server 2's file will have blocks 1, 4, 7, 10, ... filled, and data server 3's file will have blocks 2, 5, 8, 11, ... filled. The unfilled blocks of each file will be holes; hence the files on each data server are sparse. Logical blocks 0, 3, 6, ... of the file would exist as physical blocks 0, 3, 6, ... on data server 1; logical blocks 1, 4, 7, ... would exist as physical blocks 1, 4, 7, ... on data server 2; and logical blocks 2, 5, 8, ... would exist as physical blocks 2, 5, 8, ... on data server 3.

The STRIPE4_SPARSE stripe type has holes for the octet ranges not exported by that data server, thereby allowing pNFS clients to use the real offset into the data server's file, regardless of the data server's position within the pattern. However, if a client attempts I/O to one of the holes, then an error MUST be returned by the data server.
Using the above example, if data server 2 received a READ or WRITE request for block 3 (a block held by data server 1, and thus a hole in data server 2's file), the data server would return NFS4ERR_PNFS_IO_HOLE. Thus data servers need to understand the striping pattern in order to support STRIPE4_SPARSE layouts.

STRIPE4_DENSE means that the data server files have no holes. STRIPE4_DENSE might be selected because the data server does not (efficiently) support holey files, e.g., the data server's file system allocates storage in the gaps, making STRIPE4_SPARSE a waste of space. If the STRIPE4_DENSE stripe type is indicated in the layout, the data files must be packed. Using the example striping pattern and stripe unit size that were used for the STRIPE4_SPARSE example, the STRIPE4_DENSE example would have blocks 0, 1, 2, 3, 4, ... of every data server's data file filled. Logical blocks 0, 3, 6, ... of the file would live on blocks 0, 1, 2, ... of the file on data server 1; logical blocks 1, 4, 7, ... of the file would live on blocks 0, 1, 2, ... of the file on data server 2; and logical blocks 2, 5, 8, ... of the file would live on blocks 0, 1, 2, ... of the file on data server 3.

Since the STRIPE4_DENSE layout does not leave holes on the data servers, the pNFS client is allowed to write to any offset of any data file of any data server in the stripe. Thus the data servers need not know the file's striping pattern.
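The flattening of Section 13.4 and the packing arithmetic of this section can be sketched in Python. The device table, filehandles, and helper names below are hypothetical illustrations, not protocol elements:

```python
# Sketch of client-side layout flattening (Section 13.4) and the
# dense-packing offset arithmetic (Section 13.5). All names and the
# example device table are hypothetical.

def expand_complex(comp_ds_list, start):
    # Consume every device ID in nflda_comp_ds_list, starting at
    # nfldf_ds_index and wrapping around the end of the array.
    n = len(comp_ds_list)
    return [comp_ds_list[(start + i) % n] for i in range(n)]

def flatten(devices, ds_fh_list, stripe_indices):
    # A zero-length nfl_stripe_indices means "use nfl_ds_fh_list in
    # order"; otherwise each index selects an nfl_ds_fh_list entry.
    order = stripe_indices or range(len(ds_fh_list))
    flat = []
    for i in order:
        ds_id, ds_index, fh = ds_fh_list[i]
        kind, comp_ds_list = devices[ds_id]
        if kind == "SIMPLE":
            flat.append((ds_id, fh))
        else:
            # FILEDS4_COMPLEX: replace with its simple devices, keeping
            # the filehandle of the replaced complex entry.
            flat.extend((s, fh) for s in expand_complex(comp_ds_list, ds_index))
    return flat

def dense_data_file_offset(file_offset, stripe_unit_size, n):
    # Offset within the data file under STRIPE4_DENSE packing.
    stripe_width = stripe_unit_size * n
    return ((file_offset // stripe_width) * stripe_unit_size
            + file_offset % stripe_unit_size)

def data_server_idx(file_offset, stripe_unit_size, n):
    # Index into the flattened data server list (either packing).
    return (file_offset // stripe_unit_size) % n

# Hypothetical device table: 1, 3, 4 are FILEDS4_SIMPLE; 2 is a
# FILEDS4_COMPLEX device over simple devices 3 and 4.
devices = {1: ("SIMPLE", None), 2: ("COMPLEX", [3, 4]),
           3: ("SIMPLE", None), 4: ("SIMPLE", None)}
ds_fh_list = [(1, 0, 0x12), (2, 1, 0x13), (4, 0, 0x15)]

flat = flatten(devices, ds_fh_list, [2, 0, 1])
# Entry <2, 1, 0x13> starts at the second element of [3, 4] and wraps:
assert flat == [(4, 0x15), (1, 0x12), (4, 0x13), (3, 0x13)]

# Dense packing, 4 KiB stripe unit: logical block 5 of the file lands on
# data server index 1, at block 1 of its data file.
assert data_server_idx(5 * 4096, 4096, len(flat)) == 1
assert dense_data_file_offset(5 * 4096, 4096, len(flat)) == 4096
```

The assertions mirror the rules stated in the text: complex expansion starts at nfldf_ds_index and wraps, the expansion inherits the complex entry's filehandle, and the dense mapping packs each data file with no holes.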
The calculation to determine the octet offset within the data file for dense data server layouts is:

   stripe_width = stripe_unit_size * N;
       where N = number of entries in the flattened nfl_ds_fh_list

   data_file_offset = floor(file_offset / stripe_width)
                      * stripe_unit_size
                      + file_offset % stripe_unit_size

Regardless of the data server layout, the calculation to determine the index into the device array is the same:

   data_server_idx = floor(file_offset / stripe_unit_size) mod N

Section 13.12 describes the semantics for dealing with reads to holes within the striped file. This is of particular concern, since each individual component stripe file (i.e., the component of the striped file that lives on a particular data server) may be of different length. Thus, clients may experience "short" reads when reading off the end of one of these component files.

13.6. Data Server Multipathing

The NFSv4.1 file layout supports multipathing to "equivalent" (defined later in this section) data servers. Data server-level multipathing is primarily of use in the case of a data server failure; it allows the client to switch to another data server that is exporting the same data stripe unit, without having to contact the metadata server for a new layout.

To support data server multipathing, there is an array of data server network addresses (nflda_simp_ds_list) within the FILEDS4_SIMPLE case of the nfsv4_1_file_layout_ds_addr4 switched union. This array represents an ordered list of data servers (each identified by a network address) where the first element has the highest priority. Each data server in the list MUST be equivalent to every other data server in the list, and each data server MUST be attempted in the order specified.
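The ordered failover just described might be sketched as follows; the transport function and the addresses (drawn from the 192.0.2.0/24 documentation range) are hypothetical:

```python
# Sketch of ordered failover across "equivalent" data servers in
# nflda_simp_ds_list; send_io and the addresses are hypothetical.

def io_with_failover(simp_ds_list, request, send_io):
    # The first element has the highest priority; each data server MUST
    # be attempted in the order given before the I/O is failed.
    last_err = None
    for addr in simp_ds_list:
        try:
            return send_io(addr, request)
        except ConnectionError as err:
            last_err = err  # fall through to the next equivalent server
    raise last_err

def send_io(addr, request):
    # Stand-in transport: pretend the first data server is unreachable.
    if addr == "192.0.2.1":
        raise ConnectionError("data server unreachable")
    return ("ok", addr)

result = io_with_failover(["192.0.2.1", "192.0.2.2"], b"READ", send_io)
# The I/O succeeds against the second, equivalent data server.
```

Because the list members are required to be equivalent, retrying the identical request against the next address is safe without metadata server involvement.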
Two data servers are equivalent if they export the same system image (e.g., the stateids and filehandles that they use are the same) and provide the same consistency guarantees. Two equivalent data servers must also have sufficient connections to the storage, such that writing to one data server is equivalent to writing to another; this also applies to reading. Also, if multiple copies of the same data exist, reading from one must provide access to all existing copies. As such, it is unlikely that multipathing will provide additional benefit in the case of an I/O error.

[[Comment.16: [NOTE: the error cases in which a client is expected to attempt an equivalent data server should be specified.]]]

13.7. Operations Issued to NFSv4.1 Data Servers

Clients MUST use the filehandle described within the layout when accessing data on NFSv4.1 data servers. When using the layout's filehandle, the client MUST only issue the NULL procedure and the COMPOUND procedure's BACKCHANNEL_CTL, BIND_CONN_TO_SESSION, CREATE_SESSION, COMMIT, DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID, READ, WRITE, PUTFH, SECINFO_NO_NAME, SET_SSV, and SEQUENCE operations to the NFSv4.1 data server associated with that data server filehandle. If a client issues an operation other than those specified above to the data server, using the filehandle and data server listed in the file's layout, that data server MUST return an error to the client (unless the pNFS server has chosen not to disambiguate the data server filehandle from the metadata server filehandle, and/or the pNFS server has chosen not to disambiguate the metadata server client ID from the data server client ID). The client MUST follow the instructions implied by the layout (i.e., which filehandles to use on which data servers).
As described in Section 12.5.1, a client MUST NOT issue I/Os to data servers for which it does not hold a valid layout. The data servers MAY reject such requests.

GETATTR and SETATTR MUST be directed to the metadata server. In the case of a SETATTR of the size attribute, the control protocol is responsible for propagating size updates/truncations to the data servers. In the case of extending WRITEs to the data servers, the new size must be visible on the metadata server once a LAYOUTCOMMIT has completed (see Section 12.5.3.2). Section 13.12 describes the mechanism by which the client is to handle data server files that do not reflect the metadata server's size.

13.8. COMMIT Through Metadata Server

The nfl_commit_through_mds field in the file layout (data type nfsv4_1_file_layout4) lets the metadata server indicate its preferred way for the client to perform COMMIT. If this field is TRUE, the client SHOULD send COMMIT to the metadata server instead of sending it to the same data server to which the associated WRITEs were sent. In order to maintain the current NFSv4.1 commit and recovery model, all the data servers MUST return a common writeverf verifier in all WRITE responses for a given file layout. The value of the writeverf verifier MUST be changed at the metadata server or any data server that is referenced in the layout, whenever there is a server event that can possibly lead to loss of uncommitted data. The scope of the verifier can be for a file or for the entire pNFS server. It might be more difficult for the server to maintain the verifier at the file level, but the benefit is that only events that impact a given file will require recovery action.
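The common-writeverf requirement lets the client detect possibly lost writes with a simple comparison; a hedged sketch, with a hypothetical record type and verifier values:

```python
# Sketch: compare the verifier returned by a COMMIT sent to the metadata
# server against the verifiers from earlier WRITEs to the data servers.
# WriteResult and the verifier values below are hypothetical.
from dataclasses import dataclass

@dataclass
class WriteResult:
    offset: int
    length: int
    writeverf: bytes  # 8-octet verifier from the WRITE reply

def needs_rewrite(write_results, commit_verf):
    # Any WRITE whose verifier differs from the COMMIT verifier may have
    # been lost across a server event and must be reissued.
    return [w for w in write_results if w.writeverf != commit_verf]

writes = [WriteResult(0, 4096, b"\x00" * 8),
          WriteResult(4096, 4096, b"\x00" * 7 + b"\x01")]
stale = needs_rewrite(writes, commit_verf=b"\x00" * 8)
# Only the second WRITE carries a mismatched verifier and is reissued.
```

The sketch only covers the comparison step; the recovery choices (reissuing WRITEs, getting a new layout, or rewriting through the metadata server) are as described in this section.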
The single COMMIT to the metadata server will return a verifier, and the client should compare it to all the verifiers from the WRITEs and fail the COMMIT if any of the verifiers mismatch. If COMMIT to the metadata server fails, the client should reissue WRITEs for all the modified data in the file. The client should treat modified data with a mismatched verifier as a WRITE failure and try to recover by reissuing the WRITEs to the original data server, or by using another path to that data if the layout has not been recalled. Other options the client has are to get a new layout, or simply to rewrite the data through the metadata server. If the flag nfl_commit_through_mds is FALSE, the client should not send COMMIT to the metadata server. Although it is valid to send COMMIT to the metadata server, it should be used only to commit data that was written through the metadata server. See Section 12.7.6 for recovery options.

13.9. Global Stateid Requirements

Note that there are no stateids embedded within the layout returned by the metadata server to the pNFS client. The client uses a stateid returned previously by the metadata server (including results from OPEN, for which a delegation stateid is acceptable as well as a non-delegation stateid, from lock operations, from WANT_DELEGATION, and also from the CB_PUSH_DELEG callback operation) or a special stateid to perform I/O on the data servers, as in regular NFSv4.1. Special stateid usage for I/O is subject to the NFSv4.1 protocol specification. The stateid used for I/O MUST have the same effect and be subject to the same validation on a data server as it would if the I/O was being performed on the metadata server itself in the absence of pNFS. This has the implication that stateids are globally valid on both the metadata and data servers.
This requires the metadata server to propagate changes in lock and open state to the data servers, so that the data servers can validate I/O accesses. This is discussed further in Section 13.11. Depending on when stateids are propagated, the existence of a valid stateid on the data server may act as proof of a valid layout.

13.10. The Layout Iomode

The layout iomode need not be used by the metadata server when servicing NFSv4.1 file-based layouts, although in some circumstances it may be useful. For example, if the server implementation supports reading from read-only replicas or mirrors, it would be useful for the server to return a layout enabling the client to do so. As such, the client SHOULD set the iomode based on its intent to read or write the data. The client may default to an iomode of LAYOUTIOMODE4_RW. The iomode need not be checked by the data servers when clients perform I/O. However, the data servers SHOULD still validate that the client holds a valid layout and return an error if the client does not.

13.11. Data Server State Propagation

Since the metadata server, which handles lock and open-mode state changes as well as ACLs, may not be co-located with the data servers where I/O accesses are validated, the server implementation MUST take care of propagating changes of this state to the data servers. Once the propagation to the data servers is complete, the full effect of those changes must be in effect at the data servers. However, some state changes need not be propagated immediately, although all changes SHOULD be propagated promptly. These state propagations have an impact on the design of the control protocol, even though the control protocol is outside of the scope of this specification.
Immediate propagation refers to the synchronous propagation of state from the metadata server to the data server(s); the propagation must be complete before returning to the client.

13.11.1. Lock State Propagation

If the pNFS server supports mandatory locking, any mandatory locks on a file MUST be made effective at the data servers before the request that establishes them returns to the caller. Thus, mandatory lock state MUST be synchronously propagated to the data servers. On the other hand, since advisory lock state is not used for checking I/O accesses at the data servers, there is no semantic reason for propagating advisory lock state to the data servers. However, since all lock, unlock, open downgrade, and open upgrade operations MAY affect the "seqid" stored within the stateid (see Section 8.1.3.1), the stateid changes may cause difficulty if this state is not propagated. Thus, when a client uses a stateid on a data server for I/O with a newer "seqid" number than the one the data server has, the data server may need to query the metadata server and get any pending updates to that stateid. This allows stateid sequence number changes to be propagated lazily, on demand.

Since updates to advisory locks neither confer nor remove privileges, these changes need not be propagated immediately, and may not need to be propagated promptly. The updates to advisory locks need only be propagated when the data server needs to resolve a question about a stateid. In fact, if record locking is not mandatory (i.e., is advisory), the clients are advised not to use the lock-based stateids for I/O at all. The stateids returned by OPEN are sufficient and eliminate overhead for this kind of state propagation.

13.11.2. Open-mode Validation

Open-mode validation MUST be performed against the open mode(s) held by the data servers.
However, the server implementation may not always require the immediate propagation of changes. Reduction in access because of CLOSEs or DOWNGRADEs does not have to be propagated immediately, but SHOULD be propagated promptly; whereas changes due to revocation MUST be propagated immediately. On the other hand, changes that expand access (e.g., new OPENs and upgrades) do not have to be propagated immediately, but the data server SHOULD NOT reject a request because of open mode issues without making sure that the upgrade is not in flight.

13.11.3. File Attributes

Since the SETATTR operation has the ability to modify state that is visible on both the metadata and data servers (e.g., the size), care must be taken to ensure that the resultant state across the set of data servers is consistent, especially when truncating or growing the file.

As described earlier, the LAYOUTCOMMIT operation is used to ensure that the metadata is synchronized with changes made to the data servers. For the NFSv4.1-based data storage protocol, it is necessary to re-synchronize state such as the size attribute, and the setting of mtime/change/atime. See Section 12.5.3 for a full description of the semantics regarding LAYOUTCOMMIT and attribute synchronization. It should be noted that, by using an NFSv4.1-based layout type, it is possible to synchronize this state before LAYOUTCOMMIT occurs. For example, the control protocol can be used to query the attributes present on the data servers.

Any changes to file attributes that control authorization or access, as reflected by ACCESS calls or READs and WRITEs on the metadata server, MUST be propagated to the data servers for enforcement on READ and WRITE I/O calls.
If the changes made on the metadata server result in more restrictive access permissions for any user, those changes MUST be propagated to the data servers synchronously.

The OPEN operation (Section 17.16.5) does not impose any requirement that I/O operations on an open file have the same credentials as the OPEN itself, and so requires the server's READ and WRITE operations to perform appropriate access checking. Changes to ACLs also require new access checking by READ and WRITE on the server. The propagation of access right changes due to changes in ACLs may be asynchronous only if the server implementation is able to determine that the updated ACL is not more restrictive for any user specified in the old ACL. Due to the relative infrequency of ACL updates, it is suggested that all changes be propagated synchronously.

13.12. Data Server Component File Size

A potential problem exists when a component data file on a particular data server is grown past EOF; the problem exists for both dense and sparse layouts. Imagine the following scenario: a client creates a new file (size == 0) and writes to octet 131072; the client then seeks to the beginning of the file and reads octet 100. The client should receive 0s back as a result of the READ. However, if the READ falls on a data server other than the one that received the client's original WRITE, the data server servicing the READ may still believe that the file's size is 0 and return no data with the EOF flag set. The data server can only return 0s if it knows that the file's size has been extended. This would require the immediate propagation of the file's size to all data servers, which is potentially very costly.
Therefore, the client that has initiated the extension of the file's size MUST be prepared to deal with these EOF conditions; READs that return EOF or short data will be treated as a hole in the file, and the NFS client will substitute 0s for the data when the offset is less than the client's view of the file size.

The NFSv4.1 protocol only provides close-to-open file data cache semantics, meaning that when the file is closed all modified data is written to the server. When a subsequent OPEN of the file is done, the change attribute is inspected for a difference from its cached value. For the case above, this means that a LAYOUTCOMMIT will be done at close (along with the data WRITEs) and will update the file's size and change attribute. Access from another client after that point will result in the appropriate size being returned.

13.13. Recovery Considerations

As described in Section 12.7, the layout type-specific storage protocol is responsible for handling the effects of I/Os started before lease expiration and extending through lease expiration. The NFSv4.1 file layout type prevents all I/Os from being executed after lease expiration, without relying on a precise client lease timer and without requiring data servers to maintain lease timers.

It works as follows. As described in Section 13.1, in COMPOUND procedure requests to the data server, the data filehandle provided by the PUTFH operation and the stateid in the READ or WRITE operation are used to validate that the client has a valid layout for the I/O being performed; if it does not, the I/O is rejected. Before the metadata server takes any action to invalidate a layout given out by a previous instance, it must make sure that all layouts from that previous instance are invalidated at the data servers.
This means that a metadata server may not restripe a file until it has contacted all of the data servers to invalidate the layouts from the previous instance, nor may it give out mandatory locks that conflict with layouts from the previous instance without either doing a specific invalidation (as it would have to do anyway) or doing a global data server invalidation.

13.14. Security Considerations for the File Layout Type

The NFSv4.1 file layout type MUST adhere to the security considerations outlined in Section 12.9. NFSv4.1 data servers must make all of the required access checks on each READ or WRITE I/O as determined by the NFSv4.1 protocol. If the metadata server would deny a READ or WRITE operation on a given file due to its ACL, mode attribute, open mode, open deny mode, mandatory lock state, or any other attributes and state, the data server MUST also deny the READ or WRITE operation. This impacts the control protocol and the propagation of state from the metadata server to the data servers; see Section 13.11 for more details.

The methods for authentication, integrity, and privacy for file layout-based data servers are the same as those used for metadata servers. Metadata and data servers use ONC RPC security flavors to authenticate, and SECINFO and SECINFO_NO_NAME to negotiate the security mechanism and services to be used.

For a given file object, a metadata server MAY require different security parameters (secinfo4 value) than the data server. For a given file object with multiple data servers, the secinfo4 value SHOULD be the same across all data servers.

If an NFSv4.1 implementation supports pNFS and supports NFSv4.1 file layouts, then the implementation MUST support the SECINFO_NO_NAME operation on both the metadata and data servers.

14.
Internationalization

The primary area in which NFS version 4 needs to deal with internationalization, or I18N, is file names and other strings as used within the protocol. The choice of string representation must allow reasonable name/string access to clients that use various languages. The UTF-8 encoding of the UCS as defined by ISO10646 [10] allows for this type of access and follows the policy described in "IETF Policy on Character Sets and Languages", RFC2277 [11].

RFC3454 [12], otherwise known as "stringprep", documents a framework for using Unicode/UTF-8 in networking protocols, so as "to increase the likelihood that string input and string comparison work in ways that make sense for typical users throughout the world." A protocol must define a profile of stringprep "in order to fully specify the processing options." The remainder of this Internationalization section defines the NFS version 4 stringprep profiles. Much of the terminology used for the remainder of this section comes from stringprep.

There are three UTF-8 string types defined for NFS version 4: utf8str_cs, utf8str_cis, and utf8str_mixed. Separate profiles are defined for each.
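As a small illustration of why such profiles are needed, two byte-distinct UTF-8 strings can become identical after the kind of processing stringprep specifies. Python's unicodedata module provides the Unicode normalization forms; the normative behavior is of course defined by the stringprep tables, not by this sketch:

```python
# Two distinct UTF-8 strings that are equal after normalization form KC
# (the form specified by the case-insensitive profile defined below).
import unicodedata

a = "\ufb01le"  # "file" spelled with U+FB01 LATIN SMALL LIGATURE FI
b = "file"
assert a != b                                 # distinct as UTF-8 strings
assert unicodedata.normalize("NFKC", a) == b  # identical after NFKC
```

A server therefore cannot rely on byte comparison alone when a profile applies normalization; the collision-handling choices are discussed under the nfs4_cs_prep profile.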
Each profile defines the following, as required by stringprep:

o  The intended applicability of the profile

o  The character repertoire that is the input and output to stringprep (which is Unicode 3.2 for the referenced version of stringprep)

o  The mapping tables from stringprep used (as described in section 3 of stringprep)

o  Any additional mapping tables specific to the profile

o  The Unicode normalization used, if any (as described in section 4 of stringprep)

o  The tables from stringprep listing characters that are prohibited as output (as described in section 5 of stringprep)

o  The bidirectional string testing used, if any (as described in section 6 of stringprep)

o  Any additional characters that are prohibited as output specific to the profile

Stringprep discusses Unicode characters, whereas NFS version 4 renders UTF-8 characters. Since there is a one-to-one mapping from UTF-8 to Unicode, when the remainder of this document refers to Unicode, the reader should assume UTF-8.

Much of the text for the profiles comes from RFC3491 [13].

14.1. Stringprep profile for the utf8str_cs type

Every use of the utf8str_cs type definition in the NFS version 4 protocol specification follows the profile named nfs4_cs_prep.

14.1.1. Intended applicability of the nfs4_cs_prep profile

The utf8str_cs type is a case-sensitive string of UTF-8 characters. Its primary use in NFS Version 4 is for naming components and pathnames. Components and pathnames are stored on the server's file system. Two valid distinct UTF-8 strings might be the same after processing via the utf8str_cs profile.
If the strings are two names inside a directory, the NFS version 4 server will need to either:

o  disallow the creation of a second name if its post-processed form collides with that of an existing name, or

o  allow the creation of the second name, but arrange so that after post-processing, the second name is different from the post-processed form of the first name.

14.1.2. Character repertoire of nfs4_cs_prep

The nfs4_cs_prep profile uses Unicode 3.2, as defined in stringprep's Appendix A.1.

14.1.3. Mapping used by nfs4_cs_prep

The nfs4_cs_prep profile specifies mapping using the following tables from stringprep:

   Table B.1

Table B.2 is normally not part of the nfs4_cs_prep profile, as it is primarily for dealing with case-insensitive comparisons. However, if the NFS version 4 file server supports the case_insensitive file system attribute, and if case_insensitive is true, the NFS version 4 server MUST use Table B.2 (in addition to Table B.1) when processing utf8str_cs strings, and the NFS version 4 client MUST assume that Table B.2 (in addition to Table B.1) is being used.

If the case_preserving attribute is present and set to false, then the NFS version 4 server MUST use Table B.2 to map case when processing utf8str_cs strings. Whether the server maps from lower to upper case or from upper to lower case is an implementation dependency.

14.1.4. Normalization used by nfs4_cs_prep

The nfs4_cs_prep profile does not specify a normalization form. A later revision of this specification may specify a particular normalization form. Therefore, the server and client can expect that they may receive unnormalized characters within protocol requests and responses.
If the operating environment requires normalization, then the
implementation must normalize utf8str_cs strings within the protocol
before presenting the information to an application (at the client)
or local file system (at the server).

14.1.5.  Prohibited output for nfs4_cs_prep

The nfs4_cs_prep profile specifies prohibiting using the following
tables from stringprep:

   Table C.3
   Table C.4
   Table C.5
   Table C.6
   Table C.7
   Table C.8
   Table C.9

14.1.6.  Bidirectional output for nfs4_cs_prep

The nfs4_cs_prep profile does not specify any checking of
bidirectional strings.

14.2.  Stringprep profile for the utf8str_cis type

Every use of the utf8str_cis type definition in the NFS version 4
protocol specification follows the profile named nfs4_cis_prep.

14.2.1.  Intended applicability of the nfs4_cis_prep profile

The utf8str_cis type is a case insensitive string of UTF-8
characters.  Its primary use in NFS Version 4 is for naming NFS
servers.

14.2.2.  Character repertoire of nfs4_cis_prep

The nfs4_cis_prep profile uses Unicode 3.2, as defined in
stringprep's Appendix A.1.

14.2.3.  Mapping used by nfs4_cis_prep

The nfs4_cis_prep profile specifies mapping using the following
tables from stringprep:

   Table B.1
   Table B.2

14.2.4.  Normalization used by nfs4_cis_prep

The nfs4_cis_prep profile specifies using Unicode normalization form
KC, as described in stringprep.

14.2.5.  Prohibited output for nfs4_cis_prep

The nfs4_cis_prep profile specifies prohibiting using the following
tables from stringprep:

   Table C.1.2
   Table C.2.2
   Table C.3
   Table C.4
   Table C.5
   Table C.6
   Table C.7
   Table C.8
   Table C.9

14.2.6.  Bidirectional output for nfs4_cis_prep

The nfs4_cis_prep profile specifies checking bidirectional strings as
described in stringprep's section 6.

14.3.  Stringprep profile for the utf8str_mixed type

Every use of the utf8str_mixed type definition in the NFS version 4
protocol specification follows the profile named nfs4_mixed_prep.

14.3.1.  Intended applicability of the nfs4_mixed_prep profile

The utf8str_mixed type is a string of UTF-8 characters, with a prefix
that is case sensitive, a separator equal to '@', and a suffix that
is a fully qualified domain name.  Its primary use in NFS Version 4
is for naming principals identified in an Access Control Entry.

14.3.2.  Character repertoire of nfs4_mixed_prep

The nfs4_mixed_prep profile uses Unicode 3.2, as defined in
stringprep's Appendix A.1.

14.3.3.  Mapping used by nfs4_mixed_prep

For the prefix and the separator of a utf8str_mixed string, the
nfs4_mixed_prep profile specifies mapping using the following table
from stringprep:

   Table B.1

For the suffix of a utf8str_mixed string, the nfs4_mixed_prep profile
specifies mapping using the following tables from stringprep:

   Table B.1
   Table B.2

14.3.4.  Normalization used by nfs4_mixed_prep

The nfs4_mixed_prep profile specifies using Unicode normalization
form KC, as described in stringprep.

14.3.5.  Prohibited output for nfs4_mixed_prep

The nfs4_mixed_prep profile specifies prohibiting using the following
tables from stringprep:

   Table C.1.2
   Table C.2.2
   Table C.3
   Table C.4
   Table C.5
   Table C.6
   Table C.7
   Table C.8
   Table C.9

14.3.6.  Bidirectional output for nfs4_mixed_prep

The nfs4_mixed_prep profile specifies checking bidirectional strings
as described in stringprep's section 6.

14.4.  UTF-8 Related Errors

Where the client sends an invalid UTF-8 string, the server should
return an NFS4ERR_INVAL (Table 8) error.  This includes cases in
which inappropriate prefixes are detected and where the count
includes trailing bytes that do not constitute a full UCS character.

Where the client-supplied string is valid UTF-8 but contains
characters that are not supported by the server as a value for that
string (e.g., names containing characters that have more than two
octets on a file system that supports Unicode characters only), the
server should return an NFS4ERR_BADCHAR (Table 8) error.

Where a UTF-8 string is used as a file name, and the file system,
while supporting all of the characters within the name, does not
allow that particular name to be used, the server should return the
error NFS4ERR_BADNAME (Table 8).  This includes situations in which
the server file system imposes a normalization constraint on name
strings, but will also include such situations as file system
prohibitions of "." and ".." as file names for certain operations,
and other such constraints.

15.  Error Values

NFS error numbers are assigned to failed operations within a
compound request.  A compound request contains a number of NFS
operations that have their results encoded in sequence in a compound
reply.  The results of successful operations will consist of an
NFS4_OK status followed by the encoded results of the operation.  If
an NFS operation fails, an error status will be entered in the reply
and the compound request will be terminated.

15.1.  Error Definitions

Protocol Error Definitions

   NFS4_OK (0):  Indicates the operation completed successfully.

   NFS4ERR_ACCESS (13):  Permission denied.  The caller does not have
      the correct permission to perform the requested operation.
      Contrast this with NFS4ERR_PERM, which restricts itself to
      owner or privileged user permission failures.

   NFS4ERR_ATTRNOTSUPP (10032):  An attribute specified is not
      supported by the server.  Does not apply to the GETATTR
      operation.

   NFS4ERR_ADMIN_REVOKED (10047):  Due to administrator intervention,
      the lockowner's record locks, share reservations, and
      delegations have been revoked by the server.

   NFS4ERR_BACK_CHAN_BUSY (10057):  The session cannot be destroyed
      because the server has callback requests outstanding.

   NFS4ERR_BADCHAR (10040):  A UTF-8 string contains a character
      which is not supported by the server in the context in which it
      is being used.

   NFS4ERR_BAD_COOKIE (10003):  READDIR cookie is stale.

   NFS4ERR_BADHANDLE (10001):  Illegal NFS filehandle.  The
      filehandle failed internal consistency checks.

   NFS4ERR_BADIOMODE (10049):  Layout iomode is invalid.
   NFS4ERR_BADLAYOUT (10050):  Layout specified is invalid.

   NFS4ERR_BADNAME (10041):  A name string in a request consists of
      valid UTF-8 characters supported by the server, but the name is
      not supported by the server as a valid name for the current
      operation.

   NFS4ERR_BADOWNER (10039):  An owner, owner_group, or ACL attribute
      value cannot be translated to a local representation.

   NFS4ERR_BAD_SESSION_DIGEST (10051):  The digest used in a SET_SSV
      or BIND_CONN_TO_SESSION request is not valid.

   NFS4ERR_BADTYPE (10007):  An attempt was made to create an object
      of a type not supported by the server.

   NFS4ERR_BAD_RANGE (10042):  The range for a LOCK, LOCKT, or LOCKU
      operation is not appropriate to the allowable range of offsets
      for the server.

   NFS4ERR_BAD_SEQID (10026):  The sequence number in a locking
      request is neither the next expected number nor the last number
      processed.  This error does not apply to and should never be
      generated in NFSv4.1.

   NFS4ERR_BADSESSION (10052):  TBD

   NFS4ERR_BADSLOT (10053):  TBD

   NFS4ERR_BAD_STATEID (10025):  A stateid generated by the current
      server instance, but which does not designate any locking state
      (either current or superseded) for a current lockowner-file
      pair, was used.
   NFS4ERR_BADXDR (10036):  The server encountered an XDR decoding
      error while processing an operation.

   NFS4ERR_CLID_INUSE (10017):  The EXCHANGE_ID operation has found
      that a client ID is already in use by another client.

   NFS4ERR_CLIENTID_BUSY (10074):  The DESTROY_CLIENTID operation has
      found that there are sessions and/or stateids bound to the
      client ID.

   NFS4ERR_COMPLETE_ALREADY (10054):  A RECLAIM_COMPLETE operation
      was done by a client which had already performed one.

   NFS4ERR_CONN_NOT_BOUND_TO_SESSION (10055):  The connection is not
      bound to the specified session.

   NFS4ERR_CONN_BINDING_NOT_ENFORCED (10073):  The client is trying
      to use enforced connection binding, but it disabled enforcement
      when the session was created.

   NFS4ERR_DEADLOCK (10045):  The server has been able to determine a
      file locking deadlock condition for a blocking lock request.

   NFS4ERR_DELAY (10008):  The server initiated the request, but was
      not able to complete it in a timely fashion.  The client should
      wait and then try the request with a new RPC transaction ID.
      For example, this error should be returned from a server that
      supports hierarchical storage and receives a request to process
      a file that has been migrated.
      In this case, the server should start the immigration process
      and respond to the client with this error.  This error may also
      occur when a necessary delegation recall makes processing a
      request in a timely fashion impossible.

   NFS4ERR_DELEG_ALREADY_WANTED (10056):  The client has already
      registered that it wants a delegation.

   NFS4ERR_DENIED (10010):  An attempt to lock a file is denied.
      Since this may be a temporary condition, the client is
      encouraged to retry the lock request until the lock is
      accepted.

   NFS4ERR_DQUOT (69):  Resource (quota) hard limit exceeded.  The
      user's resource limit on the server has been exceeded.

   NFS4ERR_EXIST (17):  File exists.  The file specified already
      exists.

   NFS4ERR_EXPIRED (10011):  A lease has expired that is being used
      in the current operation.

   NFS4ERR_FBIG (27):  File too large.  The operation would have
      caused a file to grow beyond the server's limit.

   NFS4ERR_FHEXPIRED (10014):  The filehandle provided is volatile
      and has expired at the server.

   NFS4ERR_FILE_OPEN (10046):  The operation cannot be successfully
      processed because a file involved in the operation is currently
      open.
   NFS4ERR_GRACE (10013):  The server is in its recovery or grace
      period, which should match the lease period of the server.

   NFS4ERR_INVAL (22):  Invalid argument or unsupported argument for
      an operation.  Two examples are attempting a READLINK on an
      object other than a symbolic link or specifying a value for an
      enum field that is not defined in the protocol (e.g.,
      nfs_ftype4).

   NFS4ERR_IO (5):  I/O error.  A hard error (for example, a disk
      error) occurred while processing the requested operation.

   NFS4ERR_ISDIR (21):  Is a directory.  The caller specified a
      directory in a non-directory operation.

   NFS4ERR_LAYOUTTRYLATER (10058):  Layouts are temporarily
      unavailable for the file; the client should retry later.

   NFS4ERR_LAYOUTUNAVAILABLE (10059):  Layouts are not available for
      the file or its containing file system.

   NFS4ERR_LEASE_MOVED (10031):  A lease being renewed is associated
      with a file system that has been migrated to a new server.

   NFS4ERR_LOCKED (10012):  A read or write operation was attempted
      on a locked file.

   NFS4ERR_LOCK_NOTSUPP (10043):  Server does not support atomic
      upgrade or downgrade of locks.
   NFS4ERR_LOCK_RANGE (10028):  A lock request is operating on a
      sub-range of a current lock for the lock owner and the server
      does not support this type of request.

   NFS4ERR_LOCKS_HELD (10037):  A CLOSE was attempted and file locks
      would exist after the CLOSE.

   NFS4ERR_MINOR_VERS_MISMATCH (10021):  The server has received a
      request that specifies an unsupported minor version.  The
      server must return a COMPOUND4res with a zero length operations
      result array.

   NFS4ERR_SEQ_MISORDERED (10063):  The requester sent a SEQUENCE or
      CB_SEQUENCE operation with an invalid sequenceid.

   NFS4ERR_SEQUENCE_POS (10064):  The requester sent a COMPOUND or
      CB_COMPOUND with a SEQUENCE or CB_SEQUENCE operation that was
      not the first operation.

   NFS4ERR_MLINK (31):  Too many hard links.

   NFS4ERR_MOVED (10019):  The file system which contains the current
      filehandle object is not present at the server.  It may have
      been relocated, migrated to another server, or may have never
      been present.  The client may obtain the new file system
      location by obtaining the "fs_locations" attribute for the
      current filehandle.  For further discussion, refer to the
      section "Multi-server Name Space".
   NFS4ERR_NAMETOOLONG (63):  The filename in an operation was too
      long.

   NFS4ERR_NOENT (2):  No such file or directory.  The file or
      directory name specified does not exist.

   NFS4ERR_NOFILEHANDLE (10020):  The logical current filehandle
      value (or, in the case of RESTOREFH, the saved filehandle
      value) has not been set properly.  This may be a result of a
      malformed COMPOUND operation (i.e., no PUTFH or PUTROOTFH
      before an operation that requires the current filehandle be
      set).

   NFS4ERR_NO_GRACE (10033):  A reclaim of client state was attempted
      in circumstances in which the server cannot guarantee that
      conflicting state has not been provided to another client.
      This can occur because the reclaim has been done outside of the
      grace period of the server, after the client has done a
      RECLAIM_COMPLETE operation, or because previous operations have
      created a situation in which the server is not able to
      determine that a reclaim-interfering edge condition does not
      exist.

   NFS4ERR_NOMATCHING_LAYOUT (10060):  Client has no matching layout
      (segment) to return.

   NFS4ERR_NOSPC (28):  No space left on device.  The operation would
      have caused the server's file system to exceed its limit.
   NFS4ERR_NOTDIR (20):  Not a directory.  The caller specified a
      non-directory in a directory operation.

   NFS4ERR_NOTEMPTY (66):  An attempt was made to remove a directory
      that was not empty.

   NFS4ERR_NOTSUPP (10004):  Operation is not supported.

   NFS4ERR_NOT_SAME (10027):  This error is returned by the VERIFY
      operation to signify that the attributes compared were not the
      same as provided in the client's request.

   NFS4ERR_NXIO (6):  I/O error.  No such device or address.

   NFS4ERR_OLD_STATEID (10024):  A stateid which designates the
      locking state for a lockowner-file at an earlier time was used.
      This error does not apply to and should never be generated in
      NFSv4.1.

   NFS4ERR_OPENMODE (10038):  The client attempted a READ, WRITE,
      LOCK, or SETATTR operation not sanctioned by the stateid passed
      (e.g., writing to a file opened only for read).

   NFS4ERR_OP_ILLEGAL (10044):  An illegal operation value has been
      specified in the argop field of a COMPOUND or CB_COMPOUND
      procedure.

   NFS4ERR_OP_NOT_IN_SESSION (10070):  The COMPOUND or CB_COMPOUND
      contains an operation that requires a SEQUENCE or CB_SEQUENCE
      operation to precede it in order to establish a session.

   NFS4ERR_PERM (1):  Not owner.
      The operation was not allowed because the caller is either not
      a privileged user (root) or not the owner of the target of the
      operation.

   NFS4ERR_PNFS_IO_HOLE (10075):  The pNFS client has attempted to
      read from or write to an illegal hole of a file of a data
      server that is using the STRIPE4_SPARSE stripe type.  See
      Section 13.5.

   NFS4ERR_RECALLCONFLICT (10061):  Layout is unavailable due to a
      conflicting LAYOUTRECALL that is in progress.

   NFS4ERR_RECLAIM_BAD (10034):  The reclaim provided by the client
      does not match any of the server's state consistency checks and
      is bad.

   NFS4ERR_RECLAIM_CONFLICT (10035):  The reclaim provided by the
      client has encountered a conflict and cannot be provided.
      Potentially indicates a misbehaving client.

   NFS4ERR_REP_TOO_BIG (10066):  The reply to a COMPOUND or
      CB_COMPOUND would exceed the channel's negotiated maximum
      response size.

   NFS4ERR_REP_TOO_BIG_TO_CACHE (10067):  The reply to a COMPOUND or
      CB_COMPOUND would exceed the channel's negotiated maximum size
      for replies cached in the reply cache.

   NFS4ERR_REQ_TOO_BIG (10065):  The COMPOUND or CB_COMPOUND request
      exceeds the channel's negotiated maximum size for requests.
   NFS4ERR_RESTOREFH (10030):  The RESTOREFH operation does not have
      a saved filehandle (identified by SAVEFH) to operate upon.

   NFS4ERR_RETRY_UNCACHED_REP (10068):  The requester has attempted a
      retry of a COMPOUND or CB_COMPOUND which it previously
      requested not be placed in the reply cache.

   NFS4ERR_ROFS (30):  Read-only file system.  A modifying operation
      was attempted on a read-only file system.

   NFS4ERR_SAME (10009):  This error is returned by the NVERIFY
      operation to signify that the attributes compared were the same
      as provided in the client's request.

   NFS4ERR_SERVERFAULT (10006):  An error occurred on the server
      which does not map to any of the legal NFS version 4 protocol
      error values.  The client should translate this into an
      appropriate error.  UNIX clients may choose to translate this
      to EIO.

   NFS4ERR_SHARE_DENIED (10015):  An attempt to OPEN a file with a
      share reservation has failed because of a share conflict.

   NFS4ERR_STALE (70):  Invalid filehandle.  The filehandle given in
      the arguments was invalid.  The file referred to by that
      filehandle no longer exists or access to it has been revoked.
   NFS4ERR_STALE_CLIENTID (10022):  A client ID not recognized by the
      server was used in a locking or CREATE_SESSION request.

   NFS4ERR_STALE_STATEID (10023):  A stateid generated by an earlier
      server instance was used.

   NFS4ERR_SYMLINK (10029):  The current filehandle provided for a
      LOOKUP is not a directory but a symbolic link.  Also used if
      the final component of the OPEN path is a symbolic link.

   NFS4ERR_TOOSMALL (10005):  The encoded response to a READDIR
      request exceeds the size limit set by the initial request.

   NFS4ERR_TOO_MANY_OPS (10070):  The COMPOUND or CB_COMPOUND request
      has too many operations.

   NFS4ERR_UNKNOWN_LAYOUTTYPE (10062):  Layout type is unknown.

   NFS4ERR_UNSAFE_COMPOUND (10069):  The client has sent a COMPOUND
      request with an unsafe mix of operations.

   NFS4ERR_WRONGSEC (10016):  The security mechanism being used by
      the client for the operation does not match the server's
      security policy.  The client should change the security
      mechanism being used and retry the operation.

   NFS4ERR_XDEV (18):  Attempt to do an operation between different
      fsids.

                                 Table 8

15.2.  Operations and their valid errors

Mappings of valid error returns for each protocol operation

   ACCESS:  NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, NFS4ERR_BADXDR,
      NFS4ERR_DELAY, NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, NFS4ERR_IO,
      NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,
      NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_REQ_TOO_BIG,
      NFS4ERR_TOO_MANY_OPS, NFS4ERR_REP_TOO_BIG,
      NFS4ERR_REP_TOO_BIG_TO_CACHE, NFS4ERR_UNSAFE_COMPOUND,
      NFS4ERR_SERVERFAULT, NFS4ERR_STALE

   BIND_CONN_TO_SESSION:  NFS4ERR_BADSESSION,
      NFS4ERR_BAD_SESSION_DIGEST, NFS4ERR_CONN_BINDING_NOT_ENFORCED

   CLOSE:  NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADHANDLE,
      NFS4ERR_BAD_STATEID, NFS4ERR_BADXDR, NFS4ERR_DELAY,
      NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, NFS4ERR_INVAL,
      NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED, NFS4ERR_LOCKS_HELD,
      NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,
      NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_REQ_TOO_BIG,
      NFS4ERR_TOO_MANY_OPS, NFS4ERR_REP_TOO_BIG,
      NFS4ERR_REP_TOO_BIG_TO_CACHE, NFS4ERR_UNSAFE_COMPOUND,
      NFS4ERR_SERVERFAULT, NFS4ERR_STALE, NFS4ERR_STALE_STATEID

   COMMIT:  NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, NFS4ERR_BADXDR,
      NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR,
      NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,
      NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_REQ_TOO_BIG,
      NFS4ERR_TOO_MANY_OPS, NFS4ERR_REP_TOO_BIG,
      NFS4ERR_REP_TOO_BIG_TO_CACHE, NFS4ERR_UNSAFE_COMPOUND,
      NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, NFS4ERR_STALE

   CREATE:
      NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADCHAR,
      NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, NFS4ERR_BADOWNER,
      NFS4ERR_BADTYPE, NFS4ERR_BADXDR, NFS4ERR_DELAY, NFS4ERR_DQUOT,
      NFS4ERR_EXIST, NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, NFS4ERR_IO,
      NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG, NFS4ERR_NOFILEHANDLE,
      NFS4ERR_NOSPC, NFS4ERR_NOTDIR, NFS4ERR_OP_NOT_IN_SESSION,
      NFS4ERR_PERM, NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS,
      NFS4ERR_REP_TOO_BIG, NFS4ERR_REP_TOO_BIG_TO_CACHE,
      NFS4ERR_UNSAFE_COMPOUND, NFS4ERR_ROFS, NFS4ERR_SERVERFAULT,
      NFS4ERR_STALE

   EXCHANGE_ID:

   CREATE_SESSION:  NFS4ERR_BADXDR, NFS4ERR_CLID_INUSE,
      NFS4ERR_SERVERFAULT, NFS4ERR_STALE_CLIENTID

   DELEGPURGE:  NFS4ERR_BADXDR, NFS4ERR_NOTSUPP,
      NFS4ERR_LEASE_MOVED, NFS4ERR_MOVED, NFS4ERR_OP_NOT_IN_SESSION,
      NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, NFS4ERR_REP_TOO_BIG,
      NFS4ERR_REP_TOO_BIG_TO_CACHE, NFS4ERR_UNSAFE_COMPOUND,
      NFS4ERR_SERVERFAULT, NFS4ERR_STALE_CLIENTID

   DELEGRETURN:  NFS4ERR_ADMIN_REVOKED, NFS4ERR_BAD_STATEID,
      NFS4ERR_BADXDR, NFS4ERR_EXPIRED, NFS4ERR_INVAL,
      NFS4ERR_LEASE_MOVED, NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,
      NFS4ERR_NOTSUPP, NFS4ERR_OP_NOT_IN_SESSION,
      NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, NFS4ERR_REP_TOO_BIG,
      NFS4ERR_REP_TOO_BIG_TO_CACHE, NFS4ERR_UNSAFE_COMPOUND,
      NFS4ERR_SERVERFAULT, NFS4ERR_STALE, NFS4ERR_STALE_STATEID

   DESTROY_CLIENTID:  NFS4ERR_CLIENTID_BUSY, NFS4ERR_STALE_CLIENTID

   DESTROY_SESSION:  NFS4ERR_BACK_CHAN_BUSY, NFS4ERR_BADSESSION,
      NFS4ERR_STALE_CLIENTID

   GET_DIR_DELEGATION:  NFS4ERR_ACCESS,
      NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, NFS4ERR_FHEXPIRED,
      NFS4ERR_INVAL, NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,
      NFS4ERR_NOTDIR, NFS4ERR_OP_NOT_IN_SESSION,
      NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, NFS4ERR_REP_TOO_BIG,
      NFS4ERR_REP_TOO_BIG_TO_CACHE, NFS4ERR_UNSAFE_COMPOUND,
      NFS4ERR_SERVERFAULT, NFS4ERR_STALE, NFS4ERR_WRONGSEC,
      NFS4ERR_IO, NFS4ERR_NOTSUPP

   GETATTR:  NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, NFS4ERR_BADXDR,
      NFS4ERR_DELAY, NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, NFS4ERR_IO,
      NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,
      NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_REQ_TOO_BIG,
      NFS4ERR_TOO_MANY_OPS, NFS4ERR_REP_TOO_BIG,
      NFS4ERR_REP_TOO_BIG_TO_CACHE, NFS4ERR_UNSAFE_COMPOUND,
      NFS4ERR_SERVERFAULT, NFS4ERR_STALE

   GETDEVICEINFO:  NFS4ERR_FHEXPIRED, NFS4ERR_INVAL,
      NFS4ERR_TOOSMALL, NFS4ERR_UNKNOWN_LAYOUTTYPE

   GETDEVICELIST:  NFS4ERR_BAD_COOKIE, NFS4ERR_FHEXPIRED,
      NFS4ERR_INVAL, NFS4ERR_TOOSMALL, NFS4ERR_UNKNOWN_LAYOUTTYPE

   GETFH:  NFS4ERR_BADHANDLE, NFS4ERR_FHEXPIRED, NFS4ERR_MOVED,
      NFS4ERR_NOFILEHANDLE, NFS4ERR_OP_NOT_IN_SESSION,
      NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, NFS4ERR_REP_TOO_BIG,
      NFS4ERR_REP_TOO_BIG_TO_CACHE, NFS4ERR_UNSAFE_COMPOUND,
      NFS4ERR_SERVERFAULT, NFS4ERR_STALE

   ILLEGAL:  NFS4ERR_OP_ILLEGAL

   LAYOUTCOMMIT:  NFS4ERR_BADLAYOUT, NFS4ERR_BADIOMODE,
      NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, NFS4ERR_NOFILEHANDLE,
      NFS4ERR_NO_GRACE, NFS4ERR_RECLAIM_BAD, NFS4ERR_STALE,
      NFS4ERR_STALE_CLIENTID, NFS4ERR_UNKNOWN_LAYOUTTYPE

   LAYOUTGET:  NFS4ERR_BADLAYOUT, NFS4ERR_BADIOMODE,
NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 13476 | | NFS4ERR_INVAL, NFS4ERR_LAYOUTUNAVAILABLE, | 13477 | | NFS4ERR_LAYOUTTRYLATER, NFS4ERR_LOCKED, | 13478 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | 13479 | | NFS4ERR_RECALLCONFLICT, NFS4ERR_STALE, | 13480 | | NFS4ERR_STALE_CLIENTID, NFS4ERR_TOOSMALL, | 13481 | | NFS4ERR_UNKNOWN_LAYOUTTYPE | 13482 | LAYOUTRETURN | NFS4ERR_BADLAYOUT, NFS4ERR_BADIOMODE, | 13483 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 13484 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NO_GRACE, | 13485 | | NFS4ERR_STALE, NFS4ERR_STALE_CLIENTID, | 13486 | | NFS4ERR_UNKNOWN_LAYOUTTYPE | 13487 | LINK | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 13488 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 13489 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 13490 | | NFS4ERR_DQUOT, NFS4ERR_EXIST, | 13491 | | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN, | 13492 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, | 13493 | | NFS4ERR_MLINK, NFS4ERR_MOVED, | 13494 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 13495 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 13496 | | NFS4ERR_NOTDIR, NFS4ERR_NOTSUPP, | 13497 | | NFS4ERR_OP_NOT_IN_SESSION, | 13498 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13499 | | NFS4ERR_REP_TOO_BIG, | 13500 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13501 | | NFS4ERR_UNSAFE_COMPOUND, NFS4ERR_ROFS, | 13502 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13503 | | NFS4ERR_WRONGSEC, NFS4ERR_XDEV | 13504 | LOCK | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 13505 | | NFS4ERR_BADHANDLE, NFS4ERR_BAD_RANGE, | 13506 | | NFS4ERR_BAD_STATEID, NFS4ERR_BADXDR, | 13507 | | NFS4ERR_DEADLOCK, NFS4ERR_DELAY, | 13508 | | NFS4ERR_DENIED, NFS4ERR_EXPIRED, | 13509 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 13510 | | NFS4ERR_INVAL, NFS4ERR_ISDIR, | 13511 | | NFS4ERR_LEASE_MOVED, NFS4ERR_LOCK_NOTSUPP, | 13512 | | NFS4ERR_LOCK_RANGE, NFS4ERR_MOVED, | 13513 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NO_GRACE, | 13514 | | NFS4ERR_OPENMODE, | 13515 | | NFS4ERR_OP_NOT_IN_SESSION, | 13516 | | NFS4ERR_RECLAIM_BAD, | 13517 | | NFS4ERR_RECLAIM_CONFLICT, | 
13518 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13519 | | NFS4ERR_REP_TOO_BIG, | 13520 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13521 | | NFS4ERR_UNSAFE_COMPOUND, | 13522 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13523 | | NFS4ERR_STALE_CLIENTID, | 13524 | | NFS4ERR_STALE_STATEID | 13525 | LOCKT | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 13526 | | NFS4ERR_BAD_RANGE, NFS4ERR_BADXDR, | 13527 | | NFS4ERR_DELAY, NFS4ERR_DENIED, | 13528 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 13529 | | NFS4ERR_INVAL, NFS4ERR_ISDIR, | 13530 | | NFS4ERR_LEASE_MOVED, NFS4ERR_LOCK_RANGE, | 13531 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 13532 | | NFS4ERR_OP_NOT_IN_SESSION, | 13533 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13534 | | NFS4ERR_REP_TOO_BIG, | 13535 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13536 | | NFS4ERR_UNSAFE_COMPOUND, | 13537 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13538 | | NFS4ERR_STALE_CLIENTID | 13539 | LOCKU | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 13540 | | NFS4ERR_BADHANDLE, NFS4ERR_BAD_RANGE, | 13541 | | NFS4ERR_BAD_STATEID, NFS4ERR_BADXDR, | 13542 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 13543 | | NFS4ERR_GRACE, NFS4ERR_INVAL, | 13544 | | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED, | 13545 | | NFS4ERR_LOCK_RANGE, NFS4ERR_MOVED, | 13546 | | NFS4ERR_NOFILEHANDLE, | 13547 | | NFS4ERR_OP_NOT_IN_SESSION, | 13548 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13549 | | NFS4ERR_REP_TOO_BIG, | 13550 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13551 | | NFS4ERR_UNSAFE_COMPOUND, | 13552 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13553 | | NFS4ERR_STALE_STATEID | 13554 | LOOKUP | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 13555 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 13556 | | NFS4ERR_BADXDR, NFS4ERR_FHEXPIRED, | 13557 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 13558 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 13559 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 13560 | | NFS4ERR_OP_NOT_IN_SESSION, | 13561 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13562 | | NFS4ERR_REP_TOO_BIG, | 
13563 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13564 | | NFS4ERR_UNSAFE_COMPOUND, | 13565 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13566 | | NFS4ERR_SYMLINK, NFS4ERR_WRONGSEC | 13567 | LOOKUPP | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 13568 | | NFS4ERR_FHEXPIRED, NFS4ERR_IO, | 13569 | | NFS4ERR_MOVED, NFS4ERR_NOENT, | 13570 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 13571 | | NFS4ERR_OP_NOT_IN_SESSION, | 13572 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13573 | | NFS4ERR_REP_TOO_BIG, | 13574 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13575 | | NFS4ERR_UNSAFE_COMPOUND, | 13576 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13577 | | NFS4ERR_WRONGSEC | 13578 | NVERIFY | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP, | 13579 | | NFS4ERR_BADCHAR, NFS4ERR_BADHANDLE, | 13580 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 13581 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 13582 | | NFS4ERR_IO, NFS4ERR_MOVED, | 13583 | | NFS4ERR_NOFILEHANDLE, | 13584 | | NFS4ERR_OP_NOT_IN_SESSION, | 13585 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13586 | | NFS4ERR_REP_TOO_BIG, | 13587 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13588 | | NFS4ERR_UNSAFE_COMPOUND, NFS4ERR_SAME, | 13589 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 13590 | OPEN | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 13591 | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADCHAR, | 13592 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 13593 | | NFS4ERR_BADOWNER, NFS4ERR_BADXDR, | 13594 | | NFS4ERR_DELAY, NFS4ERR_DQUOT, | 13595 | | NFS4ERR_EXIST, NFS4ERR_EXPIRED, | 13596 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 13597 | | NFS4ERR_IO, NFS4ERR_INVAL, NFS4ERR_ISDIR, | 13598 | | NFS4ERR_LEASE_MOVED, NFS4ERR_MOVED, | 13599 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 13600 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 13601 | | NFS4ERR_NOTDIR, NFS4ERR_NO_GRACE, | 13602 | | NFS4ERR_PERM, NFS4ERR_RECLAIM_BAD, | 13603 | | NFS4ERR_RECLAIM_CONFLICT, | 13604 | | NFS4ERR_OP_NOT_IN_SESSION, | 13605 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13606 | | NFS4ERR_REP_TOO_BIG, | 13607 | | 
NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13608 | | NFS4ERR_UNSAFE_COMPOUND, NFS4ERR_ROFS, | 13609 | | NFS4ERR_SERVERFAULT, NFS4ERR_SHARE_DENIED, | 13610 | | NFS4ERR_STALE, NFS4ERR_STALE_CLIENTID, | 13611 | | NFS4ERR_SYMLINK, NFS4ERR_WRONGSEC | 13612 | OPEN_DOWNGRADE | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADHANDLE, | 13613 | | NFS4ERR_BAD_STATEID, NFS4ERR_BADXDR, | 13614 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 13615 | | NFS4ERR_INVAL, NFS4ERR_MOVED, | 13616 | | NFS4ERR_NOFILEHANDLE, | 13617 | | NFS4ERR_OP_NOT_IN_SESSION, | 13618 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13619 | | NFS4ERR_REP_TOO_BIG, | 13620 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13621 | | NFS4ERR_UNSAFE_COMPOUND, | 13622 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13623 | | NFS4ERR_STALE_STATEID | 13624 | OPENATTR | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 13625 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 13626 | | NFS4ERR_DQUOT, NFS4ERR_FHEXPIRED, | 13627 | | NFS4ERR_IO, NFS4ERR_MOVED, NFS4ERR_NOENT, | 13628 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 13629 | | NFS4ERR_NOTSUPP, | 13630 | | NFS4ERR_OP_NOT_IN_SESSION, | 13631 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13632 | | NFS4ERR_REP_TOO_BIG, | 13633 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13634 | | NFS4ERR_UNSAFE_COMPOUND, NFS4ERR_ROFS, | 13635 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 13636 | PUTFH | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | 13637 | | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, | 13638 | | NFS4ERR_OP_NOT_IN_SESSION, | 13639 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13640 | | NFS4ERR_REP_TOO_BIG, | 13641 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13642 | | NFS4ERR_UNSAFE_COMPOUND, | 13643 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13644 | | NFS4ERR_WRONGSEC | 13645 | PUTPUBFH | NFS4ERR_OP_NOT_IN_SESSION, | 13646 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13647 | | NFS4ERR_REP_TOO_BIG, | 13648 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13649 | | NFS4ERR_UNSAFE_COMPOUND, | 13650 | | NFS4ERR_SERVERFAULT, NFS4ERR_WRONGSEC | 13651 | PUTROOTFH | 
NFS4ERR_OP_NOT_IN_SESSION, | 13652 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13653 | | NFS4ERR_REP_TOO_BIG, | 13654 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13655 | | NFS4ERR_UNSAFE_COMPOUND, | 13656 | | NFS4ERR_SERVERFAULT, NFS4ERR_WRONGSEC | 13657 | READ | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 13658 | | NFS4ERR_BADHANDLE, NFS4ERR_BAD_STATEID, | 13659 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 13660 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 13661 | | NFS4ERR_GRACE, NFS4ERR_IO, NFS4ERR_INVAL, | 13662 | | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED, | 13663 | | NFS4ERR_LOCKED, NFS4ERR_MOVED, | 13664 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NXIO, | 13665 | | NFS4ERR_OP_NOT_IN_SESSION, | 13666 | | NFS4ERR_OPENMODE, NFS4ERR_PNFS_IO_HOLE, | 13667 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13668 | | NFS4ERR_REP_TOO_BIG, | 13669 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13670 | | NFS4ERR_UNSAFE_COMPOUND, | 13671 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13672 | | NFS4ERR_STALE_STATEID | 13673 | READDIR | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 13674 | | NFS4ERR_BAD_COOKIE, NFS4ERR_BADXDR, | 13675 | | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED, | 13676 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 13677 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 13678 | | NFS4ERR_NOT_SAME, | 13679 | | NFS4ERR_OP_NOT_IN_SESSION, | 13680 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13681 | | NFS4ERR_REP_TOO_BIG, | 13682 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13683 | | NFS4ERR_UNSAFE_COMPOUND, | 13684 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13685 | | NFS4ERR_TOOSMALL | 13686 | READLINK | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 13687 | | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED, | 13688 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, | 13689 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 13690 | | NFS4ERR_NOTSUPP, | 13691 | | NFS4ERR_OP_NOT_IN_SESSION, | 13692 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13693 | | NFS4ERR_REP_TOO_BIG, | 13694 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13695 | | NFS4ERR_UNSAFE_COMPOUND, | 13696 | | 
NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 13697 | RECLAIM_COMPLETE | NFS4ERR_COMPLETE_ALREADY | 13698 | RELEASE_LOCKOWNER | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | 13699 | | NFS4ERR_EXPIRED, NFS4ERR_LEASE_MOVED, | 13700 | | NFS4ERR_LOCKS_HELD, | 13701 | | NFS4ERR_OP_NOT_IN_SESSION, | 13702 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13703 | | NFS4ERR_REP_TOO_BIG, | 13704 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13705 | | NFS4ERR_UNSAFE_COMPOUND, | 13706 | | NFS4ERR_SERVERFAULT, | 13707 | | NFS4ERR_STALE_CLIENTID | 13708 | REMOVE | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 13709 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 13710 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 13711 | | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN, | 13712 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 13713 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 13714 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 13715 | | NFS4ERR_NOTEMPTY, | 13716 | | NFS4ERR_OP_NOT_IN_SESSION, | 13717 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13718 | | NFS4ERR_REP_TOO_BIG, | 13719 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13720 | | NFS4ERR_UNSAFE_COMPOUND, NFS4ERR_ROFS, | 13721 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 13722 | RENAME | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 13723 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 13724 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 13725 | | NFS4ERR_DQUOT, NFS4ERR_EXIST, | 13726 | | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN, | 13727 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 13728 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 13729 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 13730 | | NFS4ERR_NOTDIR, NFS4ERR_NOTEMPTY, | 13731 | | NFS4ERR_OP_NOT_IN_SESSION, | 13732 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13733 | | NFS4ERR_REP_TOO_BIG, | 13734 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13735 | | NFS4ERR_UNSAFE_COMPOUND, NFS4ERR_ROFS, | 13736 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13737 | | NFS4ERR_WRONGSEC, NFS4ERR_XDEV | 13738 | RESTOREFH | NFS4ERR_BADHANDLE, NFS4ERR_FHEXPIRED, | 13739 | | NFS4ERR_MOVED, 
NFS4ERR_OP_NOT_IN_SESSION, | 13740 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13741 | | NFS4ERR_REP_TOO_BIG, | 13742 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13743 | | NFS4ERR_UNSAFE_COMPOUND, | 13744 | | NFS4ERR_RESTOREFH, NFS4ERR_SERVERFAULT, | 13745 | | NFS4ERR_STALE, NFS4ERR_WRONGSEC | 13746 | SAVEFH | NFS4ERR_BADHANDLE, NFS4ERR_FHEXPIRED, | 13747 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 13748 | | NFS4ERR_OP_NOT_IN_SESSION, | 13749 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13750 | | NFS4ERR_REP_TOO_BIG, | 13751 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13752 | | NFS4ERR_UNSAFE_COMPOUND, | 13753 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 13754 | SECINFO | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 13755 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 13756 | | NFS4ERR_BADXDR, NFS4ERR_FHEXPIRED, | 13757 | | NFS4ERR_INVAL, NFS4ERR_MOVED, | 13758 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 13759 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 13760 | | NFS4ERR_OP_NOT_IN_SESSION, | 13761 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13762 | | NFS4ERR_REP_TOO_BIG, | 13763 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13764 | | NFS4ERR_UNSAFE_COMPOUND, | 13765 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 13766 | SECINFO_NO_NAME | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 13767 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 13768 | | NFS4ERR_BADXDR, NFS4ERR_FHEXPIRED, | 13769 | | NFS4ERR_INVAL, NFS4ERR_MOVED, | 13770 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 13771 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 13772 | | NFS4ERR_OP_NOT_IN_SESSION, | 13773 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13774 | | NFS4ERR_REP_TOO_BIG, | 13775 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13776 | | NFS4ERR_UNSAFE_COMPOUND, | 13777 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 13778 | SEQUENCE | NFS4ERR_BADSESSION, NFS4ERR_BADSLOT, | 13779 | | NFS4ERR_CONN_NOT_BOUND_TO_SESSION, | 13780 | | NFS4ERR_SEQ_MISORDERED, | 13781 | | NFS4ERR_SEQUENCE_POS, NFS4ERR_REQ_TOO_BIG, | 13782 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_REP_TOO_BIG, | 13783 | 
| NFS4ERR_REP_TOO_BIG_TO_CACHE | 13784 | SET_SSV | NFS4ERR_BAD_SESSION_DIGEST, | 13785 | | NFS4ERR_CONN_BINDING_NOT_ENFORCED | 13786 | SETATTR | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 13787 | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADCHAR, | 13788 | | NFS4ERR_BADHANDLE, NFS4ERR_BADOWNER, | 13789 | | NFS4ERR_BAD_STATEID, NFS4ERR_BADXDR, | 13790 | | NFS4ERR_DELAY, NFS4ERR_DQUOT, | 13791 | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | 13792 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 13793 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, | 13794 | | NFS4ERR_LOCKED, NFS4ERR_MOVED, | 13795 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 13796 | | NFS4ERR_OPENMODE, NFS4ERR_PERM, | 13797 | | NFS4ERR_OP_NOT_IN_SESSION, | 13798 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13799 | | NFS4ERR_REP_TOO_BIG, | 13800 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13801 | | NFS4ERR_UNSAFE_COMPOUND, NFS4ERR_ROFS, | 13802 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13803 | | NFS4ERR_STALE_STATEID | 13804 | EXCHANGE_ID | NFS4ERR_BADXDR, NFS4ERR_CLID_INUSE, | 13805 | | NFS4ERR_INVAL, NFS4ERR_SERVERFAULT | 13806 | CREATE_SESSION | NFS4ERR_BADXDR, NFS4ERR_CLID_INUSE, | 13807 | | NFS4ERR_DELAY, NFS4ERR_SERVERFAULT, | 13808 | | NFS4ERR_STALE_CLIENTID | 13809 | VERIFY | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP, | 13810 | | NFS4ERR_BADCHAR, NFS4ERR_BADHANDLE, | 13811 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 13812 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 13813 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 13814 | | NFS4ERR_NOT_SAME, | 13815 | | NFS4ERR_OP_NOT_IN_SESSION, | 13816 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13817 | | NFS4ERR_REP_TOO_BIG, | 13818 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13819 | | NFS4ERR_UNSAFE_COMPOUND, | 13820 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 13821 | WANT_DELEGATION | | 13822 | WRITE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 13823 | | NFS4ERR_BADHANDLE, NFS4ERR_BAD_STATEID, | 13824 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 13825 | | NFS4ERR_DQUOT, NFS4ERR_EXPIRED, | 13826 | | NFS4ERR_FBIG, 
NFS4ERR_FHEXPIRED, | 13827 | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, | 13828 | | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED, | 13829 | | NFS4ERR_LOCKED, NFS4ERR_MOVED, | 13830 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 13831 | | NFS4ERR_NXIO, NFS4ERR_OP_NOT_IN_SESSION, | 13832 | | NFS4ERR_OPENMODE, NFS4ERR_PNFS_IO_HOLE, | 13833 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, | 13834 | | NFS4ERR_REP_TOO_BIG, | 13835 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13836 | | NFS4ERR_UNSAFE_COMPOUND, NFS4ERR_ROFS, | 13837 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 13838 | | NFS4ERR_STALE_STATEID | 13839 +----------------------+--------------------------------------------+ 13841 Table 9 13843 15.3. Callback operations and their valid errors 13845 Mappings of valid error returns for each protocol callback operation 13847 +-------------------------+-----------------------------------------+ 13848 | Callback Operation | Errors | 13849 +-------------------------+-----------------------------------------+ 13850 | CB_GETATTR | NFS4ERR_BADHANDLE NFS4ERR_BADXDR | 13851 | | NFS4ERR_OP_NOT_IN_SESSION, | 13852 | | NFS4ERR_REQ_TOO_BIG, | 13853 | | NFS4ERR_TOO_MANY_OPS, | 13854 | | NFS4ERR_REP_TOO_BIG, | 13855 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13856 | | NFS4ERR_UNSAFE_COMPOUND, | 13857 | | NFS4ERR_SERVERFAULT | 13858 | CB_ILLEGAL | NFS4ERR_OP_ILLEGAL | 13859 | CB_LAYOUTRECALL | NFS4ERR_NOMATCHING_LAYOUT | 13860 | CB_NOTIFY | NFS4ERR_BAD_STATEID NFS4ERR_INVAL | 13861 | | NFS4ERR_BADXDR NFS4ERR_SERVERFAULT | 13862 | CB_PUSH_DELEG | | 13863 | CB_RECALL | NFS4ERR_BADHANDLE NFS4ERR_BAD_STATEID | 13864 | | NFS4ERR_BADXDR | 13865 | | NFS4ERR_OP_NOT_IN_SESSION, | 13866 | | NFS4ERR_REQ_TOO_BIG, | 13867 | | NFS4ERR_TOO_MANY_OPS, | 13868 | | NFS4ERR_REP_TOO_BIG, | 13869 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13870 | | NFS4ERR_SERVERFAULT | 13871 | CB_RECALL_ANY | NFS4ERR_OP_NOT_IN_SESSION, | 13872 | | NFS4ERR_REQ_TOO_BIG, | 13873 | | NFS4ERR_TOO_MANY_OPS, | 13874 | | NFS4ERR_REP_TOO_BIG, | 13875 | | 
NFS4ERR_REP_TOO_BIG_TO_CACHE, | 13876 | | NFS4ERR_INVAL | 13877 | CB_RECALLABLE_OBJ_AVAIL | | 13878 | CB_RECALL_CREDIT | | 13879 | CB_SEQUENCE | NFS4ERR_BADSESSION, NFS4ERR_BADSLOT, | 13880 | | NFS4ERR_CONN_NOT_BOUND_TO_SESSION, | 13881 | | NFS4ERR_SEQ_MISORDERED, | 13882 | | NFS4ERR_SEQUENCE_POS, | 13883 | | NFS4ERR_REQ_TOO_BIG, | 13884 | | NFS4ERR_TOO_MANY_OPS, | 13885 | | NFS4ERR_REP_TOO_BIG, | 13886 | | NFS4ERR_REP_TOO_BIG_TO_CACHE | 13887 +-------------------------+-----------------------------------------+ 13889 Table 10 13891 15.4. Errors and the operations that use them 13893 +-----------------------------------+-------------------------------+ 13894 | Error | Operations | 13895 +-----------------------------------+-------------------------------+ 13896 | NFS4ERR_ACCESS | ACCESS, COMMIT, CREATE, | 13897 | | GETATTR, GET_DIR_DELEGATION, | 13898 | | LINK, LOCK, LOCKT, LOCKU, | 13899 | | LOOKUP, LOOKUPP, NVERIFY, | 13900 | | OPEN, OPENATTR, READ, | 13901 | | READDIR, READLINK, REMOVE, | 13902 | | RENAME, SECINFO, | 13903 | | SECINFO_NO_NAME, SETATTR, | 13904 | | VERIFY, WRITE | 13905 | NFS4ERR_ADMIN_REVOKED | CLOSE, DELEGRETURN, LOCK, | 13906 | | LOCKU, OPEN, OPEN_DOWNGRADE, | 13907 | | READ, RELEASE_LOCKOWNER, | 13908 | | SETATTR, WRITE | 13909 | NFS4ERR_ATTRNOTSUPP | CREATE, NVERIFY, OPEN, | 13910 | | SETATTR, VERIFY | 13911 | NFS4ERR_BACK_CHAN_BUSY | DESTROY_SESSION | 13912 | NFS4ERR_BADCHAR | CREATE, LINK, LOOKUP, | 13913 | | NVERIFY, OPEN, REMOVE, | 13914 | | RENAME, SECINFO, | 13915 | | SECINFO_NO_NAME, SETATTR, | 13916 | | VERIFY | 13917 | NFS4ERR_BADHANDLE | ACCESS, CB_GETATTR, | 13918 | | CB_RECALL, CLOSE, COMMIT, | 13919 | | CREATE, GETATTR, GETFH, | 13920 | | GET_DIR_DELEGATION, LINK, | 13921 | | LOCK, LOCKT, LOCKU, LOOKUP, | 13922 | | LOOKUPP, NVERIFY, OPEN, | 13923 | | OPENATTR, OPEN_DOWNGRADE, | 13924 | | PUTFH, READ, READDIR, | 13925 | | READLINK, REMOVE, RENAME, | 13926 | | RESTOREFH, SAVEFH, SECINFO, | 13927 | | SECINFO_NO_NAME, SETATTR, | 
13928 | | VERIFY, WRITE | 13929 | NFS4ERR_BADIOMODE | LAYOUTCOMMIT, LAYOUTGET, | 13930 | | LAYOUTRETURN | 13931 | NFS4ERR_BADLAYOUT | LAYOUTCOMMIT, LAYOUTGET, | 13932 | | LAYOUTRETURN | 13933 | NFS4ERR_BADNAME | CREATE, LINK, LOOKUP, OPEN, | 13934 | | REMOVE, RENAME, SECINFO, | 13935 | | SECINFO_NO_NAME | 13936 | NFS4ERR_BADOWNER | CREATE, OPEN, SETATTR | 13937 | NFS4ERR_BADSESSION | BIND_CONN_TO_SESSION, | 13938 | | CB_SEQUENCE, DESTROY_SESSION, | 13939 | | SEQUENCE | 13940 | NFS4ERR_BADSLOT | CB_SEQUENCE, SEQUENCE | 13941 | NFS4ERR_BADTYPE | CREATE | 13942 | NFS4ERR_BADXDR | ACCESS, CB_GETATTR, | 13943 | | CB_NOTIFY, CB_RECALL, CLOSE, | 13944 | | COMMIT, CREATE, | 13945 | | CREATE_SESSION, DELEGPURGE, | 13946 | | DELEGRETURN, EXCHANGE_ID, | 13947 | | GETATTR, GET_DIR_DELEGATION, | 13948 | | LINK, LOCK, LOCKT, LOCKU, | 13949 | | LOOKUP, NVERIFY, OPEN, | 13950 | | OPENATTR, OPEN_DOWNGRADE, | 13951 | | PUTFH, READ, READDIR, | 13952 | | RELEASE_LOCKOWNER, REMOVE, | 13953 | | RENAME, SECINFO, | 13954 | | SECINFO_NO_NAME, SETATTR, | 13955 | | VERIFY, WRITE | 13956 | NFS4ERR_BAD_COOKIE | GETDEVICELIST, READDIR | 13957 | NFS4ERR_BAD_RANGE | LOCK, LOCKT, LOCKU | 13958 | NFS4ERR_BAD_SESSION_DIGEST | BIND_CONN_TO_SESSION, SET_SSV | 13959 | NFS4ERR_BAD_STATEID | CB_NOTIFY, CB_RECALL, CLOSE, | 13960 | | DELEGRETURN, LOCK, LOCKU, | 13961 | | OPEN_DOWNGRADE, READ, | 13962 | | SETATTR, WRITE | 13963 | NFS4ERR_CLID_INUSE | CREATE_SESSION, EXCHANGE_ID | 13964 | NFS4ERR_CLIENTID_BUSY | DESTROY_CLIENTID | 13965 | NFS4ERR_COMPLETE_ALREADY | RECLAIM_COMPLETE | 13966 | NFS4ERR_CONN_BINDING_NOT_ENFORCED | BIND_CONN_TO_SESSION, SET_SSV | 13967 | NFS4ERR_CONN_NOT_BOUND_TO_SESSION | CB_SEQUENCE, SEQUENCE | 13968 | NFS4ERR_DEADLOCK | LOCK | 13969 | NFS4ERR_DELAY | ACCESS, CLOSE, CREATE, | 13970 | | CREATE_SESSION, GETATTR, | 13971 | | LINK, LOCK, LOCKT, NVERIFY, | 13972 | | OPEN, OPENATTR, READ, | 13973 | | READDIR, READLINK, REMOVE, | 13974 | | RENAME, SETATTR, VERIFY, | 13975 | | WRITE | 
13976 | NFS4ERR_DENIED | LOCK, LOCKT | 13977 | NFS4ERR_DQUOT | CREATE, LINK, OPEN, OPENATTR, | 13978 | | RENAME, SETATTR, WRITE | 13979 | NFS4ERR_EIO | GET_DIR_DELEGATION | 13980 | NFS4ERR_EXIST | CREATE, LINK, OPEN, RENAME | 13981 | NFS4ERR_EXPIRED | CLOSE, DELEGRETURN, LOCK, | 13982 | | LOCKU, OPEN, OPEN_DOWNGRADE, | 13983 | | READ, RELEASE_LOCKOWNER, | 13984 | | SETATTR, WRITE | 13985 | NFS4ERR_FBIG | SETATTR, WRITE | 13986 | NFS4ERR_FHEXPIRED | ACCESS, CLOSE, COMMIT, | 13987 | | CREATE, GETATTR, | 13988 | | GETDEVICEINFO, GETDEVICELIST, | 13989 | | GETFH, GET_DIR_DELEGATION, | 13990 | | LAYOUTCOMMIT, LAYOUTGET, | 13991 | | LAYOUTRETURN, LINK, LOCK, | 13992 | | LOCKT, LOCKU, LOOKUP, | 13993 | | LOOKUPP, NVERIFY, OPEN, | 13994 | | OPENATTR, OPEN_DOWNGRADE, | 13995 | | PUTFH, READ, READDIR, | 13996 | | READLINK, REMOVE, RENAME, | 13997 | | RESTOREFH, SAVEFH, SECINFO, | 13998 | | SECINFO_NO_NAME, SETATTR, | 13999 | | VERIFY, WRITE | 14000 | NFS4ERR_FILE_OPEN | LINK, REMOVE, RENAME | 14001 | NFS4ERR_GRACE | LAYOUTGET, LOCK, LOCKT, | 14002 | | LOCKU, OPEN, READ, SETATTR, | 14003 | | WRITE | 14004 | NFS4ERR_INVAL | ACCESS, CB_NOTIFY, | 14005 | | CB_RECALL_ANY, CLOSE, COMMIT, | 14006 | | CREATE, DELEGRETURN, | 14007 | | EXCHANGE_ID, GETATTR, | 14008 | | GETDEVICEINFO, GETDEVICELIST, | 14009 | | GET_DIR_DELEGATION, | 14010 | | LAYOUTCOMMIT, LAYOUTGET, | 14011 | | LAYOUTRETURN, LINK, LOCK, | 14012 | | LOCKT, LOCKU, LOOKUP, | 14013 | | NVERIFY, OPEN, | 14014 | | OPEN_DOWNGRADE, READ, | 14015 | | READDIR, READLINK, REMOVE, | 14016 | | RENAME, SECINFO, | 14017 | | SECINFO_NO_NAME, SETATTR, | 14018 | | VERIFY, WRITE | 14019 | NFS4ERR_IO | ACCESS, COMMIT, CREATE, | 14020 | | GETATTR, LINK, LOOKUP, | 14021 | | LOOKUPP, NVERIFY, OPEN, | 14022 | | OPENATTR, READ, READDIR, | 14023 | | READLINK, REMOVE, RENAME, | 14024 | | SETATTR, WRITE | 14025 | NFS4ERR_ISDIR | CLOSE, COMMIT, LINK, LOCK, | 14026 | | LOCKT, LOCKU, OPEN, READ, | 14027 | | READLINK, SETATTR, WRITE | 14028 | 
NFS4ERR_LAYOUTTRYLATER | LAYOUTGET | 14029 | NFS4ERR_LAYOUTUNAVAILABLE | LAYOUTGET | 14030 | NFS4ERR_LEASE_MOVED | CLOSE, DELEGPURGE, | 14031 | | DELEGRETURN, LOCK, LOCKT, | 14032 | | LOCKU, OPEN, READ, | 14033 | | RELEASE_LOCKOWNER, WRITE | 14034 | NFS4ERR_LOCKED | LAYOUTGET, READ, SETATTR, | 14035 | | WRITE | 14036 | NFS4ERR_LOCKS_HELD | CLOSE, RELEASE_LOCKOWNER | 14037 | NFS4ERR_LOCK_NOTSUPP | LOCK | 14038 | NFS4ERR_LOCK_RANGE | LOCK, LOCKT, LOCKU | 14039 | NFS4ERR_MLINK | LINK | 14040 | NFS4ERR_MOVED | ACCESS, CLOSE, COMMIT, | 14041 | | CREATE, DELEGPURGE, | 14042 | | DELEGRETURN, GETATTR, GETFH, | 14043 | | GET_DIR_DELEGATION, LINK, | 14044 | | LOCK, LOCKT, LOCKU, LOOKUP, | 14045 | | LOOKUPP, NVERIFY, OPEN, | 14046 | | OPENATTR, OPEN_DOWNGRADE, | 14047 | | PUTFH, READ, READDIR, | 14048 | | READLINK, REMOVE, RENAME, | 14049 | | RESTOREFH, SAVEFH, SECINFO, | 14050 | | SECINFO_NO_NAME, SETATTR, | 14051 | | VERIFY, WRITE | 14052 | NFS4ERR_NAMETOOLONG | CREATE, LINK, LOOKUP, OPEN, | 14053 | | REMOVE, RENAME, SECINFO, | 14054 | | SECINFO_NO_NAME | 14055 | NFS4ERR_NOENT | LINK, LOOKUP, LOOKUPP, OPEN, | 14056 | | OPENATTR, REMOVE, RENAME, | 14057 | | SECINFO, SECINFO_NO_NAME | 14058 | NFS4ERR_NOFILEHANDLE | ACCESS, CLOSE, COMMIT, | 14059 | | CREATE, DELEGRETURN, GETATTR, | 14060 | | GETFH, GET_DIR_DELEGATION, | 14061 | | LAYOUTCOMMIT, LAYOUTGET, | 14062 | | LAYOUTRETURN, LINK, LOCK, | 14063 | | LOCKT, LOCKU, LOOKUP, | 14064 | | LOOKUPP, NVERIFY, OPEN, | 14065 | | OPENATTR, OPEN_DOWNGRADE, | 14066 | | READ, READDIR, READLINK, | 14067 | | REMOVE, RENAME, SAVEFH, | 14068 | | SECINFO, SECINFO_NO_NAME, | 14069 | | SETATTR, VERIFY, WRITE | 14070 | NFS4ERR_NOMATCHING_LAYOUT | CB_LAYOUTRECALL | 14071 | NFS4ERR_NOSPC | CREATE, LINK, OPEN, OPENATTR, | 14072 | | RENAME, SETATTR, WRITE | 14073 | NFS4ERR_NOTDIR | CREATE, GET_DIR_DELEGATION, | 14074 | | LINK, LOOKUP, LOOKUPP, OPEN, | 14075 | | READDIR, REMOVE, RENAME, | 14076 | | SECINFO, SECINFO_NO_NAME | 14077 | NFS4ERR_NOTEMPTY 
| REMOVE, RENAME | 14078 | NFS4ERR_NOTSUPP | DELEGPURGE, DELEGRETURN, | 14079 | | GET_DIR_DELEGATION, | 14080 | | LAYOUTGET, LINK, OPENATTR, | 14081 | | READLINK | 14082 | NFS4ERR_NOT_SAME | READDIR, VERIFY | 14083 | NFS4ERR_NO_GRACE | LAYOUTCOMMIT, LAYOUTRETURN, | 14084 | | LOCK, OPEN | 14085 | NFS4ERR_NXIO | READ, WRITE | 14086 | NFS4ERR_OPENMODE | LOCK, READ, SETATTR, WRITE | 14087 | NFS4ERR_OP_ILLEGAL | CB_ILLEGAL, ILLEGAL | 14088 | NFS4ERR_OP_NOT_IN_SESSION | ACCESS, CB_GETATTR, | 14089 | | CB_RECALL, CB_RECALL_ANY, | 14090 | | CLOSE, COMMIT, CREATE, | 14091 | | DELEGPURGE, DELEGRETURN, | 14092 | | GETATTR, GETFH, | 14093 | | GET_DIR_DELEGATION, LINK, | 14094 | | LOCK, LOCKT, LOCKU, LOOKUP, | 14095 | | LOOKUPP, NVERIFY, OPEN, | 14096 | | OPENATTR, OPEN_DOWNGRADE, | 14097 | | PUTFH, PUTPUBFH, PUTROOTFH, | 14098 | | READ, READDIR, READLINK, | 14099 | | RELEASE_LOCKOWNER, REMOVE, | 14100 | | RENAME, RESTOREFH, SAVEFH, | 14101 | | SECINFO, SECINFO_NO_NAME, | 14102 | | SETATTR, VERIFY, WRITE | 14103 | NFS4ERR_PERM | CREATE, OPEN, SETATTR | 14104 | NFS4ERR_PNFS_IO_HOLE | READ, WRITE | 14105 | NFS4ERR_RECALLCONFLICT | LAYOUTGET | 14106 | NFS4ERR_RECLAIM_BAD | LAYOUTCOMMIT, LOCK, OPEN | 14107 | NFS4ERR_RECLAIM_CONFLICT | LOCK, OPEN | 14108 | NFS4ERR_REP_TOO_BIG | ACCESS, CB_GETATTR, | 14109 | | CB_RECALL, CB_RECALL_ANY, | 14110 | | CB_SEQUENCE, CLOSE, COMMIT, | 14111 | | CREATE, DELEGPURGE, | 14112 | | DELEGRETURN, GETATTR, GETFH, | 14113 | | GET_DIR_DELEGATION, LINK, | 14114 | | LOCK, LOCKT, LOCKU, LOOKUP, | 14115 | | LOOKUPP, NVERIFY, OPEN, | 14116 | | OPENATTR, OPEN_DOWNGRADE, | 14117 | | PUTFH, PUTPUBFH, PUTROOTFH, | 14118 | | READ, READDIR, READLINK, | 14119 | | RELEASE_LOCKOWNER, REMOVE, | 14120 | | RENAME, RESTOREFH, SAVEFH, | 14121 | | SECINFO, SECINFO_NO_NAME, | 14122 | | SEQUENCE, SETATTR, VERIFY, | 14123 | | WRITE | 14124 | NFS4ERR_REP_TOO_BIG_TO_CACHE | ACCESS, CB_GETATTR, | 14125 | | CB_RECALL, CB_RECALL_ANY, | 14126 | | CB_SEQUENCE, CLOSE, COMMIT, | 
14127 | | CREATE, DELEGPURGE, | 14128 | | DELEGRETURN, GETATTR, GETFH, | 14129 | | GET_DIR_DELEGATION, LINK, | 14130 | | LOCK, LOCKT, LOCKU, LOOKUP, | 14131 | | LOOKUPP, NVERIFY, OPEN, | 14132 | | OPENATTR, OPEN_DOWNGRADE, | 14133 | | PUTFH, PUTPUBFH, PUTROOTFH, | 14134 | | READ, READDIR, READLINK, | 14135 | | RELEASE_LOCKOWNER, REMOVE, | 14136 | | RENAME, RESTOREFH, SAVEFH, | 14137 | | SECINFO, SECINFO_NO_NAME, | 14138 | | SEQUENCE, SETATTR, VERIFY, | 14139 | | WRITE | 14140 | NFS4ERR_REQ_TOO_BIG | ACCESS, CB_GETATTR, | 14141 | | CB_RECALL, CB_RECALL_ANY, | 14142 | | CB_SEQUENCE, CLOSE, COMMIT, | 14143 | | CREATE, DELEGPURGE, | 14144 | | DELEGRETURN, GETATTR, GETFH, | 14145 | | GET_DIR_DELEGATION, LINK, | 14146 | | LOCK, LOCKT, LOCKU, LOOKUP, | 14147 | | LOOKUPP, NVERIFY, OPEN, | 14148 | | OPENATTR, OPEN_DOWNGRADE, | 14149 | | PUTFH, PUTPUBFH, PUTROOTFH, | 14150 | | READ, READDIR, READLINK, | 14151 | | RELEASE_LOCKOWNER, REMOVE, | 14152 | | RENAME, RESTOREFH, SAVEFH, | 14153 | | SECINFO, SECINFO_NO_NAME, | 14154 | | SEQUENCE, SETATTR, VERIFY, | 14155 | | WRITE | 14156 | NFS4ERR_RESTOREFH | RESTOREFH | 14157 | NFS4ERR_ROFS | COMMIT, CREATE, LINK, OPEN, | 14158 | | OPENATTR, REMOVE, RENAME, | 14159 | | SETATTR, WRITE | 14160 | NFS4ERR_SAME | NVERIFY | 14161 | NFS4ERR_SEQUENCE_POS | CB_SEQUENCE, SEQUENCE | 14162 | NFS4ERR_SEQ_MISORDERED | CB_SEQUENCE, SEQUENCE | 14163 | NFS4ERR_SERVERFAULT | ACCESS, CB_GETATTR, | 14164 | | CB_NOTIFY, CB_RECALL, CLOSE, | 14165 | | COMMIT, CREATE, | 14166 | | CREATE_SESSION, DELEGPURGE, | 14167 | | DELEGRETURN, EXCHANGE_ID, | 14168 | | GETATTR, GETFH, | 14169 | | GET_DIR_DELEGATION, LINK, | 14170 | | LOCK, LOCKT, LOCKU, LOOKUP, | 14171 | | LOOKUPP, NVERIFY, OPEN, | 14172 | | OPENATTR, OPEN_DOWNGRADE, | 14173 | | PUTFH, PUTPUBFH, PUTROOTFH, | 14174 | | READ, READDIR, READLINK, | 14175 | | RELEASE_LOCKOWNER, REMOVE, | 14176 | | RENAME, RESTOREFH, SAVEFH, | 14177 | | SECINFO, SECINFO_NO_NAME, | 14178 | | SETATTR, VERIFY, WRITE | 14179 | 
NFS4ERR_SHARE_DENIED | OPEN | 14180 | NFS4ERR_STALE | ACCESS, CLOSE, COMMIT, | 14181 | | CREATE, DELEGRETURN, GETATTR, | 14182 | | GETFH, GET_DIR_DELEGATION, | 14183 | | LAYOUTCOMMIT, LAYOUTGET, | 14184 | | LAYOUTRETURN, LINK, LOCK, | 14185 | | LOCKT, LOCKU, LOOKUP, | 14186 | | LOOKUPP, NVERIFY, OPEN, | 14187 | | OPENATTR, OPEN_DOWNGRADE, | 14188 | | PUTFH, READ, READDIR, | 14189 | | READLINK, REMOVE, RENAME, | 14190 | | RESTOREFH, SAVEFH, SECINFO, | 14191 | | SECINFO_NO_NAME, SETATTR, | 14192 | | VERIFY, WRITE | 14193 | NFS4ERR_STALE_CLIENTID | CREATE_SESSION, DELEGPURGE, | 14194 | | DESTROY_CLIENTID, | 14195 | | DESTROY_SESSION, | 14196 | | LAYOUTCOMMIT, LAYOUTGET, | 14197 | | LAYOUTRETURN, LOCK, LOCKT, | 14198 | | OPEN, RELEASE_LOCKOWNER | 14199 | NFS4ERR_STALE_STATEID | CLOSE, DELEGRETURN, LOCK, | 14200 | | LOCKU, OPEN_DOWNGRADE, READ, | 14201 | | SETATTR, WRITE | 14202 | NFS4ERR_SYMLINK | LOOKUP, OPEN | 14203 | NFS4ERR_TOOSMALL | GETDEVICEINFO, GETDEVICELIST, | 14204 | | LAYOUTGET, READDIR | 14205 | NFS4ERR_TOO_MANY_OPS | ACCESS, CB_GETATTR, | 14206 | | CB_RECALL, CB_RECALL_ANY, | 14207 | | CB_SEQUENCE, CLOSE, COMMIT, | 14208 | | CREATE, DELEGPURGE, | 14209 | | DELEGRETURN, GETATTR, GETFH, | 14210 | | GET_DIR_DELEGATION, LINK, | 14211 | | LOCK, LOCKT, LOCKU, LOOKUP, | 14212 | | LOOKUPP, NVERIFY, OPEN, | 14213 | | OPENATTR, OPEN_DOWNGRADE, | 14214 | | PUTFH, PUTPUBFH, PUTROOTFH, | 14215 | | READ, READDIR, READLINK, | 14216 | | RELEASE_LOCKOWNER, REMOVE, | 14217 | | RENAME, RESTOREFH, SAVEFH, | 14218 | | SECINFO, SECINFO_NO_NAME, | 14219 | | SEQUENCE, SETATTR, VERIFY, | 14220 | | WRITE | 14221 | NFS4ERR_UNKNOWN_LAYOUTTYPE | GETDEVICEINFO, GETDEVICELIST, | 14222 | | LAYOUTCOMMIT, LAYOUTGET, | 14223 | | LAYOUTRETURN | 14224 | NFS4ERR_UNSAFE_COMPOUND | ACCESS, CB_GETATTR, CLOSE, | 14225 | | COMMIT, CREATE, DELEGPURGE, | 14226 | | DELEGRETURN, GETATTR, GETFH, | 14227 | | GET_DIR_DELEGATION, LINK, | 14228 | | LOCK, LOCKT, LOCKU, LOOKUP, | 14229 | | LOOKUPP, NVERIFY, 
OPEN, | 14230 | | OPENATTR, OPEN_DOWNGRADE, | 14231 | | PUTFH, PUTPUBFH, PUTROOTFH, | 14232 | | READ, READDIR, READLINK, | 14233 | | RELEASE_LOCKOWNER, REMOVE, | 14234 | | RENAME, RESTOREFH, SAVEFH, | 14235 | | SECINFO, SECINFO_NO_NAME, | 14236 | | SETATTR, VERIFY, WRITE | 14237 | NFS4ERR_WRONGSEC | GET_DIR_DELEGATION, LINK, | 14238 | | LOOKUP, LOOKUPP, OPEN, PUTFH, | 14239 | | PUTPUBFH, PUTROOTFH, RENAME, | 14240 | | RESTOREFH | 14241 | NFS4ERR_XDEV | LINK, RENAME | 14242 +-----------------------------------+-------------------------------+ 14244 Table 11 14246 16. NFS version 4.1 Procedures 14248 16.1. Procedure 0: NULL - No Operation 14249 16.1.1. SYNOPSIS 14251 16.1.2. ARGUMENTS 14253 void; 14255 16.1.3. RESULTS 14257 void; 14259 16.1.4. DESCRIPTION 14261 Standard NULL procedure. Void argument, void response. This 14262 procedure has no functionality associated with it. Because of this 14263 it is sometimes used to measure the overhead of processing a service 14264 request. Therefore, the server should ensure that no unnecessary 14265 work is done in servicing this procedure. 14267 16.1.5. ERRORS 14269 None. 14271 16.2. Procedure 1: COMPOUND - Compound Operations 14273 16.2.1. SYNOPSIS 14275 compoundargs -> compoundres 14277 16.2.2. ARGUMENTS 14279 union nfs_argop4 switch (nfs_opnum4 argop) { 14280 case : ; 14281 ... 14282 }; 14284 struct COMPOUND4args { 14285 utf8str_cs tag; 14286 uint32_t minorversion; 14287 nfs_argop4 argarray<>; 14288 }; 14290 16.2.3. RESULTS 14292 union nfs_resop4 switch (nfs_opnum4 resop){ 14293 case : ; 14294 ... 14295 }; 14297 struct COMPOUND4res { 14298 nfsstat4 status; 14299 utf8str_cs tag; 14300 nfs_resop4 resarray<>; 14301 }; 14303 16.2.4. DESCRIPTION 14305 The COMPOUND procedure is used to combine one or more of the NFS 14306 operations into a single RPC request. The main NFS RPC program has 14307 two main procedures: NULL and COMPOUND. All other operations use the 14308 COMPOUND procedure as a wrapper. 
14310 The COMPOUND procedure is used to combine individual operations into 14311 a single RPC request. The server interprets each of the operations 14312 in turn. If an operation is executed by the server and the status of 14313 that operation is NFS4_OK, then the next operation in the COMPOUND 14314 procedure is executed. The server continues this process until there 14315 are no more operations to be executed or one of the operations has a 14316 status value other than NFS4_OK. 14318 In the processing of the COMPOUND procedure, the server may find that 14319 it does not have the available resources to execute any or all of the 14320 operations within the COMPOUND sequence. See Section 2.10.4.4 for a 14321 more detailed discussion. 14323 The server will generally choose between two methods of decoding the 14324 client's request. The first would be the traditional one-pass XDR 14325 decode; if there is an XDR decoding error in this case, the RPC XDR 14326 decode error would be returned. The second method would be to make 14327 an initial pass to decode the basic COMPOUND request and then to XDR 14328 decode the individual operations, the most interesting case being the 14329 decoding of attributes. If the server encounters an XDR decode 14330 error during this second pass, it would return 14331 the error NFS4ERR_BADXDR to signify the decode error. 14333 The COMPOUND arguments contain a "minorversion" field. For NFSv4.1, 14334 the value for this field is 1. If the server receives a COMPOUND 14335 procedure with a minorversion field value that it does not support, 14336 the server MUST return an error of NFS4ERR_MINOR_VERS_MISMATCH and a 14337 zero-length resultdata array. 14339 Contained within the COMPOUND results is a "status" field. If the 14340 results array length is non-zero, this status must be equivalent to 14341 the status of the last operation that was executed within the 14342 COMPOUND procedure.
Therefore, if an operation incurred an error 14343 then the "status" value will be the same error value as is being 14344 returned for the operation that failed. 14346 Note that operations 0 (zero) and 1 (one) are not defined for the 14347 COMPOUND procedure. Operation 2 is not defined but reserved for 14348 future definition and use with minor versioning. If the server 14349 receives an operation array that contains operation 2 and the 14350 minorversion field has a value of 0 (zero), an error of 14351 NFS4ERR_OP_ILLEGAL, as described in the next paragraph, is returned 14352 to the client. If an operation array contains an operation 2 and the 14353 minorversion field is non-zero and the server does not support the 14354 minor version, the server returns an error of 14355 NFS4ERR_MINOR_VERS_MISMATCH. Therefore, the 14356 NFS4ERR_MINOR_VERS_MISMATCH error takes precedence over all other 14357 errors. 14359 It is possible that the server receives a request that contains an 14360 operation that is less than the first legal operation (OP_ACCESS) or 14361 greater than the last legal operation (OP_RELEASE_LOCKOWNER). In 14362 this case, the server's response will encode the opcode OP_ILLEGAL 14363 rather than the illegal opcode of the request. The status field in 14364 the ILLEGAL return results will be set to NFS4ERR_OP_ILLEGAL. The 14365 COMPOUND procedure's return results will also be NFS4ERR_OP_ILLEGAL. 14367 The definition of the "tag" in the request is left to the 14368 implementor. It may be used to summarize the content of the compound 14369 request for the benefit of packet sniffers and engineers debugging 14370 implementations. However, the value of "tag" in the response SHOULD 14371 be the same value as provided in the request. This applies to the 14372 tag field of the CB_COMPOUND procedure as well. 14374 16.2.4.1.
Current File Handle and Stateid 14376 The COMPOUND procedure offers a simple environment for the execution 14377 of the operations specified by the client. Four items of state are 14378 maintained across the operations of a COMPOUND: the current and saved file handles, and the current and saved stateids. The first two relate to the file handle while the second two relate to the stateid. 14380 16.2.4.1.1. Current File Handle 14382 The current and saved file handle are used throughout the protocol. 14383 Most operations implicitly use the current file handle as an argument 14384 and many set the current file handle as part of the results. The 14385 combination of client-specified sequences of operations and current 14386 and saved file handle arguments and results allows for greater 14387 protocol flexibility. The simplest example of current file 14388 handle usage is a sequence like the following: 14390 PUTFH fh1 {fh1} 14391 LOOKUP "compA" {fh2} 14392 GETATTR {fh2} 14393 LOOKUP "compB" {fh3} 14394 GETATTR {fh3} 14395 LOOKUP "compC" {fh4} 14396 GETATTR {fh4} 14397 GETFH 14399 Figure 75 14401 In this example, the PUTFH operation explicitly sets the current file 14402 handle value while the result of each LOOKUP operation sets the 14403 current file handle value to the resultant file system object. Also, 14404 the client is able to insert GETATTR operations using the current 14405 file handle as an argument. 14407 Along with the current file handle, there is a saved file handle. 14408 While the current file handle is set as the result of operations like 14409 LOOKUP, the saved file handle must be set directly with the use of 14410 the SAVEFH operation. The SAVEFH operation copies the current file 14411 handle value to the saved value. The saved file handle value is used 14412 in combination with the current file handle value for the LINK and 14413 RENAME operations. The RESTOREFH operation will copy the saved file 14414 handle value to the current file handle value; as a result, the saved 14415 file handle value may be used as a sort of "scratch" area for the 14416 client's series of operations.
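The movement of filehandles through SAVEFH and RESTOREFH can be sketched as a small state machine. This is an illustrative Python model only, not the protocol's XDR; the class and filehandle strings are hypothetical:

```python
# Illustrative model of the COMPOUND filehandle environment:
# LOOKUP replaces the current filehandle, SAVEFH copies current to
# saved, and RESTOREFH copies saved back to current.

class CompoundEnv:
    def __init__(self):
        self.cfh = None  # current filehandle
        self.sfh = None  # saved filehandle

    def putfh(self, fh):
        # PUTFH explicitly sets the current filehandle.
        self.cfh = fh

    def lookup(self, child_fh):
        # LOOKUP replaces the current filehandle with the child's.
        self.cfh = child_fh

    def savefh(self):
        # SAVEFH copies the current filehandle to the saved slot.
        self.sfh = self.cfh

    def restorefh(self):
        # RESTOREFH copies the saved filehandle back to current.
        self.cfh = self.sfh

env = CompoundEnv()
env.putfh("fh-dir")
env.savefh()            # remember the directory
env.lookup("fh-compA")  # current filehandle is now the child
env.restorefh()         # back to the directory for another LOOKUP
```

This "scratch area" pattern is what allows a single COMPOUND to look up several siblings of one directory without repeating PUTFH.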
14418 16.2.4.1.2. Current Stateid 14420 With NFSv4.1, additions of a current stateid and a saved stateid have 14421 been made to the COMPOUND processing environment; this allows for the 14422 passing of stateids between operations. There are no changes to the 14423 syntax of the protocol, only changes to the semantics of a few 14424 operations. 14426 A "current stateid" is the stateid that is associated with the 14427 current file handle. The current stateid may only be changed by an 14428 operation that modifies the current file handle or returns a stateid. 14429 If an operation returns a stateid it MUST set the current stateid to 14430 the returned value. If an operation sets the current file handle but 14431 does not return a stateid, the current stateid MUST be set to the 14432 all-zeros special stateid. As an example, PUTFH will change the 14433 current server state from {ocfh, osid} to {cfh, 0} while LOCK will 14434 change the current state from {cfh, osid} to {cfh, nsid}. The SAVEFH 14435 and RESTOREFH operations will save and restore both the file handle 14436 and the stateid as a set. 14438 Any operation which takes as an argument a stateid that is not the 14439 special all-zeros stateid MUST set the current stateid to the all- 14440 zeros value before evaluating the operation. If the argument is the 14441 special all-zeros stateid, the operation is evaluated using the 14442 current stateid. 14444 The following example is the common case of a simple READ operation 14445 with a supplied stateid showing that the PUTFH initializes the 14446 current stateid to zero. The subsequent READ with stateid sid1 14447 replaces the current stateid before evaluating the operation. 14449 PUTFH fh1 - -> {fh1, 0} 14450 READ sid1,0,1024 {fh1, sid1} -> {fh1, sid1} 14452 Figure 76 14454 This next example performs an OPEN with the client provided stateid 14455 sid1 and as a result generates stateid sid2. 
The next operation 14456 specifies the READ with the special all-zero stateid but the current 14457 stateid set by the previous operation is actually used when the 14458 operation is evaluated, allowing correct interaction with any 14459 existing, potentially conflicting, locks. 14461 PUTFH fh1 - -> {fh1, 0} 14462 OPEN R,sid1,"compA" {fh1, sid1} -> {fh2, sid2} 14463 READ 0,0,1024 {fh2, sid2} -> {fh2, sid2} 14464 CLOSE 0 {fh2, sid2} -> {fh2, sid3} 14466 Figure 77 14468 The final example is similar to the second in how it passes the 14469 stateid sid2 generated by the LOCK operation to the next READ 14470 operation. This allows the client to explicitly surround a single 14471 I/O operation with a lock and its appropriate stateid to guarantee 14472 correctness with other client locks. 14474 PUTFH fh1 - -> {fh1, 0} 14475 LOCK W,0,1024,sid1 {fh1, sid1} -> {fh1, sid2} 14476 READ 0,0,1024 {fh1, sid2} -> {fh1, sid2} 14477 LOCKU W,0,1024,0 {fh1, sid2} -> {fh1, sid3} 14479 Figure 78 14481 16.2.5. IMPLEMENTATION 14483 16.2.6. ERRORS 14485 All errors defined in the protocol 14487 17. NFS version 4.1 Operations 14489 17.1. Operation 3: ACCESS - Check Access Rights 14491 17.1.1. SYNOPSIS 14493 (cfh), accessreq -> supported, accessrights 14495 17.1.2. ARGUMENTS 14497 /* 14498 * ACCESS: Check access permission 14499 */ 14500 const ACCESS4_READ = 0x00000001; 14501 const ACCESS4_LOOKUP = 0x00000002; 14502 const ACCESS4_MODIFY = 0x00000004; 14503 const ACCESS4_EXTEND = 0x00000008; 14504 const ACCESS4_DELETE = 0x00000010; 14505 const ACCESS4_EXECUTE = 0x00000020; 14507 struct ACCESS4args { 14508 /* CURRENT_FH: object */ 14509 uint32_t access; 14510 }; 14512 17.1.3. RESULTS 14514 struct ACCESS4resok { 14515 uint32_t supported; 14516 uint32_t access; 14517 }; 14519 union ACCESS4res switch (nfsstat4 status) { 14520 case NFS4_OK: 14521 ACCESS4resok resok4; 14522 default: 14523 void; 14524 }; 14526 17.1.4. 
DESCRIPTION 14528 ACCESS determines the access rights that a user, as identified by the 14529 credentials in the RPC request, has with respect to the file system 14530 object specified by the current filehandle. The client encodes the 14531 set of access rights that are to be checked in the bit mask "access". 14532 The server checks the permissions encoded in the bit mask. If a 14533 status of NFS4_OK is returned, two bit masks are included in the 14534 response. The first, "supported", represents the access rights 14535 that the server can verify reliably. The second, "access", 14536 represents the access rights available to the user for the filehandle 14537 provided. On success, the current filehandle retains its value. 14539 Note that the supported field will contain only as many values as were 14540 originally sent in the arguments. For example, if the client sends 14541 an ACCESS operation with only the ACCESS4_READ value set and the 14542 server supports this value, the server will return only ACCESS4_READ 14543 even if it could have reliably checked other values. 14545 The results of this operation are necessarily advisory in nature. A 14546 return status of NFS4_OK and the appropriate bit set in the bit mask 14547 does not imply that such access will be allowed to the file system 14548 object in the future. This is because access rights can be revoked 14549 by the server at any time. 14551 The following access permissions may be requested: 14553 ACCESS4_READ Read data from file or read a directory. 14555 ACCESS4_LOOKUP Look up a name in a directory (no meaning for non- 14556 directory objects). 14558 ACCESS4_MODIFY Rewrite existing file data or modify existing 14559 directory entries. 14561 ACCESS4_EXTEND Write new data or add directory entries. 14563 ACCESS4_DELETE Delete an existing directory entry. 14565 ACCESS4_EXECUTE Execute file (no meaning for a directory). 14567 On success, the current filehandle retains its value. 14569 17.1.5.
IMPLEMENTATION 14571 In general, it is not sufficient for the client to attempt to deduce 14572 access permissions by inspecting the uid, gid, and mode fields in the 14573 file attributes or by attempting to interpret the contents of the ACL 14574 attribute. This is because the server may perform uid or gid mapping 14575 or enforce additional access control restrictions. It is also 14576 possible that the server may not be in the same ID space as the 14577 client. In these cases (and perhaps others), the client cannot 14578 reliably perform an access check with only current file attributes. 14580 In the NFS version 2 protocol, the only reliable way to determine 14581 whether an operation was allowed was to try it and see if it 14582 succeeded or failed. Using the ACCESS operation in the NFS version 4 14583 protocol, the client can ask the server to indicate whether or not 14584 one or more classes of operations are permitted. The ACCESS 14585 operation is provided to allow clients to check before doing a series 14586 of operations that will result in an access failure. The OPEN 14587 operation provides a point where the server can verify access to the 14588 file object and a method to return that information to the client. The 14589 ACCESS operation is still useful for directory operations or in 14590 cases where the UNIX "access" API is used on the client. 14592 The information returned by the server in response to an ACCESS call 14593 is not permanent. It was correct at the exact time that the server 14594 performed the checks, but not necessarily afterwards. The server can 14595 revoke access permission at any time. 14597 The client should use the effective credentials of the user to build 14598 the authentication information in the ACCESS request used to 14599 determine access rights. It is the effective user and group 14600 credentials that are used in subsequent read and write operations.
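The relationship between the "access", "supported", and returned access bit masks can be sketched as two mask intersections. This is an illustrative Python sketch, not server code; the verifiable/granted inputs are hypothetical stand-ins for the server's actual permission machinery:

```python
# ACCESS bit values from the ARGUMENTS section above.
ACCESS4_READ    = 0x00000001
ACCESS4_LOOKUP  = 0x00000002
ACCESS4_MODIFY  = 0x00000004
ACCESS4_EXTEND  = 0x00000008
ACCESS4_DELETE  = 0x00000010
ACCESS4_EXECUTE = 0x00000020

def access_check(requested, verifiable, granted):
    """Return (supported, access) per the ACCESS semantics above.

    'supported' is limited to the bits the client actually asked
    about and the server can reliably verify; 'access' is further
    limited to what the user is actually permitted to do.
    """
    supported = requested & verifiable
    access = supported & granted
    return supported, access

# A UNIX-like server that cannot verify ACCESS4_DELETE on files:
verifiable = (ACCESS4_READ | ACCESS4_MODIFY |
              ACCESS4_EXTEND | ACCESS4_EXECUTE)
granted = ACCESS4_READ
supported, access = access_check(ACCESS4_READ | ACCESS4_DELETE,
                                 verifiable, granted)
# supported omits ACCESS4_DELETE, telling the client that bit
# could not be checked; access grants only READ.
```

The example mirrors the ACCESS4_DELETE discussion below: the server clears the bit in "supported" rather than guessing, and the client ignores it.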
14602 Many implementations do not directly support the ACCESS4_DELETE 14603 permission. Operating systems like UNIX will ignore the 14604 ACCESS4_DELETE bit if set on an access request on a non-directory 14605 object. In these systems, delete permission on a file is determined 14606 by the access permissions on the directory in which the file resides, 14607 instead of being determined by the permissions of the file itself. 14608 Therefore, the mask returned enumerating which access rights can be 14609 determined will have the ACCESS4_DELETE value set to 0. This 14610 indicates to the client that the server was unable to check that 14611 particular access right. The ACCESS4_DELETE bit in the access mask 14612 returned will then be ignored by the client. 14614 17.2. Operation 4: CLOSE - Close File 14616 17.2.1. SYNOPSIS 14618 (cfh), seqid, open_stateid -> open_stateid 14620 17.2.2. ARGUMENTS 14622 /* 14623 * CLOSE: Close a file and release share reservations 14624 */ 14625 struct CLOSE4args { 14626 /* CURRENT_FH: object */ 14627 seqid4 seqid; 14628 stateid4 open_stateid; 14629 }; 14631 17.2.3. RESULTS 14633 union CLOSE4res switch (nfsstat4 status) { 14634 case NFS4_OK: 14635 stateid4 open_stateid; 14636 default: 14637 void; 14638 }; 14640 17.2.4. DESCRIPTION 14642 The CLOSE operation releases share reservations for the regular or 14643 named attribute file as specified by the current filehandle. The 14644 share reservations and other state information released at the server 14645 as a result of this CLOSE is only that associated with the supplied 14646 stateid. State associated with other OPENs is not affected. 14648 If record locks are held, the client SHOULD release all locks before 14649 issuing a CLOSE. The server MAY free all outstanding locks on CLOSE 14650 but some servers may not support the CLOSE of a file that still has 14651 record locks held. The server MUST return failure if any locks would 14652 exist after the CLOSE. 
14654 The seqid value argument must have the value zero. If any other 14655 value is specified the server MUST return the error NFS4ERR_INVAL. 14657 On success, the current filehandle retains its value. 14659 17.2.5. IMPLEMENTATION 14661 Even though CLOSE returns a stateid, this stateid is not useful to 14662 the client and should be treated as deprecated. CLOSE "shuts down" 14663 the state associated with all OPENs for the file by a single 14664 open_owner. As noted above, CLOSE will either release all file 14665 locking state or return an error. Therefore, the stateid returned by 14666 CLOSE is not useful for operations that follow. 14668 17.3. Operation 5: COMMIT - Commit Cached Data 14670 17.3.1. SYNOPSIS 14672 (cfh), offset, count -> verifier 14674 17.3.2. ARGUMENTS 14676 /* 14677 * COMMIT: Commit cached data on server to stable storage 14678 */ 14679 struct COMMIT4args { 14680 /* CURRENT_FH: file */ 14681 offset4 offset; 14682 count4 count; 14683 }; 14685 17.3.3. RESULTS 14687 struct COMMIT4resok { 14688 verifier4 writeverf; 14689 }; 14691 union COMMIT4res switch (nfsstat4 status) { 14692 case NFS4_OK: 14693 COMMIT4resok resok4; 14694 default: 14695 void; 14696 }; 14698 17.3.4. DESCRIPTION 14700 The COMMIT operation forces or flushes data to stable storage for the 14701 file specified by the current filehandle. The flushed data is that 14702 which was previously written with a WRITE operation which had the 14703 stable field set to UNSTABLE4. 14705 The offset specifies the position within the file where the flush is 14706 to begin. An offset value of 0 (zero) means to flush data starting 14707 at the beginning of the file. The count specifies the number of 14708 bytes of data to flush. If count is 0 (zero), a flush from offset to 14709 the end of the file is done. 14711 The server returns a write verifier upon successful completion of the 14712 COMMIT. 
The write verifier is used by the client to determine if the 14713 server has restarted or rebooted between the initial WRITE(s) and the 14714 COMMIT. The client does this by comparing the write verifier 14715 returned from the initial writes and the verifier returned by the 14716 COMMIT operation. The server must vary the value of the write 14717 verifier at each server event or instantiation that may lead to a 14718 loss of uncommitted data. Most commonly this occurs when the server 14719 is rebooted; however, other events at the server may result in 14720 uncommitted data loss as well. 14722 On success, the current filehandle retains its value. 14724 17.3.5. IMPLEMENTATION 14726 The COMMIT operation is similar in operation and semantics to the 14727 POSIX fsync(2) system call that synchronizes a file's state with the 14728 disk (file data and metadata is flushed to disk or stable storage). 14729 COMMIT performs the same operation for a client, flushing any 14730 unsynchronized data and metadata on the server to the server's disk 14731 or stable storage for the specified file. Like fsync(2), it may be 14732 that there is some modified data or no modified data to synchronize. 14733 The data may have been synchronized by the server's normal periodic 14734 buffer synchronization activity. COMMIT should return NFS4_OK, 14735 unless there has been an unexpected error. 14737 COMMIT differs from fsync(2) in that it is possible for the client to 14738 flush a range of the file (most likely triggered by a buffer- 14739 reclamation scheme on the client before the file has been completely 14740 written). 14742 The server implementation of COMMIT is reasonably simple. If the 14743 server receives a full file COMMIT request, that is, starting at 14744 offset 0 and count 0, it should do the equivalent of fsync()'ing the 14745 file. Otherwise, it should arrange to have the cached data in the 14746 range specified by offset and count flushed to stable storage.
14747 In both cases, any metadata associated with the file must be flushed 14748 to stable storage before returning. It is not an error for there to 14749 be nothing to flush on the server. This means that the data and 14750 metadata that needed to be flushed have already been flushed or lost 14751 during the last server failure. 14753 The client implementation of COMMIT is a little more complex. There 14754 are two reasons for wanting to commit a client buffer to stable 14755 storage. The first is that the client wants to reuse a buffer. In 14756 this case, the offset and count of the buffer are sent to the server 14757 in the COMMIT request. The server then flushes any cached data based 14758 on the offset and count, and flushes any metadata associated with the 14759 file. It then returns the status of the flush and the write 14760 verifier. The other reason for the client to generate a COMMIT is 14761 for a full file flush, such as may be done at close. In this case, 14762 the client would gather all of the buffers for this file that contain 14763 uncommitted data, do the COMMIT operation with an offset of 0 and 14764 count of 0, and then free all of those buffers. Any other dirty 14765 buffers would be sent to the server in the normal fashion. 14767 After a buffer is written by the client with the stable parameter set 14768 to UNSTABLE4, the buffer must be considered as modified by the client 14769 until the buffer has either been flushed via a COMMIT operation or 14770 written via a WRITE operation with stable parameter set to FILE_SYNC4 14771 or DATA_SYNC4. This is done to prevent the buffer from being freed 14772 and reused before the data can be flushed to stable storage on the 14773 server. 
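The client-side write-verifier check described in the DESCRIPTION above can be sketched as follows. This is an illustrative Python model, not client code; the class, the verifier byte strings, and the resend callback are hypothetical:

```python
# Sketch of the client's verifier bookkeeping for UNSTABLE4 writes:
# if WRITE/COMMIT returns a verifier different from the one saved
# with the uncommitted buffers, those buffers must be retransmitted.

def needs_retransmit(saved_verifier, returned_verifier):
    # A changed verifier means the server may have lost uncommitted
    # data (e.g., it rebooted between the WRITEs and the COMMIT).
    return saved_verifier != returned_verifier

class UncommittedCache:
    def __init__(self, verifier):
        self.verifier = verifier
        self.dirty = []  # (offset, count) ranges written UNSTABLE4

    def on_commit_reply(self, returned_verifier, resend):
        if needs_retransmit(self.verifier, returned_verifier):
            # Resend every uncommitted buffer; a later COMMIT will
            # confirm them against the new verifier.
            for offset, count in self.dirty:
                resend(offset, count)
            self.verifier = returned_verifier
        else:
            self.dirty.clear()  # data reached stable storage

sent = []
cache = UncommittedCache(verifier=b"boot-1")
cache.dirty = [(0, 4096), (8192, 4096)]
# Server rebooted: COMMIT returns a different verifier.
cache.on_commit_reply(b"boot-2", lambda off, cnt: sent.append((off, cnt)))
```

Whether the retransmissions use UNSTABLE4 followed by a new COMMIT, or FILE_SYNC4/DATA_SYNC4 directly, is left to the implementor, as the surrounding text notes.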
14775 When a response is returned from either a WRITE or a COMMIT operation 14776 and it contains a write verifier that is different than previously 14777 returned by the server, the client will need to retransmit all of the 14778 buffers containing uncommitted cached data to the server. How this 14779 is to be done is up to the implementor. If there is only one buffer 14780 of interest, then it should probably be sent back over in a WRITE 14781 request with the appropriate stable parameter. If there is more than 14782 one buffer, it might be worthwhile retransmitting all of the buffers 14783 in WRITE requests with the stable parameter set to UNSTABLE4 and then 14784 retransmitting the COMMIT operation to flush all of the data on the 14785 server to stable storage. The timing of these retransmissions is 14786 left to the implementor. 14788 The above description applies to page-cache-based systems as well as 14789 buffer-cache-based systems. In those systems, the virtual memory 14790 system will need to be modified instead of the buffer cache. 14792 17.4. Operation 6: CREATE - Create a Non-Regular File Object 14794 17.4.1. SYNOPSIS 14796 (cfh), name, type, attrs -> (cfh), change_info, attrs_set 14798 17.4.2. ARGUMENTS 14800 /* 14801 * CREATE: Create a non-regular file 14802 */ 14803 union createtype4 switch (nfs_ftype4 type) { 14804 case NF4LNK: 14805 linktext4 linkdata; 14806 case NF4BLK: 14807 case NF4CHR: 14808 specdata4 devdata; 14809 case NF4SOCK: 14810 case NF4FIFO: 14811 case NF4DIR: 14812 void; 14813 default: 14814 void; /* server should return NFS4ERR_BADTYPE */ 14815 }; 14817 struct CREATE4args { 14818 /* CURRENT_FH: directory for creation */ 14819 createtype4 objtype; 14820 component4 objname; 14821 fattr4 createattrs; 14822 }; 14824 17.4.3. 
RESULTS 14826 struct CREATE4resok { 14827 change_info4 cinfo; 14828 bitmap4 attrset; /* attributes set */ 14829 }; 14831 union CREATE4res switch (nfsstat4 status) { 14832 case NFS4_OK: 14833 CREATE4resok resok4; 14834 default: 14835 void; 14836 }; 14838 17.4.4. DESCRIPTION 14840 The CREATE operation creates a non-regular file object in a directory 14841 with a given name. The OPEN operation MUST be used to create a 14842 regular file. 14844 The objname specifies the name for the new object. The objtype 14845 determines the type of object to be created: directory, symlink, etc. 14847 If an object of the same name already exists in the directory, the 14848 server will return the error NFS4ERR_EXIST. 14850 For the directory where the new file object was created, the server 14851 returns change_info4 information in cinfo. With the atomic field of 14852 the change_info4 struct, the server will indicate if the before and 14853 after change attributes were obtained atomically with respect to the 14854 file object creation. 14856 If the objname has a length of 0 (zero), or if objname does not obey 14857 the UTF-8 definition, the error NFS4ERR_INVAL will be returned. 14859 The current filehandle is replaced by that of the new object. 14861 The createattrs specifies the initial set of attributes for the 14862 object. The set of attributes may include any writable attribute 14863 valid for the object type. When the operation is successful, the 14864 server will return to the client an attribute mask signifying which 14865 attributes were successfully set for the object. 14867 If createattrs includes neither the owner attribute nor an ACL with 14868 an ACE for the owner, and if the server's file system both supports 14869 and requires an owner attribute (or an owner ACE) then the server 14870 MUST derive the owner (or the owner ACE). 
This would typically be 14871 from the principal indicated in the RPC credentials of the call, but 14872 the server's operating environment or file system semantics may 14873 dictate other methods of derivation. Similarly, if createattrs 14874 includes neither the group attribute nor a group ACE, and if the 14875 server's file system both supports and requires the notion of a group 14876 attribute (or group ACE), the server MUST derive the group attribute 14877 (or the corresponding group ACE) for the file. This could be from 14878 the RPC call's credentials, such as the group principal if the 14879 credentials include it (such as with AUTH_SYS), from the group 14880 identifier associated with the principal in the credentials (e.g., 14881 POSIX systems have a passwd database that has the group 14882 identifier for every user identifier), inherited from the directory 14883 the object is created in, or whatever else the server's operating 14884 environment or file system semantics dictate. This applies to the 14885 OPEN operation too. 14887 Conversely, it is possible the client will specify in createattrs an 14888 owner attribute, group attribute, or ACL for which the principal 14889 indicated in the RPC call's credentials does not have permission to 14890 create files. The error to be returned in this instance is 14891 NFS4ERR_PERM. This applies to the OPEN operation too. 14893 17.4.5. IMPLEMENTATION 14895 If the client desires to set attribute values after the create, a 14896 SETATTR operation can be added to the COMPOUND request so that the 14897 appropriate attributes will be set. 14899 17.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting Recovery 14901 17.5.1. SYNOPSIS 14903 client ID -> 14905 17.5.2. ARGUMENTS 14907 /* 14908 * DELEGPURGE: Purge Delegations Awaiting Recovery 14909 */ 14910 struct DELEGPURGE4args { 14911 clientid4 clientid; 14912 }; 14914 17.5.3. RESULTS 14916 struct DELEGPURGE4res { 14917 nfsstat4 status; 14918 }; 14920 17.5.4.
DESCRIPTION 14922 Purges all of the delegations awaiting recovery for a given client. 14923 This is useful for clients which do not commit delegation information 14924 to stable storage to indicate that conflicting requests need not be 14925 delayed by the server awaiting recovery of delegation information. 14927 This operation should be used by clients that record delegation 14928 information on stable storage on the client. In this case, 14929 DELEGPURGE should be issued immediately after doing delegation 14930 recovery on all delegations known to the client. Doing so will 14931 notify the server that no additional delegations for the client will 14932 be recovered allowing it to free resources, and avoid delaying other 14933 clients who make requests that conflict with the unrecovered 14934 delegations. The set of delegations known to the server and the 14935 client may be different. The reason for this is that a client may 14936 fail after making a request which resulted in delegation but before 14937 it received the results and committed them to the client's stable 14938 storage. 14940 The server MAY support DELEGPURGE, but if it does not, it MUST NOT 14941 support CLAIM_DELEGATE_PREV. 14943 17.6. Operation 8: DELEGRETURN - Return Delegation 14945 17.6.1. SYNOPSIS 14947 (cfh), stateid -> 14949 17.6.2. ARGUMENTS 14951 /* 14952 * DELEGRETURN: Return a delegation 14953 */ 14954 struct DELEGRETURN4args { 14955 /* CURRENT_FH: delegated file */ 14956 stateid4 deleg_stateid; 14957 }; 14959 17.6.3. RESULTS 14961 struct DELEGRETURN4res { 14962 nfsstat4 status; 14963 }; 14965 17.6.4. DESCRIPTION 14967 Returns the delegation represented by the current filehandle and 14968 stateid. 14970 Delegations may be returned when recalled or voluntarily (i.e. before 14971 the server has recalled them). In either case the client must 14972 properly propagate state changed under the context of the delegation 14973 to the server before returning the delegation. 14975 17.7. 
Operation 9: GETATTR - Get Attributes 14977 17.7.1. SYNOPSIS 14979 (cfh), attrbits -> attrbits, attrvals 14981 17.7.2. ARGUMENTS 14983 /* 14984 * GETATTR: Get file attributes 14985 */ 14986 struct GETATTR4args { 14987 /* CURRENT_FH: directory or file */ 14988 bitmap4 attr_request; 14989 }; 14991 17.7.3. RESULTS 14993 struct GETATTR4resok { 14994 fattr4 obj_attributes; 14995 }; 14997 union GETATTR4res switch (nfsstat4 status) { 14998 case NFS4_OK: 14999 GETATTR4resok resok4; 15000 default: 15001 void; 15002 }; 15004 17.7.4. DESCRIPTION 15006 The GETATTR operation will obtain attributes for the file system 15007 object specified by the current filehandle. The client sets a bit in 15008 the bitmap argument for each attribute value that it would like the 15009 server to return. The server returns an attribute bitmap that 15010 indicates the attribute values that it was able to return; this 15011 will include all attributes requested by the client that are 15012 supported by the server for the target file system. This 15013 bitmap is followed by the attribute values ordered lowest attribute 15014 number first. 15016 The server must return a value for each attribute that the client 15017 requests if the attribute is supported by the server for the target 15018 file system. If the server does not support a particular attribute 15019 on the target file system then it must not return the attribute value 15020 and must not set the attribute bit in the result bitmap. The server 15021 must return an error if it supports an attribute on the target but 15022 cannot obtain its value. In that case, no attribute values will be 15023 returned. 15025 File systems which are absent should be treated as having support for 15026 a very small set of attributes as described in GETATTR Within an 15027 Absent File System (Section 5), even if previously, when the file 15028 system was present, more attributes were supported.
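The bitmap intersection and value ordering described above can be sketched as follows. This is an illustrative Python model, not part of the protocol; the attribute numbers and values in the example are hypothetical, and the real bitmap4 is a variable-length word array rather than a single integer:

```python
# Sketch of GETATTR response construction: return only attributes
# that were both requested and supported, with values ordered
# lowest attribute number first.

def getattr_response(requested_bitmap, supported_bitmap, values):
    """Return (bitmap, attrvals) for a GETATTR reply.

    'values' maps attribute number -> value; the server must have a
    value for every supported attribute it reports, or fail the
    operation entirely.
    """
    returned = requested_bitmap & supported_bitmap
    attrvals = []
    bit = 0
    mask = returned
    while mask:
        if mask & 1:
            attrvals.append(values[bit])  # lowest number first
        mask >>= 1
        bit += 1
    return returned, attrvals

# Client asks for attributes 0, 1, and 3; the file system supports
# only 0 and 1 (attribute names/values here are made up).
values = {0: "NF4DIR", 1: "fh-volatile", 3: 4096}
bitmap, vals = getattr_response(0b1011, 0b0011, values)
```

Unsupported attributes are simply absent from both the bitmap and the value list; they do not cause an error.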
15030 All servers must support the mandatory attributes as specified in 15031 File Attributes (Section 10.3.1), for all file systems, with the 15032 exception of absent file systems. 15034 On success, the current filehandle retains its value. 15036 17.7.5. IMPLEMENTATION 15038 17.8. Operation 10: GETFH - Get Current Filehandle 15040 17.8.1. SYNOPSIS 15042 (cfh) -> filehandle 15044 17.8.2. ARGUMENTS 15046 /* CURRENT_FH: */ 15047 void; 15049 17.8.3. RESULTS 15051 /* 15052 * GETFH: Get current filehandle 15053 */ 15054 struct GETFH4resok { 15055 nfs_fh4 object; 15056 }; 15058 union GETFH4res switch (nfsstat4 status) { 15059 case NFS4_OK: 15060 GETFH4resok resok4; 15061 default: 15062 void; 15063 }; 15065 17.8.4. DESCRIPTION 15067 This operation returns the current filehandle value. 15069 On success, the current filehandle retains its value. 15071 17.8.5. IMPLEMENTATION 15073 Operations that change the current filehandle like LOOKUP or CREATE 15074 do not automatically return the new filehandle as a result. For 15075 instance, if a client needs to look up a directory entry and obtain 15076 its filehandle, then the following request is needed. 15078 PUTFH (directory filehandle) 15080 LOOKUP (entry name) 15082 GETFH 15084 17.9. Operation 11: LINK - Create Link to a File 15086 17.9.1. SYNOPSIS 15088 (sfh), (cfh), newname -> (cfh), change_info 15090 17.9.2. ARGUMENTS 15092 /* 15093 * LINK: Create link to an object 15094 */ 15095 struct LINK4args { 15096 /* SAVED_FH: source object */ 15097 /* CURRENT_FH: target directory */ 15098 component4 newname; 15099 }; 15101 17.9.3. RESULTS 15103 struct LINK4resok { 15104 change_info4 cinfo; 15105 }; 15107 union LINK4res switch (nfsstat4 status) { 15108 case NFS4_OK: 15109 LINK4resok resok4; 15110 default: 15111 void; 15112 }; 15114 17.9.4.
DESCRIPTION 15116 The LINK operation creates an additional newname for the file 15117 represented by the saved filehandle, as set by the SAVEFH operation, 15118 in the directory represented by the current filehandle. The existing 15119 file and the target directory must reside within the same file system 15120 on the server. On success, the current filehandle will continue to 15121 be the target directory. If an object exists in the target directory 15122 with the same name as newname, the server must return NFS4ERR_EXIST. 15124 For the target directory, the server returns change_info4 information 15125 in cinfo. With the atomic field of the change_info4 struct, the 15126 server will indicate if the before and after change attributes were 15127 obtained atomically with respect to the link creation. 15129 If the newname has a length of 0 (zero), or if newname does not obey 15130 the UTF-8 definition, the error NFS4ERR_INVAL will be returned. 15132 17.9.5. IMPLEMENTATION 15134 Changes to any property of the "hard" linked files are reflected in 15135 all of the linked files. When a link is made to a file, the 15136 attributes for the file should have a value for numlinks that is one 15137 greater than the value before the LINK operation. 15139 The statement "file and the target directory must reside within the 15140 same file system on the server" means that the fsid fields in the 15141 attributes for the objects are the same. If they reside on different 15142 file systems, the error, NFS4ERR_XDEV, is returned. On some servers, 15143 the filenames, "." and "..", are illegal as newname. 15145 In the case that newname is already linked to the file represented by 15146 the saved filehandle, the server will return NFS4ERR_EXIST. 15148 Note that symbolic links are created with the CREATE operation. 15150 17.10. Operation 12: LOCK - Create Lock 15152 17.10.1. SYNOPSIS 15154 (cfh) locktype, reclaim, offset, length, locker -> stateid 15156 17.10.2. 
ARGUMENTS 15158 /* 15159 * For LOCK, transition from open_stateid and lock_owner 15160 * to a lock stateid. 15161 */ 15162 struct open_to_lock_owner4 { 15163 seqid4 open_seqid; 15164 stateid4 open_stateid; 15165 seqid4 lock_seqid; 15166 lock_owner4 lock_owner; 15167 }; 15169 /* 15170 * For LOCK, existing lock stateid continues to request new 15171 * file lock for the same lock_owner and open_stateid. 15172 */ 15173 struct exist_lock_owner4 { 15174 stateid4 lock_stateid; 15175 seqid4 lock_seqid; 15176 }; 15178 union locker4 switch (bool new_lock_owner) { 15179 case TRUE: 15180 open_to_lock_owner4 open_owner; 15181 case FALSE: 15182 exist_lock_owner4 lock_owner; 15183 }; 15185 /* 15186 * LOCK/LOCKT/LOCKU: Record lock management 15187 */ 15188 struct LOCK4args { 15189 /* CURRENT_FH: file */ 15190 nfs_lock_type4 locktype; 15191 bool reclaim; 15192 offset4 offset; 15193 length4 length; 15194 locker4 locker; 15195 }; 15197 17.10.3. RESULTS 15199 struct LOCK4denied { 15200 offset4 offset; 15201 length4 length; 15202 nfs_lock_type4 locktype; 15203 lock_owner4 owner; 15204 }; 15206 struct LOCK4resok { 15207 stateid4 lock_stateid; 15208 }; 15210 union LOCK4res switch (nfsstat4 status) { 15211 case NFS4_OK: 15212 LOCK4resok resok4; 15213 case NFS4ERR_DENIED: 15214 LOCK4denied denied; 15215 default: 15216 void; 15217 }; 15219 17.10.4. DESCRIPTION 15221 The LOCK operation requests a record lock for the octet range 15222 specified by the offset and length parameters. The lock type is also 15223 specified to be one of the values of nfs_lock_type4. If this is a reclaim 15224 request, the reclaim parameter will be TRUE. 15226 Bytes in a file may be locked even if those bytes are not currently 15227 allocated to the file. To lock the file from a specific offset 15228 through the end-of-file (no matter how long the file actually is) use 15229 a length field with all bits set to 1 (one).
If the length is zero, 15230 or if a length that is not all bits set to one is specified and the 15231 length, when added to the offset, exceeds the maximum 64-bit unsigned 15232 integer value, the error NFS4ERR_INVAL will result. 15234 Some servers may only support locking for octet offsets that fit 15235 within 32 bits. If the client specifies a range that includes an 15236 octet beyond the last octet offset of the 32-bit range, but does not 15237 include the last octet offset of the 32-bit range and all of the octet 15238 offsets beyond it, up to the end of the valid 64-bit range, such a 15239 32-bit server MUST return the error NFS4ERR_BAD_RANGE. 15241 In the case that the lock is denied, the owner, offset, and length of 15242 a conflicting lock are returned. 15244 The locker argument specifies the lock_owner that is associated with 15245 the LOCK request. The locker4 structure is a switched union that 15246 indicates whether the client has already created record locking state 15247 associated with the current open file and lock owner. In the case in 15248 which it has, the argument is just a stateid for the set of locks 15249 associated with that open file and lock owner, together with a 15250 lock_seqid value that must be zero. In the case where no such state 15251 has been established, or the client does not have the stateid 15252 available, the argument contains the stateid of the open file with 15253 which this lock is to be associated, together with the lock_owner 15254 with which the lock is to be associated. The open_to_lock_owner 15255 case covers the very first lock done by a lock_owner for a given open 15256 file and offers a method to use the established state of the 15257 open_stateid to transition to the use of a lock stateid. 15259 The client field of the lock owner and all seqid values in the 15260 arguments have zero as the only valid value. When any of these are 15261 specified as other than zero, the server MUST return an 15262 NFS4ERR_INVAL.
The client ID with which all owners and stateids are 15263 associated is the client ID associated with the session on which the 15264 request was issued. The client ID appearing in a LOCK4denied 15265 structure is the actual client ID associated with the conflicting lock, 15266 whether this is the client ID associated with the current session, or 15267 a different one. 15269 On success, the current filehandle retains its value. 15271 17.10.5. IMPLEMENTATION 15273 If the server is unable to determine the exact offset and length of 15274 the conflicting lock, the same offset and length that were provided 15275 in the arguments should be returned in the denied results. The File 15276 Locking section contains a full description of this and the other 15277 file locking operations. 15279 LOCK operations are subject to permission checks and to checks 15280 against the access type of the associated file. However, the 15281 specific rights and modes required for various types of locks reflect 15282 the semantics of the server-exported file system and are not 15283 specified by the protocol. For example, Windows 2000 allows a write 15284 lock of a file open for READ, while a POSIX-compliant system does 15285 not. 15287 When the client makes a lock request that corresponds to a range that 15288 the lockowner has locked already (with the same or different lock 15289 type), or to a sub-region of such a range, or to a region which 15290 includes multiple locks already granted to that lockowner, in whole 15291 or in part, and the server does not support such locking operations 15292 (i.e. does not support POSIX locking semantics), the server will 15293 return the error NFS4ERR_LOCK_RANGE. In that case, the client may 15294 return an error, or it may emulate the required operations, using 15295 only LOCK for ranges that do not include any octets already locked by 15296 that lock_owner and LOCKU of locks held by that lock_owner 15297 (specifying an exactly-matching range and type).
Similarly, when the 15298 client makes a lock request that amounts to upgrading (changing from 15299 a read lock to a write lock) or downgrading (changing from write lock 15300 to a read lock) an existing record lock, and the server does not 15301 support such a lock, the server will return NFS4ERR_LOCK_NOTSUPP. 15302 Such operations may not perfectly reflect the required semantics in 15303 the face of conflicting lock requests from other clients. 15305 17.11. Operation 13: LOCKT - Test For Lock 15307 17.11.1. SYNOPSIS 15309 (cfh) locktype, offset, length, owner -> {void, NFS4ERR_DENIED -> 15310 owner} 15312 17.11.2. ARGUMENTS 15314 struct LOCKT4args { 15315 /* CURRENT_FH: file */ 15316 nfs_lock_type4 locktype; 15317 offset4 offset; 15318 length4 length; 15319 lock_owner4 owner; 15320 }; 15322 17.11.3. RESULTS 15324 union LOCKT4res switch (nfsstat4 status) { 15325 case NFS4ERR_DENIED: 15326 LOCK4denied denied; 15327 case NFS4_OK: 15328 void; 15329 default: 15330 void; 15331 }; 15333 17.11.4. DESCRIPTION 15335 The LOCKT operation tests the lock as specified in the arguments. If 15336 a conflicting lock exists, the owner, offset, length, and type of the 15337 conflicting lock are returned. The owner field in the results 15338 includes the client ID of the owner of the conflicting lock, whether this 15339 is the client ID associated with the current session or a different 15340 client ID. If no lock is held, nothing other than NFS4_OK is 15341 returned. Lock types READ_LT and READW_LT are processed in the same 15342 way in that a conflicting lock test is done without regard to 15343 blocking or non-blocking. The same is true for WRITE_LT and 15344 WRITEW_LT. 15346 The ranges are specified as for LOCK. The NFS4ERR_INVAL and 15347 NFS4ERR_BAD_RANGE errors are returned under the same circumstances as 15348 for LOCK. 15350 The client ID field of the owner should be specified as zero.
The 15351 client ID used for ownership comparisons is that associated with the 15352 session on which the request is issued. If the client ID field is 15353 other than zero, the server MUST return the error NFS4ERR_INVAL. 15355 On success, the current filehandle retains its value. 15357 17.11.5. IMPLEMENTATION 15359 If the server is unable to determine the exact offset and length of 15360 the conflicting lock, the same offset and length that were provided 15361 in the arguments should be returned in the denied results. The File 15362 Locking section contains further discussion of the file locking 15363 mechanisms. 15365 LOCKT uses a lock_owner4 rather than a stateid4, as is used in LOCK, to 15366 identify the owner. This is because the client does not have to open 15367 the file to test for the existence of a lock, so a stateid may not be 15368 available. 15370 The test for conflicting locks should exclude locks for the current 15371 lockowner. Note that since such locks are not examined, the possible 15372 existence of overlapping ranges may not affect the results of LOCKT. 15373 If the server does examine locks that match the lockowner for the 15374 purpose of range checking, NFS4ERR_LOCK_RANGE may be returned. In 15375 the event that it returns NFS4_OK, clients may do a LOCK and receive 15376 NFS4ERR_LOCK_RANGE on the LOCK request because of the flexibility 15377 provided to the server. 15379 17.12. Operation 14: LOCKU - Unlock File 15381 17.12.1. SYNOPSIS 15383 (cfh) type, seqid, stateid, offset, length -> stateid 15385 17.12.2. ARGUMENTS 15387 struct LOCKU4args { 15388 /* CURRENT_FH: file */ 15389 nfs_lock_type4 locktype; 15390 seqid4 seqid; 15391 stateid4 lock_stateid; 15392 offset4 offset; 15393 length4 length; 15394 }; 15396 17.12.3. RESULTS 15398 union LOCKU4res switch (nfsstat4 status) { 15399 case NFS4_OK: 15400 stateid4 lock_stateid; 15401 default: 15402 void; 15403 }; 15405 17.12.4.
DESCRIPTION 15407 The LOCKU operation unlocks the record lock specified by the 15408 parameters. The client may set the locktype field to any value that 15409 is legal for the nfs_lock_type4 enumerated type, and the server MUST 15410 accept any legal value for locktype. Any legal value for locktype 15411 has no effect on the success or failure of the LOCKU operation. 15413 The ranges are specified as for LOCK. The NFS4ERR_INVAL and 15414 NFS4ERR_BAD_RANGE errors are returned under the same circumstances as 15415 for LOCK. 15417 The seqid parameter should be specified as zero. If any other value 15418 is specified, the server must return an NFS4ERR_INVAL error. 15420 On success, the current filehandle retains its value. 15422 17.12.5. IMPLEMENTATION 15424 If the area to be unlocked does not correspond exactly to a lock 15425 actually held by the lockowner, the server may return the error 15426 NFS4ERR_LOCK_RANGE. This includes the cases in which the area is not 15427 locked, the area is a sub-range of the area locked, the area 15428 overlaps the area locked without matching exactly, or the area 15429 specified includes multiple locks held by the lockowner. In all of 15430 these cases, which are allowed by POSIX locking semantics, a client receiving 15431 this error should, if it desires support for such operations, 15432 simulate the operation using LOCKU on ranges corresponding to locks 15433 it actually holds, possibly followed by LOCK requests for the sub- 15434 ranges not being unlocked. 15436 17.13. Operation 15: LOOKUP - Lookup Filename 15438 17.13.1. SYNOPSIS 15440 (cfh), component -> (cfh) 15442 17.13.2. ARGUMENTS 15444 /* 15445 * LOOKUP: Lookup filename 15446 */ 15447 struct LOOKUP4args { 15448 /* CURRENT_FH: directory */ 15449 component4 objname; 15450 }; 15452 17.13.3. RESULTS 15454 struct LOOKUP4res { 15455 /* CURRENT_FH: object */ 15456 nfsstat4 status; 15457 }; 15459 17.13.4.
DESCRIPTION 15461 This operation looks up or finds a file system object using the 15462 directory specified by the current filehandle. LOOKUP evaluates the 15463 component and if the object exists the current filehandle is replaced 15464 with the component's filehandle. 15466 If the component cannot be evaluated either because it does not exist 15467 or because the client does not have permission to evaluate the 15468 component, then an error will be returned and the current filehandle 15469 will be unchanged. 15471 If the component is a zero-length string or if any component does not 15472 obey the UTF-8 definition, the error NFS4ERR_INVAL will be returned. 15474 17.13.5. IMPLEMENTATION 15476 If the client wants to achieve the effect of a multi-component 15477 lookup, it may construct a COMPOUND request such as (and obtain each 15478 filehandle): 15480 PUTFH (directory filehandle) 15481 LOOKUP "pub" 15482 GETFH 15483 LOOKUP "foo" 15484 GETFH 15485 LOOKUP "bar" 15486 GETFH 15488 NFS version 4 servers depart from the semantics of previous NFS 15489 versions in allowing LOOKUP requests to cross mountpoints on the 15490 server. The client can detect a mountpoint crossing by comparing the 15491 fsid attribute of the directory with the fsid attribute of the 15492 directory looked up. If the fsids are different then the new 15493 directory is a server mountpoint. UNIX clients that detect a 15494 mountpoint crossing will need to mount the server's file system. 15495 This needs to be done to maintain the file object identity checking 15496 mechanisms common to UNIX clients. 15498 Servers that limit NFS access to "shares" or "exported" file systems 15499 should provide a pseudo file system into which the exported file 15500 systems can be integrated, so that clients can browse the server's 15501 name space. The client's view of a pseudo file system will be limited 15502 to paths that lead to exported file systems.
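The multi-component COMPOUND shown above can be sketched as a simple request builder. This is an illustrative Python sketch only; the tuple encoding of operations is hypothetical and not part of the protocol's XDR representation.

```python
# Sketch of building the multi-component COMPOUND shown above:
# PUTFH to set the starting directory, then a LOOKUP/GETFH pair per
# path component so the client obtains each intermediate filehandle.

def multi_lookup_compound(dir_fh, path_components):
    ops = [("PUTFH", dir_fh)]          # establish the current filehandle
    for name in path_components:
        ops.append(("LOOKUP", name))   # replace (cfh) with the child
        ops.append(("GETFH",))         # return the new (cfh) to the client
    return ops

# multi_lookup_compound("dfh", ["pub", "foo", "bar"]) yields the
# seven-operation COMPOUND from the example above.
```

A client that only needs the final filehandle could omit the intermediate GETFH operations; the GETFH after each LOOKUP is what lets the client compare fsid attributes along the path to detect mountpoint crossings.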
15504 Note: previous versions of the protocol assigned special semantics to 15505 the names "." and "..". NFS version 4 assigns no special semantics 15506 to these names. The LOOKUPP operation must be used to look up a parent 15507 directory. 15509 Note that this operation does not follow symbolic links. The client 15510 is responsible for all parsing of filenames including filenames that 15511 are modified by symbolic links encountered during the lookup process. 15513 If the current filehandle supplied is not a directory but a symbolic 15514 link, the error NFS4ERR_SYMLINK is returned. For all 15515 other non-directory file types, the error NFS4ERR_NOTDIR is returned. 15517 17.14. Operation 16: LOOKUPP - Lookup Parent Directory 15518 17.14.1. SYNOPSIS 15520 (cfh) -> (cfh) 15522 17.14.2. ARGUMENTS 15524 /* CURRENT_FH: object */ 15525 void; 15527 17.14.3. RESULTS 15529 /* 15530 * LOOKUPP: Lookup parent directory 15531 */ 15532 struct LOOKUPP4res { 15533 /* CURRENT_FH: directory */ 15534 nfsstat4 status; 15535 }; 15537 17.14.4. DESCRIPTION 15539 The current filehandle is assumed to refer to a regular directory or 15540 a named attribute directory. LOOKUPP assigns the filehandle for its 15541 parent directory to be the current filehandle. If there is no parent 15542 directory, an NFS4ERR_NOENT error must be returned. Therefore, 15543 NFS4ERR_NOENT will be returned by the server when the current 15544 filehandle is at the root or top of the server's file tree. 15546 As for LOOKUP, LOOKUPP will also cross mountpoints. 15548 If the current filehandle is not a directory or named attribute 15549 directory, the error NFS4ERR_NOTDIR is returned. 15551 If the requester's security flavor does not match that configured for 15552 the parent directory, then the server SHOULD return NFS4ERR_WRONGSEC 15553 (a future minor revision of NFSv4 may upgrade this to MUST) in the 15554 LOOKUPP response.
However, if the server does so, it MUST support 15555 the new SECINFO_NO_NAME operation, so that the client can gracefully 15556 determine the correct security flavor. See the discussion of the 15557 SECINFO_NO_NAME operation for a description. 15559 If the current filehandle is a named attribute directory that is 15560 associated with a file system object via OPENATTR (i.e. not a sub- 15561 directory of a named attribute directory) LOOKUPP SHOULD return the 15562 filehandle of the associated file system object. 15564 17.14.5. IMPLEMENTATION 15566 17.15. Operation 17: NVERIFY - Verify Difference in Attributes 15568 17.15.1. SYNOPSIS 15570 (cfh), fattr -> - 15572 17.15.2. ARGUMENTS 15574 /* 15575 * NVERIFY: Verify attributes different 15576 */ 15577 struct NVERIFY4args { 15578 /* CURRENT_FH: object */ 15579 fattr4 obj_attributes; 15580 }; 15582 17.15.3. RESULTS 15584 struct NVERIFY4res { 15585 nfsstat4 status; 15586 }; 15588 17.15.4. DESCRIPTION 15590 This operation is used to prefix a sequence of operations to be 15591 performed if one or more attributes have changed on some file system 15592 object. If all the attributes match then the error NFS4ERR_SAME must 15593 be returned. 15595 On success, the current filehandle retains its value. 15597 17.15.5. IMPLEMENTATION 15599 This operation is useful as a cache validation operator. If the 15600 object to which the attributes belong has changed then the following 15601 operations may obtain new data associated with that object. For 15602 instance, to check if a file has been changed and obtain new data if 15603 it has: 15605 PUTFH (public) 15606 LOOKUP "foobar" 15607 NVERIFY attrbits attrs 15608 READ 0 32767 15610 In the case that a recommended attribute is specified in the NVERIFY 15611 operation and the server does not support that attribute for the file 15612 system object, the error NFS4ERR_ATTRNOTSUPP is returned to the 15613 client. 15615 When the attribute rdattr_error or any write-only attribute (e.g. 
15616 time_modify_set) is specified, the error NFS4ERR_INVAL is returned to 15617 the client. 15619 17.16. Operation 18: OPEN - Open a Regular File 15621 17.16.1. SYNOPSIS 15623 , share_access, share_deny, owner, openhow, claim 15624 -> (cfh), stateid, cinfo, rflags, attrset, delegation 15626 17.16.2. ARGUMENTS 15628 /* 15629 * Various definitions for OPEN 15630 */ 15631 enum createmode4 { 15632 UNCHECKED4 = 0, 15633 GUARDED4 = 1, 15634 EXCLUSIVE4 = 2 15635 }; 15637 union createhow4 switch (createmode4 mode) { 15638 case UNCHECKED4: 15639 case GUARDED4: 15640 fattr4 createattrs; 15641 case EXCLUSIVE4: 15642 verifier4 createverf; 15643 }; 15645 enum opentype4 { 15646 OPEN4_NOCREATE = 0, 15647 OPEN4_CREATE = 1 15648 }; 15650 union openflag4 switch (opentype4 opentype) { 15651 case OPEN4_CREATE: 15652 createhow4 how; 15653 default: 15654 void; 15655 }; 15657 /* Next definitions used for OPEN delegation */ 15658 enum limit_by4 { 15659 NFS_LIMIT_SIZE = 1, 15660 NFS_LIMIT_BLOCKS = 2 15661 /* others as needed */ 15662 }; 15664 struct nfs_modified_limit4 { 15665 uint32_t num_blocks; 15666 uint32_t bytes_per_block; 15667 }; 15669 union nfs_space_limit4 switch (limit_by4 limitby) { 15670 /* limit specified as file size */ 15671 case NFS_LIMIT_SIZE: 15672 uint64_t filesize; 15673 /* limit specified by number of blocks */ 15674 case NFS_LIMIT_BLOCKS: 15675 nfs_modified_limit4 mod_blocks; 15676 } ; 15678 /* 15679 * Share Access and Deny constants for open argument 15680 */ 15681 const OPEN4_SHARE_ACCESS_READ = 0x00000001; 15682 const OPEN4_SHARE_ACCESS_WRITE = 0x00000002; 15683 const OPEN4_SHARE_ACCESS_BOTH = 0x00000003; 15685 const OPEN4_SHARE_DENY_NONE = 0x00000000; 15686 const OPEN4_SHARE_DENY_READ = 0x00000001; 15687 const OPEN4_SHARE_DENY_WRITE = 0x00000002; 15688 const OPEN4_SHARE_DENY_BOTH = 0x00000003; 15690 /* new flags for share_access field of OPEN4args */ 15691 const OPEN4_SHARE_ACCESS_WANT_DELEG_MASK = 0xFF00; 15692 const OPEN4_SHARE_ACCESS_WANT_NO_PREFERENCE = 
0x0000; 15693 const OPEN4_SHARE_ACCESS_WANT_READ_DELEG = 0x0100; 15694 const OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG = 0x0200; 15695 const OPEN4_SHARE_ACCESS_WANT_ANY_DELEG = 0x0300; 15696 const OPEN4_SHARE_ACCESS_WANT_NO_DELEG = 0x0400; 15697 const OPEN4_SHARE_ACCESS_WANT_CANCEL = 0x0500; 15699 const OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL = 0x10000; 15700 const OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED = 0x20000; 15702 enum open_delegation_type4 { 15703 OPEN_DELEGATE_NONE = 0, 15704 OPEN_DELEGATE_READ = 1, 15705 OPEN_DELEGATE_WRITE = 2, 15706 OPEN_DELEGATE_NONE_EXT = 3 /* new to v4.1 */ 15707 }; 15709 enum open_claim_type4 { 15710 CLAIM_NULL = 0, 15711 CLAIM_PREVIOUS = 1, 15712 CLAIM_DELEGATE_CUR = 2, 15713 CLAIM_DELEGATE_PREV = 3, 15715 /* 15716 * Like CLAIM_NULL, but object identified 15717 * by the current filehandle. 15718 */ 15719 CLAIM_FH = 4, /* new to v4.1 */ 15721 /* 15722 * Like CLAIM_DELEGATE_CUR, but object identified 15723 * by current filehandle. 15724 */ 15725 CLAIM_DELEG_CUR_FH = 5, /* new to v4.1 */ 15727 /* 15728 * Like CLAIM_DELEGATE_PREV, but object identified 15729 * by current filehandle. 15730 */ 15731 CLAIM_DELEG_PREV_FH = 6 /* new to v4.1 */ 15732 }; 15734 struct open_claim_delegate_cur4 { 15735 stateid4 delegate_stateid; 15736 component4 file; 15737 }; 15739 union open_claim4 switch (open_claim_type4 claim) { 15740 /* 15741 * No special rights to file. Ordinary OPEN of the specified file. 15742 */ 15743 case CLAIM_NULL: 15744 /* CURRENT_FH: directory */ 15745 component4 file; 15747 /* 15748 * Right to the file established by an open previous to server 15749 * reboot. File identified by filehandle obtained at that time 15750 * rather than by name. 15751 */ 15752 case CLAIM_PREVIOUS: 15753 /* CURRENT_FH: file being reclaimed */ 15754 open_delegation_type4 delegate_type; 15756 /* 15757 * Right to file based on a delegation granted by the server. 15758 * File is specified by name. 
15759 */ 15760 case CLAIM_DELEGATE_CUR: 15761 /* CURRENT_FH: directory */ 15762 open_claim_delegate_cur4 delegate_cur_info; 15764 /* Right to file based on a delegation granted to a previous boot 15765 * instance of the client. File is specified by name. 15766 */ 15767 case CLAIM_DELEGATE_PREV: 15768 /* CURRENT_FH: directory */ 15769 component4 file_delegate_prev; 15770 }; 15772 /* 15773 * OPEN: Open a file, potentially receiving an open delegation 15774 */ 15775 struct OPEN4args { 15776 seqid4 seqid; 15777 uint32_t share_access; 15778 uint32_t share_deny; 15779 open_owner4 owner; 15780 openflag4 openhow; 15781 open_claim4 claim; 15782 }; 15784 17.16.3. RESULTS 15786 struct open_read_delegation4 { 15787 stateid4 stateid; /* Stateid for delegation*/ 15788 bool recall; /* Pre-recalled flag for 15789 delegations obtained 15790 by reclaim 15791 (CLAIM_PREVIOUS) */ 15792 nfsace4 permissions; /* Defines users who don't 15793 need an ACCESS call to 15794 open for read */ 15795 }; 15797 struct open_write_delegation4 { 15798 stateid4 stateid; /* Stateid for delegation */ 15799 bool recall; /* Pre-recalled flag for 15800 delegations obtained 15801 by reclaim 15802 (CLAIM_PREVIOUS) */ 15803 nfs_space_limit4 space_limit; /* Defines condition that 15804 the client must check to 15805 determine whether the 15806 file needs to be flushed 15807 to the server on close. 15808 */ 15809 nfsace4 permissions; /* Defines users who don't 15810 need an ACCESS call as 15811 part of a delegated 15812 open. 
*/ 15813 }; 15815 enum why_no_delegation4 { /* new to v4.1 */ 15816 WND4_NOT_WANTED = 0, 15817 WND4_CONTENTION = 1, 15818 WND4_RESOURCE = 2, 15819 WND4_NOT_SUPP_FTYPE = 3, 15820 WND4_WRITE_DELEG_NOT_SUPP_FTYPE = 4, 15821 WND4_NOT_SUPP_UPGRADE = 5, 15822 WND4_NOT_SUPP_DOWNGRADE = 6, 15823 WND4_CANCELED = 7, 15824 WND4_IS_DIR = 8 15825 }; 15827 union open_none_delegation4 /* new to v4.1 */ 15828 switch (why_no_delegation4 ond_why) { 15829 case WND4_CONTENTION: 15830 bool ond_server_will_push_deleg; 15831 case WND4_RESOURCE: 15832 bool ond_server_will_signal_avail; 15833 default: 15834 void; 15835 }; 15837 union open_delegation4 15838 switch (open_delegation_type4 delegation_type) { 15839 case OPEN_DELEGATE_NONE: 15840 void; 15841 case OPEN_DELEGATE_READ: 15842 open_read_delegation4 read; 15843 case OPEN_DELEGATE_WRITE: 15844 open_write_delegation4 write; 15845 case OPEN_DELEGATE_NONE_EXT: /* new to v4.1 */ 15846 open_none_delegation4 od_whynone; 15847 }; 15848 /* 15849 * Result flags 15850 */ 15851 /* Client must confirm open */ 15852 const OPEN4_RESULT_CONFIRM = 0x00000002; 15853 /* Type of file locking behavior at the server */ 15854 const OPEN4_RESULT_LOCKTYPE_POSIX = 0x00000004; 15855 /* Server will preserve file if removed while open */ 15856 const OPEN4_RESULT_PRESERVE_UNLINKED = 0x00000008; 15857 /* Server may use CB_NOTIFY_LOCK on locks derived from this open */ 15858 const OPEN4_RESULT_MAY_NOTIFY_LOCK = 0x00000020; 15860 struct OPEN4resok { 15861 stateid4 stateid; /* Stateid for open */ 15862 change_info4 cinfo; /* Directory Change Info */ 15863 uint32_t rflags; /* Result flags */ 15864 bitmap4 attrset; /* attribute set for create*/ 15865 open_delegation4 delegation; /* Info on any open 15866 delegation */ 15867 }; 15869 union OPEN4res switch (nfsstat4 status) { 15870 case NFS4_OK: 15871 /* CURRENT_FH: opened file */ 15872 OPEN4resok resok4; 15873 default: 15874 void; 15875 }; 15877 17.16.4. 
DESCRIPTION 15879 The OPEN operation creates and/or opens a regular file in a directory 15880 with the provided name. If the file does not exist at the server and 15881 creation is desired, specification of the method of creation is 15882 provided by the openhow parameter. The client has the choice of 15883 three creation methods: UNCHECKED, GUARDED, or EXCLUSIVE. 15885 If the current filehandle is a named attribute directory, OPEN will 15886 then create or open a named attribute file. Note that exclusive 15887 create of a named attribute is not supported. If the createmode is 15888 EXCLUSIVE4 and the current filehandle is a named attribute directory, 15889 the server will return NFS4ERR_INVAL. 15891 UNCHECKED means that the file should be created if a file of that 15892 name does not exist and encountering an existing regular file of that 15893 name is not an error. For this type of create, createattrs specifies 15894 the initial set of attributes for the file. The set of attributes 15895 may include any writable attribute valid for regular files. When an 15896 UNCHECKED create encounters an existing file, the attributes 15897 specified by createattrs are not used, except that when a size of 15898 zero is specified, the existing file is truncated. If GUARDED is 15899 specified, the server checks for the presence of a duplicate object 15900 by name before performing the create. If a duplicate exists, an 15901 error of NFS4ERR_EXIST is returned as the status. If the object does 15902 not exist, the request is performed as described for UNCHECKED. For 15903 each of these cases (UNCHECKED and GUARDED) where the operation is 15904 successful, the server will return to the client an attribute mask 15905 signifying which attributes were successfully set for the object. 15907 EXCLUSIVE specifies that the server is to follow exclusive creation 15908 semantics, using the verifier to ensure exclusive creation of the 15909 target.
The server should check for the presence of a duplicate 15910 object by name. If the object does not exist, the server creates the 15911 object and stores the verifier with the object. If the object does 15912 exist and the stored verifier matches the client-provided verifier, 15913 the server uses the existing object as the newly created object. If 15914 the stored verifier does not match, then an error of NFS4ERR_EXIST is 15915 returned. No attributes may be provided in this case, since the 15916 server may use an attribute of the target object to store the 15917 verifier. If the server uses an attribute to store the exclusive 15918 create verifier, it will signify which attribute by setting the 15919 appropriate bit in the attribute mask that is returned in the 15920 results. 15922 For the target directory, the server returns change_info4 information 15923 in cinfo. With the atomic field of the change_info4 struct, the 15924 server will indicate if the before and after change attributes were 15925 obtained atomically with respect to the link creation. 15927 Upon successful creation, the current filehandle is replaced by that 15928 of the new object. 15930 The OPEN operation provides for Windows share reservation capability 15931 with the use of the share_access and share_deny fields of the OPEN 15932 arguments. The client specifies at OPEN the required share_access 15933 and share_deny modes. For clients that do not directly support 15934 SHAREs (i.e., UNIX clients), the expected deny value is DENY_NONE. In the 15935 case that there is an existing SHARE reservation that conflicts with 15936 the OPEN request, the server returns the error NFS4ERR_SHARE_DENIED. 15937 For each OPEN, the client must provide a value for the owner field 15938 of the OPEN argument. The client ID associated with the owner is 15939 not derived from the client field of the owner parameter but is 15940 instead the client ID associated with the session on which the 15941 request is issued.
If the client ID field of the owner parameter is 15942 not zero, the server MUST return an NFS4ERR_INVAL error. For 15943 additional discussion of SHARE semantics see Section 8.8. 15945 The seqid value is not used in NFSv4.1. If the value passed is not 15946 zero, the server MUST return an NFS4ERR_INVAL error. 15948 In the case that the client is recovering state from a server 15949 failure, the claim field of the OPEN argument is used to signify that 15950 the request is meant to reclaim state previously held. 15952 The "claim" field of the OPEN argument is used to specify the file to 15953 be opened and the state information which the client claims to 15954 possess. There are seven claim types as follows: 15956 +---------------------+---------------------------------------------+ 15957 | open type | description | 15958 +---------------------+---------------------------------------------+ 15959 | CLAIM_NULL CLAIM_FH | For the client, this is a new OPEN request | 15960 | | and there is no previous state associated | 15961 | | with the file for the client. With | 15962 | | CLAIM_NULL the file is identified by the | 15963 | | current filehandle and the specified | 15964 | | component name. With CLAIM_FH (new to | 15965 | | v4.1) the file is identified by just the | 15966 | | current filehandle. | 15967 | CLAIM_PREVIOUS | The client is claiming basic OPEN state for | 15968 | | a file that was held previous to a server | 15969 | | reboot. Generally used when a server is | 15970 | | returning persistent filehandles; the | 15971 | | client may not have the file name to | 15972 | | reclaim the OPEN. | 15973 | CLAIM_DELEGATE_CUR | The client is claiming a delegation for | 15974 | CLAIM_DELEG_CUR_FH | OPEN as granted by the server. Generally | 15975 | | this is done as part of recalling a | 15976 | | delegation. With CLAIM_DELEGATE_CUR, the | 15977 | | file is identified by the current | 15978 | | filehandle and the specified component | 15979 | | name. With CLAIM_DELEG_CUR_FH (new to | 15980 | | v4.1), the file is identified by just the | 15981 | | current filehandle. | 15982 | CLAIM_DELEGATE_PREV | The client is claiming a delegation granted | 15983 | CLAIM_DELEG_PREV_FH | to a previous client instance; used after | 15984 | | the client reboots. The server MAY support | 15985 | | CLAIM_DELEGATE_PREV or CLAIM_DELEG_PREV_FH. | 15986 | | If it does support either open type, | 15987 | | CREATE_SESSION MUST NOT remove the client's | 15988 | | delegation state, and the server MUST | 15989 | | support the DELEGPURGE operation. | 15990 +---------------------+---------------------------------------------+ 15991 For OPEN requests whose claim type is other than CLAIM_PREVIOUS (i.e. 15992 requests other than those devoted to reclaiming opens after a server 15993 reboot) that reach the server during its grace or lease expiration 15994 period, the server returns an error of NFS4ERR_GRACE. 15996 For any OPEN request, the server may return an open delegation, which 15997 allows further opens and closes to be handled locally on the client 15998 as described in the section Open Delegation. Note that delegation is 15999 up to the server to decide. The client should never assume that 16000 delegation will or will not be granted in a particular instance. It 16001 should always be prepared for either case. A partial exception is 16002 the reclaim (CLAIM_PREVIOUS) case, in which a delegation type is 16003 claimed. In this case, delegation will always be granted, although 16004 the server may specify an immediate recall in the delegation 16005 structure. 16007 The rflags returned by a successful OPEN allow the server to return 16008 information governing how the open file is to be handled. 16010 o OPEN4_RESULT_CONFIRM is deprecated and MUST NOT be returned by an 16011 NFSv4.1 server. 16013 o OPEN4_RESULT_LOCKTYPE_POSIX indicates the server's file locking 16014 behavior supports the complete set of POSIX locking techniques.
   From this, the client can choose to manage file locking state in a way that handles a mismatch of file locking management.

o  OPEN4_RESULT_PRESERVE_UNLINKED indicates that the server will preserve the open file if the client (or any other client) removes the file as long as it is open.  Furthermore, the server promises to preserve the file through the grace period after server reboot, thereby giving the client the opportunity to reclaim its open.

o  OPEN4_RESULT_MAY_NOTIFY_LOCK indicates that the server may attempt CB_NOTIFY_LOCK callbacks for locks on this file.  This flag is a hint only, and may be safely ignored by the client.

If the component is of zero length, NFS4ERR_INVAL will be returned.  The component is also subject to the normal UTF-8, character support, and name checks.  See the section "UTF-8 Related Errors" for further discussion.

When an OPEN is done and the specified open-owner already has the resulting filehandle open, the result is to "OR" together the new share and deny status with the existing status.  In this case, only a single CLOSE need be done, even though multiple OPENs were completed.  When such an OPEN is done, checking of share reservations for the new OPEN proceeds normally, with no exception for the existing OPEN held by the same open-owner.

If the underlying file system at the server is only accessible in a read-only mode and the OPEN request has specified ACCESS_WRITE or ACCESS_BOTH, the server will return NFS4ERR_ROFS to indicate a read-only file system.

As with the CREATE operation, the server MUST derive the owner, owner ACE, group, or group ACE if any of the four attributes are required and supported by the server's file system.
For an OPEN with the EXCLUSIVE4 createmode, the server has no choice, since such OPEN calls do not include the createattrs field.  Conversely, if createattrs is specified and includes an owner or group (or corresponding ACEs) for which the principal in the RPC call's credentials does not have authorization to create files, then the server may return NFS4ERR_PERM.

In the case of an OPEN that specifies a size of zero (e.g., truncation) and the file has named attributes, the named attributes are left as is.  They are not removed.

NFSv4.1 gives more precise control to clients over acquisition of delegations via the following new flags for the share_access field of OPEN4args:

   OPEN4_SHARE_ACCESS_WANT_READ_DELEG

   OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG

   OPEN4_SHARE_ACCESS_WANT_ANY_DELEG

   OPEN4_SHARE_ACCESS_WANT_NO_DELEG

   OPEN4_SHARE_ACCESS_WANT_CANCEL

   OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL

   OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED

If (share_access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) is not zero, then the client will have specified one and only one of:

   OPEN4_SHARE_ACCESS_WANT_READ_DELEG

   OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG

   OPEN4_SHARE_ACCESS_WANT_ANY_DELEG

   OPEN4_SHARE_ACCESS_WANT_NO_DELEG

   OPEN4_SHARE_ACCESS_WANT_CANCEL

Otherwise, the client is indicating no desire for a delegation, and it is up to the server whether to return a delegation in the OPEN response.

If the server supports the new _WANT_ flags and the client issues one or more of the new flags, then in the event the server does not return a delegation, it MUST return a delegation type of OPEN_DELEGATE_NONE_EXT.  od_whynone indicates why no delegation was returned and will be one of:

   WND4_NOT_WANTED  The client specified OPEN4_SHARE_ACCESS_WANT_NO_DELEG.
   WND4_CONTENTION  There is a conflicting delegation or open on the file.

   WND4_RESOURCE  Resource limitations prevent the server from granting a delegation.

   WND4_NOT_SUPP_FTYPE  The server does not support delegations on this file type.

   WND4_WRITE_DELEG_NOT_SUPP_FTYPE  The server does not support write delegations on this file type.

   WND4_NOT_SUPP_UPGRADE  The server does not support atomic upgrade of a read delegation to a write delegation.

   WND4_NOT_SUPP_DOWNGRADE  The server does not support atomic downgrade of a write delegation to a read delegation.

   WND4_CANCELED  The client specified OPEN4_SHARE_ACCESS_WANT_CANCEL and now any "want" for this file object is cancelled.

   WND4_IS_DIR  The specified file object is a directory, and the operation is OPEN or WANT_DELEGATION, neither of which supports delegations on directories.

OPEN4_SHARE_ACCESS_WANT_READ_DELEG, OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG, and OPEN4_SHARE_ACCESS_WANT_ANY_DELEG mean, respectively, that the client wants a read, write, or any delegation regardless of which of OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_ACCESS_WRITE, or OPEN4_SHARE_ACCESS_BOTH is set.  If the client has a read delegation on a file and requests a write delegation, then the client is requesting atomic upgrade of its read delegation to a write delegation.  If the client has a write delegation on a file and requests a read delegation, then the client is requesting atomic downgrade to a read delegation.  A server MAY support atomic upgrade or downgrade.  If it does, then a returned delegation_type of OPEN_DELEGATE_READ or OPEN_DELEGATE_WRITE that is different from the delegation type the client currently has indicates a successful upgrade or downgrade.
If the server does not support atomic delegation upgrade or downgrade, then od_whynone will be WND4_NOT_SUPP_UPGRADE or WND4_NOT_SUPP_DOWNGRADE.

OPEN4_SHARE_ACCESS_WANT_NO_DELEG means the client wants no delegation.

OPEN4_SHARE_ACCESS_WANT_CANCEL means the client wants no delegation and wants to cancel any previously registered "want" for a delegation.

The client may set one or both of OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL and OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED.  However, they will have no effect unless one of the following is set:

   o  OPEN4_SHARE_ACCESS_WANT_READ_DELEG

   o  OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG

   o  OPEN4_SHARE_ACCESS_WANT_ANY_DELEG

If the client specifies OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL, then it wishes to register a "want" for a delegation in the event the OPEN results do not include a delegation.  If so, and the server denies the delegation due to insufficient resources, the server MAY later inform the client, via the CB_RECALLABLE_OBJ_AVAIL operation, that the resource limitation condition has eased.  The server will tell the client that it intends to send a future CB_RECALLABLE_OBJ_AVAIL operation by setting delegation_type in the results to OPEN_DELEGATE_NONE_EXT, ond_why to WND4_RESOURCE, and ond_server_will_signal_avail to TRUE.  If ond_server_will_signal_avail is set to TRUE, the server MUST later send a CB_RECALLABLE_OBJ_AVAIL operation.

If the client specifies OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED, then it wishes to register a "want" for a delegation in the event the OPEN results do not include a delegation.
If so, and the server denies the delegation due to contention, the server MAY later inform the client, via the CB_PUSH_DELEG operation, that the contention condition has eased.  The server will tell the client that it intends to send a future CB_PUSH_DELEG operation by setting delegation_type in the results to OPEN_DELEGATE_NONE_EXT, ond_why to WND4_CONTENTION, and ond_server_will_push_deleg to TRUE.  If ond_server_will_push_deleg is TRUE, the server MUST later send a CB_PUSH_DELEG operation.

If the client has previously registered a want for a delegation on a file, and then sends a request to register a want for a delegation on the same file, the server MUST return a new error: NFS4ERR_DELEG_ALREADY_WANTED.  If the client wishes to register a different type of delegation want for the same file, it MUST first cancel the existing delegation want.

17.16.5. IMPLEMENTATION

The OPEN operation contains support for EXCLUSIVE4 create.  The mechanism is similar to the support in NFS version 3 [18].  However, this mechanism is not needed if a server stores its reply cache in stable storage.  If the server indicates (via the csr_persist field in the response to CREATE_SESSION) that its reply cache is persistent, the client SHOULD NOT use OPEN's approach to exclusive create.

In the absence of csr_persist being TRUE, the client invokes exclusive create by setting the how parameter to EXCLUSIVE4.  In this case, the client provides a verifier that can reasonably be expected to be unique.  A combination of a client identifier, perhaps the client network address, and a unique number generated by the client, perhaps the RPC transaction identifier, may be appropriate.  This mechanism allows reliable exclusive create semantics even when the server does not support storing session reply information in stable storage.
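The verifier construction suggested above (client network address combined with an RPC transaction identifier) can be sketched as follows.  This is only one possible combination, and the byte layout shown here is purely illustrative, not a protocol requirement:

```python
import struct

def make_verifier(client_ipv4: str, xid: int) -> bytes:
    """Build an 8-byte exclusive-create verifier from the client's
    IPv4 address (4 bytes) and an RPC transaction id (4 bytes).
    Any reasonably unique 8-byte value would serve equally well."""
    addr = bytes(int(octet) for octet in client_ipv4.split("."))
    return addr + struct.pack(">I", xid & 0xFFFFFFFF)

# Distinct transaction ids yield distinct verifiers for the same client.
v1 = make_verifier("192.0.2.7", 0x1001)
v2 = make_verifier("192.0.2.7", 0x1002)
```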
If the object does not exist, the server creates the object and stores the verifier in stable storage.  For file systems that do not provide a mechanism for the storage of arbitrary file attributes, the server may use one or more elements of the object metadata to store the verifier.  The verifier must be stored in stable storage to prevent erroneous failure on retransmission of the request.  It is assumed that an exclusive create is being performed because exclusive semantics are critical to the application.  Because of the expected usage, exclusive CREATE does not rely solely on the normally volatile duplicate request cache for storage of the verifier.  The duplicate request cache in volatile storage does not survive a crash and may actually flush on a long network partition, opening failure windows.  In the UNIX local file system environment, the expected storage location for the verifier on creation is the metadata (time stamps) of the object.  For this reason, an exclusive object create may not include initial attributes because the server would have nowhere to store the verifier.

If the server cannot support these exclusive create semantics, possibly because of the requirement to commit the verifier to stable storage, it should fail the OPEN request with the error NFS4ERR_NOTSUPP.

During an exclusive CREATE request, if the object already exists, the server reconstructs the object's verifier and compares it with the verifier in the request.  If they match, the server treats the request as a success.  The request is presumed to be a duplicate of an earlier, successful request for which the reply was lost and that the server duplicate request cache mechanism did not detect.  If the verifiers do not match, the request is rejected with the status NFS4ERR_EXIST.
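The server-side decision procedure just described (create and store the verifier, treat a matching verifier as a retransmission, otherwise fail) can be modeled compactly.  In this sketch, `stable_store` is a hypothetical stand-in for the server's stable storage of name-to-verifier bindings:

```python
def exclusive_create(stable_store: dict, name: str, verifier: bytes) -> str:
    """Outcome of an EXCLUSIVE4 create on the server (sketch)."""
    if name not in stable_store:
        stable_store[name] = verifier   # verifier goes to stable storage
        return "NFS4_OK"
    if stable_store[name] == verifier:
        return "NFS4_OK"                # presumed retransmission of an
                                        # earlier, successful request
    return "NFS4ERR_EXIST"              # a different creator got there first

store = {}
exclusive_create(store, "f", b"A" * 8)  # creates the object
exclusive_create(store, "f", b"A" * 8)  # duplicate request: still NFS4_OK
exclusive_create(store, "f", b"B" * 8)  # conflicting create: NFS4ERR_EXIST
```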
Once the client has performed a successful exclusive create, it must issue a SETATTR to set the correct object attributes.  Until it does so, it should not rely upon any of the object attributes, since the server implementation may need to overload object metadata to store the verifier.  The subsequent SETATTR must not occur in the same COMPOUND request as the OPEN.  This separation will guarantee that the exclusive create mechanism will continue to function properly in the face of retransmission of the request.

Use of the GUARDED attribute does not provide exactly-once semantics.  In particular, if a reply is lost and the server does not detect the retransmission of the request, the operation can fail with NFS4ERR_EXIST, even though the create was performed successfully.  The client would use this behavior in the case that the application has not requested an exclusive create but has asked to have the file truncated when the file is opened.  In the case of the client timing out and retransmitting the create request, the client can use GUARDED to guard against a sequence such as create, write, create (retransmitted).

For SHARE reservations, the client must specify a value for share_access that is one of READ, WRITE, or BOTH.  For share_deny, the client must specify one of NONE, READ, WRITE, or BOTH.  If the client fails to do this, the server must return NFS4ERR_INVAL.

Based on the share_access value (READ, WRITE, or BOTH), the client should check that the requester has the proper access rights to perform the specified operation.  This would generally be the result of applying the ACL access rules to the file for the current requester.
However, just as with the ACCESS operation, the client should not attempt to second-guess the server's decisions, as access rights may change and may be subject to server administrative controls outside the ACL framework.  If the requester is not authorized to READ or WRITE (depending on the share_access value), the server must return NFS4ERR_ACCESS.  Note that since the NFS version 4 protocol does not impose any requirement that READs and WRITEs issued for an open file have the same credentials as the OPEN itself, the server still must do appropriate access checking on the READs and WRITEs themselves.

If the component provided to OPEN is a symbolic link, the error NFS4ERR_SYMLINK will be returned to the client.  If the current filehandle is not a directory, the error NFS4ERR_NOTDIR will be returned.

The use of the OPEN4_RESULT_PRESERVE_UNLINKED result flag allows a client to avoid the common implementation practice of renaming an open file to ".nfs" after it removes the file.  After the server returns OPEN4_RESULT_PRESERVE_UNLINKED, if a client issues a REMOVE operation that would reduce the file's link count to zero, the server SHOULD report a value of zero for the FATTR4_NUMLINKS attribute on the file.

17.16.5.1. WARNING TO CLIENT IMPLEMENTORS

OPEN resembles LOOKUP in that it generates a filehandle for the client to use.  Unlike LOOKUP though, OPEN creates server state on the filehandle.  In normal circumstances, the client can only release this state with a CLOSE operation.  CLOSE uses the current filehandle to determine which file to close.  Therefore, the client MUST follow every OPEN operation with a GETFH operation in the same COMPOUND procedure.  This will supply the client with the filehandle such that CLOSE can be used appropriately.
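The rule above can be illustrated with a sketch of COMPOUND construction.  The tuples below model only the order of operations, not real XDR encoding:

```python
def open_compound(dir_fh: bytes, name: str, share_access: int) -> list:
    """Client-side COMPOUND for opening a file: set the directory as the
    current filehandle, OPEN by component name, then GETFH so the
    resulting filehandle is available for a later CLOSE (sketch)."""
    return [
        ("PUTFH", dir_fh),             # current filehandle := directory
        ("OPEN", name, share_access),  # replaces the cfh with the opened file
        ("GETFH",),                    # hand that filehandle back to the client
    ]

ops = open_compound(b"\x0a\x0b", "report.txt", 1)
```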
Simply waiting for the lease on the file to expire is insufficient because the server may maintain the state indefinitely as long as another client does not attempt to make a conflicting access to the same file.

17.17. Operation 19: OPENATTR - Open Named Attribute Directory

17.17.1. SYNOPSIS

   (cfh) createdir -> (cfh)

17.17.2. ARGUMENTS

   /*
    * OPENATTR: open named attributes directory
    */
   struct OPENATTR4args {
           /* CURRENT_FH: object */
           bool    createdir;
   };

17.17.3. RESULTS

   struct OPENATTR4res {
           /* CURRENT_FH: named attr directory */
           nfsstat4        status;
   };

17.17.4. DESCRIPTION

The OPENATTR operation is used to obtain the filehandle of the named attribute directory associated with the current filehandle.  The result of the OPENATTR will be a filehandle to an object of type NF4ATTRDIR.  From this filehandle, READDIR and LOOKUP operations can be used to obtain filehandles for the various named attributes associated with the original file system object.  Filehandles returned within the named attribute directory will have a type of NF4NAMEDATTR.

The createdir argument allows the client to signify if a named attribute directory should be created as a result of the OPENATTR operation.  Some clients may use the OPENATTR operation with a value of FALSE for createdir to determine if any named attributes exist for the object.  If none exist, then NFS4ERR_NOENT will be returned.  If createdir has a value of TRUE and no named attribute directory exists, one is created.  The creation of a named attribute directory assumes that the server has implemented named attribute support in this fashion and is not required to do so by this definition.

17.17.5.
IMPLEMENTATION

If the server does not support named attributes for the current filehandle, an error of NFS4ERR_NOTSUPP will be returned to the client.

17.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access

17.18.1. SYNOPSIS

   (cfh), stateid, seqid, access, deny -> stateid

17.18.2. ARGUMENTS

   /*
    * OPEN_DOWNGRADE: downgrade the access/deny for a file
    */
   struct OPEN_DOWNGRADE4args {
           /* CURRENT_FH: opened file */
           stateid4        open_stateid;
           seqid4          seqid;
           uint32_t        share_access;
           uint32_t        share_deny;
   };

17.18.3. RESULTS

   struct OPEN_DOWNGRADE4resok {
           stateid4        open_stateid;
   };

   union OPEN_DOWNGRADE4res switch(nfsstat4 status) {
    case NFS4_OK:
           OPEN_DOWNGRADE4resok    resok4;
    default:
           void;
   };

17.18.4. DESCRIPTION

This operation is used to adjust the share_access and share_deny bits for a given open.  This is necessary when a given open-owner opens the same file multiple times with different share_access and share_deny flags.  In this situation, a close of one of the opens may change the appropriate share_access and share_deny flags to remove bits associated with opens no longer in effect.

The share_access and share_deny bits specified in this operation replace the current ones for the specified open file.  The share_access and share_deny bits specified must be exactly equal to the union of the share_access and share_deny bits specified for some subset of the OPENs in effect for the current open-owner on the current file.  If that constraint is not respected, the error NFS4ERR_INVAL should be returned.  Since share_access and share_deny bits are subsets of those already granted, it is not possible for this request to be denied because of conflicting share reservations.

The seqid value is not used in NFSv4.1.
If the value passed is not zero, the server MUST return an NFS4ERR_INVAL error.

On success, the current filehandle retains its value.

17.19. Operation 22: PUTFH - Set Current Filehandle

17.19.1. SYNOPSIS

   filehandle -> (cfh)

17.19.2. ARGUMENTS

   /*
    * PUTFH: Set current filehandle
    */
   struct PUTFH4args {
           nfs_fh4         object;
   };

17.19.3. RESULTS

   struct PUTFH4res {
           /* CURRENT_FH: */
           nfsstat4        status;
   };

17.19.4. DESCRIPTION

Replaces the current filehandle with the filehandle provided as an argument.

If the security mechanism used by the requester does not meet the requirements of the filehandle provided to this operation, the server MUST return NFS4ERR_WRONGSEC.

17.19.5. IMPLEMENTATION

Commonly used as the first operation in an NFS request to set the context for following operations.

17.20. Operation 23: PUTPUBFH - Set Public Filehandle

17.20.1. SYNOPSIS

   - -> (cfh)

17.20.2. ARGUMENT

   void;

17.20.3. RESULT

   /*
    * PUTPUBFH: Set public filehandle
    */
   struct PUTPUBFH4res {
           /* CURRENT_FH: public fh */
           nfsstat4        status;
   };

17.20.4. DESCRIPTION

Replaces the current filehandle with the filehandle that represents the public filehandle of the server's name space.  This filehandle may be different from the "root" filehandle, which may be associated with some other directory on the server.

The public filehandle represents the concepts embodied in RFC2054 [25], RFC2055 [26], and RFC2224 [32].  The intent for NFS version 4 is that the public filehandle (represented by the PUTPUBFH operation) be used as a method of providing WebNFS server compatibility with NFS versions 2 and 3.

The public filehandle and the root filehandle (represented by the PUTROOTFH operation) should be equivalent.
If the public and root filehandles are not equivalent, then the public filehandle MUST be a descendant of the root filehandle.

17.20.5. IMPLEMENTATION

Used as the first operation in an NFS request to set the context for following operations.

With the NFS version 2 and 3 public filehandle, the client is able to specify whether the path name provided in the LOOKUP should be evaluated as either an absolute path relative to the server's root or relative to the public filehandle.  RFC2224 [32] contains further discussion of the functionality.  With NFS version 4, that type of specification is not directly available in the LOOKUP operation.  The reason for this is that the component separators needed to specify absolute vs. relative are not allowed in NFS version 4.  Therefore, the client is responsible for constructing its request such that either PUTROOTFH or PUTPUBFH is used to signify absolute or relative evaluation of an NFS URL, respectively.

Note that there are warnings mentioned in RFC2224 [32] with respect to the use of absolute evaluation and the restrictions the server may place on that evaluation with respect to how much of its namespace has been made available.  These same warnings apply to NFS version 4.  It is likely, therefore, that because of server implementation details, an NFS version 3 absolute public filehandle lookup may behave differently than an NFS version 4 absolute resolution.

There is a form of security negotiation as described in RFC2755 [33] that uses the public filehandle as a method of employing SNEGO.  This method is not available with NFS version 4, as filehandles are not overloaded with special meaning and therefore do not provide the same framework as NFS versions 2 and 3.  Clients should therefore use the security negotiation mechanisms described in this RFC.
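A client following the paragraph above might choose its initial operation as sketched below.  The convention that a leading "/" in the residual URL path marks absolute evaluation is an assumption adopted here for illustration (in the spirit of the WebNFS conventions), not something this specification mandates:

```python
def first_op_for_nfs_url(residual_path: str) -> str:
    """Pick the first COMPOUND operation for evaluating an NFS URL:
    PUTROOTFH for absolute evaluation, PUTPUBFH for evaluation
    relative to the public filehandle (sketch)."""
    return "PUTROOTFH" if residual_path.startswith("/") else "PUTPUBFH"

first_op_for_nfs_url("/export/data")  # absolute -> PUTROOTFH
first_op_for_nfs_url("data")          # relative -> PUTPUBFH
```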
17.20.6. ERRORS

17.21. Operation 24: PUTROOTFH - Set Root Filehandle

17.21.1. SYNOPSIS

   - -> (cfh)

17.21.2. ARGUMENTS

   void;

17.21.3. RESULTS

   /*
    * PUTROOTFH: Set root filehandle
    */
   struct PUTROOTFH4res {
           /* CURRENT_FH: root fh */
           nfsstat4        status;
   };

17.21.4. DESCRIPTION

Replaces the current filehandle with the filehandle that represents the root of the server's name space.  From this filehandle, a LOOKUP operation can locate any other filehandle on the server.  This filehandle may be different from the "public" filehandle, which may be associated with some other directory on the server.

17.21.5. IMPLEMENTATION

Commonly used as the first operation in an NFS request to set the context for following operations.

17.22. Operation 25: READ - Read from File

17.22.1. SYNOPSIS

   (cfh), stateid, offset, count -> eof, data

17.22.2. ARGUMENTS

   /*
    * READ: Read from file
    */
   struct READ4args {
           /* CURRENT_FH: file */
           stateid4        stateid;
           offset4         offset;
           count4          count;
   };

17.22.3. RESULTS

   struct READ4resok {
           bool            eof;
           opaque          data<>;
   };

   union READ4res switch (nfsstat4 status) {
    case NFS4_OK:
           READ4resok      resok4;
    default:
           void;
   };

17.22.4. DESCRIPTION

The READ operation reads data from the regular file identified by the current filehandle.

The client provides an offset of where the READ is to start and a count of how many bytes are to be read.  An offset of 0 (zero) means to read data starting at the beginning of the file.  If offset is greater than or equal to the size of the file, the status NFS4_OK is returned with a data length set to 0 (zero) and eof set to TRUE.  The READ is subject to access permissions checking.
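The offset and count rules for READ can be summarized in a short model.  This sketch assumes the server returns as many bytes as are available; the server is always permitted to return fewer:

```python
def read_reply(file_size: int, offset: int, count: int):
    """Return (data_length, eof) for a READ, assuming the server
    returns as much data as it can (sketch)."""
    if offset >= file_size:
        return 0, True                      # reading at or past end-of-file
    n = min(count, file_size - offset)
    return n, (offset + n == file_size)     # eof TRUE when the read ends
                                            # exactly at end-of-file
```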
If the client specifies a count value of 0 (zero), the READ succeeds and returns 0 (zero) bytes of data, again subject to access permissions checking.  The server may choose to return fewer bytes than specified by the client.  The client needs to check for this condition and handle the condition appropriately.

The stateid value for a READ request represents a value returned from a previous record lock or share reservation request.  The stateid is used by the server to verify that the associated share reservation and any record locks are still valid and to update lease timeouts for the client.

If the read ended at the end-of-file (formally, in a correctly formed READ request, if offset + count is equal to the size of the file), or the read request extends beyond the size of the file (if offset + count is greater than the size of the file), eof is returned as TRUE; otherwise, it is FALSE.  A successful READ of an empty file will always return eof as TRUE.

If the current filehandle is not a regular file, an error will be returned to the client.  In the case the current filehandle represents a directory, NFS4ERR_ISDIR is returned; otherwise, NFS4ERR_INVAL is returned.

For a READ with a stateid value of all bits 0, the server MAY allow the READ to be serviced subject to mandatory file locks or the current share deny modes for the file.  For a READ with a stateid value of all bits 1, the server MAY allow READ operations to bypass locking checks at the server.

On success, the current filehandle retains its value.

17.22.5. IMPLEMENTATION

It is possible for the server to return fewer than count bytes of data.  If the server returns less than the count requested and eof is set to FALSE, the client should issue another READ to get the remaining data.
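The client loop that this implies can be sketched as follows; `read_op` is a hypothetical stand-in for sending one READ and returning its (data, eof) result:

```python
def read_fully(read_op, offset: int, count: int):
    """Re-issue READs after short, non-eof returns, as recommended
    above (sketch)."""
    buf, eof = b"", False
    while count > 0 and not eof:
        data, eof = read_op(offset, count)
        if not data and not eof:
            break                  # defensive: avoid looping forever on
                                   # a zero-byte, non-eof return
        buf += data
        offset += len(data)
        count -= len(data)
    return buf, eof
```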
A server may return less data than requested under several circumstances.  The file may have been truncated by another client or perhaps on the server itself, changing the file size from what the requesting client believes to be the case.  This would reduce the actual amount of data available to the client.  It is possible that the server may back off the transfer size and reduce the read request return.  Server resource exhaustion may also occur, necessitating a smaller read return.

If mandatory file locking is on for the file, and if the region corresponding to the data to be read from the file is write locked by an owner not associated with the stateid, the server will return the NFS4ERR_LOCKED error.  The client should try to get the appropriate read record lock via the LOCK operation before re-attempting the READ.  When the READ completes, the client should release the record lock via LOCKU.

17.23. Operation 26: READDIR - Read Directory

17.23.1. SYNOPSIS

   (cfh), cookie, cookieverf, dircount, maxcount, attr_request ->
   cookieverf { cookie, name, attrs }

17.23.2. ARGUMENTS

   /*
    * READDIR: Read directory
    */
   struct READDIR4args {
           /* CURRENT_FH: directory */
           nfs_cookie4     cookie;
           verifier4       cookieverf;
           count4          dircount;
           count4          maxcount;
           bitmap4         attr_request;
   };

17.23.3. RESULTS

   struct entry4 {
           nfs_cookie4     cookie;
           component4      name;
           fattr4          attrs;
           entry4          *nextentry;
   };

   struct dirlist4 {
           entry4          *entries;
           bool            eof;
   };

   struct READDIR4resok {
           verifier4       cookieverf;
           dirlist4        reply;
   };

   union READDIR4res switch (nfsstat4 status) {
    case NFS4_OK:
           READDIR4resok   resok4;
    default:
           void;
   };

17.23.4.
DESCRIPTION

The READDIR operation retrieves a variable number of entries from a file system directory and returns client-requested attributes for each entry, along with information to allow the client to request additional directory entries in a subsequent READDIR.

The arguments contain a cookie value that represents where the READDIR should start within the directory.  A value of 0 (zero) for the cookie is used to start reading at the beginning of the directory.  For subsequent READDIR requests, the client specifies a cookie value that is provided by the server on a previous READDIR request.

The cookieverf value should be set to 0 (zero) when the cookie value is 0 (zero) (first directory read).  On subsequent requests, it should be a cookieverf as returned by the server.  The cookieverf must match that returned by the READDIR in which the cookie was acquired.  If the server determines that the cookieverf is no longer valid for the directory, the error NFS4ERR_NOT_SAME must be returned.

The dircount portion of the argument is a hint of the maximum number of bytes of directory information that should be returned.  This value represents the length of the names of the directory entries and the cookie value for these entries.  This length represents the XDR encoding of the data (names and cookies) and not the length in the native format of the server.

The maxcount value of the argument is the maximum number of bytes for the result.  This maximum size represents all of the data being returned within the READDIR4resok structure and includes the XDR overhead.  The server may return less data.  If the server is unable to return a single directory entry within the maxcount limit, the error NFS4ERR_TOOSMALL will be returned to the client.
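A server's entry-selection logic under these two limits can be sketched as below.  The per-entry and per-reply overhead constants are illustrative only, not real XDR sizes; an empty result for a non-empty directory corresponds to the NFS4ERR_TOOSMALL case:

```python
def pack_entries(entries, dircount: int, maxcount: int, overhead: int = 16):
    """Decide how many directory entries fit in one READDIR reply
    (sketch).  `entries` is a list of (name, cookie, attr_len) tuples."""
    picked, name_bytes, reply_bytes = [], 0, overhead
    for name, cookie, attr_len in entries:
        e_names = len(name) + 8                  # name + cookie (hint budget)
        e_total = e_names + attr_len + overhead  # whole encoded entry
        if reply_bytes + e_total > maxcount:
            break                                # hard limit on the reply
        if dircount and name_bytes + e_names > dircount:
            break                                # advisory hint exhausted
        picked.append(name)
        name_bytes += e_names
        reply_bytes += e_total
    return picked
```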
   Finally, attr_request represents the list of attributes to be
   returned for each directory entry supplied by the server.

   On successful return, the server's response will provide a list of
   directory entries.  Each of these entries contains the name of the
   directory entry, a cookie value for that entry, and the associated
   attributes as requested.  The "eof" flag has a value of TRUE if there
   are no more entries in the directory.

   The cookie value is only meaningful to the server and is used as a
   "bookmark" for the directory entry.  As mentioned, this cookie is
   used by the client for subsequent READDIR operations so that it may
   continue reading a directory.  The cookie is similar in concept to a
   READ offset but should not be interpreted as such by the client.
   Ideally, the cookie value should not change if the directory is
   modified, since the client may be caching these values.

   In some cases, the server may encounter an error while obtaining the
   attributes for a directory entry.  Instead of returning an error for
   the entire READDIR operation, the server can instead return the
   attribute 'fattr4_rdattr_error'.  With this, the server is able to
   communicate the failure to the client and not fail the entire
   operation in the instance of what might be a transient failure.
   Obviously, the client must request the fattr4_rdattr_error attribute
   for this method to work properly.  If the client does not request the
   attribute, the server has no choice but to return failure for the
   entire READDIR operation.

   For some file system environments, the directory entries "." and ".."
   have special meaning, and in other environments, they may not.  If
   the server supports these special entries within a directory, they
   should not be returned to the client as part of the READDIR response.
   To enable some client environments, the cookie values of 0, 1, and 2
   are to be considered reserved.  Note that the UNIX client will use
   these values when combining the server's response and local
   representations to enable a fully formed UNIX directory presentation
   to the application.

   For READDIR arguments, cookie values of 1 and 2 should not be used,
   and for READDIR results, cookie values of 0, 1, and 2 should not be
   returned.

   On success, the current filehandle retains its value.

17.23.5.  IMPLEMENTATION

   The server's file system directory representations can differ
   greatly.  A client's programming interfaces may also be bound to the
   local operating environment in a way that does not translate well
   into the NFS protocol.  Therefore, the dircount and maxcount fields
   are provided to allow the client to give guidelines to the server.
   If the client is aggressive about attribute collection during a
   READDIR, the server has an idea of how to limit the encoded response.
   The dircount field provides a hint of the number of entries based
   solely on the names of the directory entries.  Since it is a hint, a
   dircount value of zero is possible.  In this case, the server is free
   to ignore the dircount value and return directory information based
   on the specified maxcount value.

   The cookieverf may be used by the server to help manage cookie values
   that may become stale.  It should be a rare occurrence that a server
   is unable to continue properly reading a directory with the provided
   cookie/cookieverf pair.  The server should make every effort to avoid
   this condition since the application at the client may not be able to
   properly handle this type of failure.

   The use of the cookieverf will also protect the client from using
   READDIR cookie values that may be stale.
   For example, if the file system has been migrated, the server may or
   may not be able to use the same cookie values to service READDIR as
   the previous server used.  With the client providing the cookieverf,
   the server is able to provide the appropriate response to the client.
   This prevents the case where the server may accept a cookie value but
   the underlying directory has changed and the response is invalid from
   the client's context of its previous READDIR.

   Since some servers will not be returning "." and ".." entries as has
   been done with previous versions of the NFS protocol, a client that
   requires these entries to be present in READDIR responses must
   fabricate them.

17.24.  Operation 27: READLINK - Read Symbolic Link

17.24.1.  SYNOPSIS

   (cfh) -> linktext

17.24.2.  ARGUMENTS

   /* CURRENT_FH: symlink */
   void;

17.24.3.  RESULTS

   /*
    * READLINK: Read symbolic link
    */
   struct READLINK4resok {
           linktext4       link;
   };

   union READLINK4res switch (nfsstat4 status) {
    case NFS4_OK:
           READLINK4resok  resok4;
    default:
           void;
   };

17.24.4.  DESCRIPTION

   READLINK reads the data associated with a symbolic link.  The data is
   a UTF-8 string that is opaque to the server.  That is, whether
   created by an NFS client or created locally on the server, the data
   in a symbolic link is not interpreted when created, but is simply
   stored.

   On success, the current filehandle retains its value.

17.24.5.  IMPLEMENTATION

   A symbolic link is nominally a pointer to another file.  The data is
   not necessarily interpreted by the server, just stored in the file.
   It is possible for a client implementation to store a path name that
   is not meaningful to the server operating system in a symbolic link.
   A READLINK operation returns the data to the client for
   interpretation.
   If different implementations want to share access to symbolic links,
   then they must agree on the interpretation of the data in the
   symbolic link.

   The READLINK operation is only allowed on objects of type NF4LNK.
   The server should return the error NFS4ERR_INVAL if the object is not
   of type NF4LNK.

17.25.  Operation 28: REMOVE - Remove File System Object

17.25.1.  SYNOPSIS

   (cfh), filename -> change_info

17.25.2.  ARGUMENTS

   /*
    * REMOVE: Remove filesystem object
    */
   struct REMOVE4args {
           /* CURRENT_FH: directory */
           component4      target;
   };

17.25.3.  RESULTS

   struct REMOVE4resok {
           change_info4    cinfo;
   };

   union REMOVE4res switch (nfsstat4 status) {
    case NFS4_OK:
           REMOVE4resok    resok4;
    default:
           void;
   };

17.25.4.  DESCRIPTION

   The REMOVE operation removes (deletes) a directory entry named by
   filename from the directory corresponding to the current filehandle.
   If the entry in the directory was the last reference to the
   corresponding file system object, the object may be destroyed.

   For the directory where the filename was removed, the server returns
   change_info4 information in cinfo.  With the atomic field of the
   change_info4 struct, the server will indicate if the before and after
   change attributes were obtained atomically with respect to the
   removal.

   If the target has a length of 0 (zero), or if target does not obey
   the UTF-8 definition, the error NFS4ERR_INVAL will be returned.

   On success, the current filehandle retains its value.

17.25.5.  IMPLEMENTATION

   NFS versions 2 and 3 required a different operation, RMDIR, for
   directory removal, and REMOVE for non-directory removal.  This
   allowed clients to skip checking the file type when being passed a
   non-directory delete system call (e.g.
   unlink() in POSIX) to remove a directory, as well as the converse
   (e.g. a rmdir() on a non-directory), because they knew the server
   would check the file type.  NFS version 4 REMOVE can be used to
   delete any directory entry independent of its file type.  The
   implementor of an NFS version 4 client's entry points from the
   unlink() and rmdir() system calls should first check the file type
   against the types the system call is allowed to remove before issuing
   a REMOVE.  Alternatively, the implementor can produce a COMPOUND call
   that includes a LOOKUP/VERIFY sequence to verify the file type before
   a REMOVE operation in the same COMPOUND call.

   The concept of last reference is server specific.  However, if the
   numlinks field in the previous attributes of the object had the value
   1, the client should not rely on referring to the object via a
   filehandle.  Likewise, the client should not rely on the resources
   (disk space, directory entry, and so on) formerly associated with the
   object becoming immediately available.  Thus, if a client needs to be
   able to continue to access a file after using REMOVE to remove it,
   the client should take steps to make sure that the file will still be
   accessible.  The usual mechanism used is to RENAME the file from its
   old name to a new hidden name.

   If the server finds that the file is still open when the REMOVE
   arrives:

   o  The server SHOULD NOT delete the file's directory entry if the
      file was opened with OPEN4_SHARE_DENY_WRITE or
      OPEN4_SHARE_DENY_BOTH.

   o  If the file was not opened with OPEN4_SHARE_DENY_WRITE or
      OPEN4_SHARE_DENY_BOTH, the server SHOULD delete the file's
      directory entry.  However, until the last CLOSE of the file, the
      server MAY continue to allow access to the file via its
      filehandle.

17.26.  Operation 29: RENAME - Rename Directory Entry

17.26.1.
SYNOPSIS

   (sfh), oldname, (cfh), newname -> source_change_info,
   target_change_info

17.26.2.  ARGUMENTS

   /*
    * RENAME: Rename directory entry
    */
   struct RENAME4args {
           /* SAVED_FH: source directory */
           component4      oldname;
           /* CURRENT_FH: target directory */
           component4      newname;
   };

17.26.3.  RESULTS

   struct RENAME4resok {
           change_info4    source_cinfo;
           change_info4    target_cinfo;
   };

   union RENAME4res switch (nfsstat4 status) {
    case NFS4_OK:
           RENAME4resok    resok4;
    default:
           void;
   };

17.26.4.  DESCRIPTION

   The RENAME operation renames the object identified by oldname in the
   source directory corresponding to the saved filehandle, as set by the
   SAVEFH operation, to newname in the target directory corresponding to
   the current filehandle.  The operation is required to be atomic to
   the client.  Source and target directories must reside on the same
   file system on the server.  On success, the current filehandle will
   continue to be the target directory.

   If the target directory already contains an entry with the name
   newname, the source object must be compatible with the target: either
   both are non-directories, or both are directories and the target must
   be empty.  If compatible, the existing target is removed before the
   rename occurs (see the IMPLEMENTATION subsection of the section
   "Operation 28: REMOVE - Remove File System Object" for client and
   server actions whenever a target is removed).  If they are not
   compatible, or if the target is a directory but not empty, the server
   will return the error NFS4ERR_EXIST.

   If oldname and newname both refer to the same file (they might be
   hard links of each other), then RENAME should perform no action and
   return success.
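   The target-compatibility rules in the paragraphs above can be
   condensed into a small decision function.  This is a sketch over a
   hypothetical in-memory directory model (dicts mapping names to
   objects carrying a 'type', an identity 'id', and, for directories,
   'entries'); a real server would operate on its own file system
   structures.

```python
# Sketch of the RENAME target-compatibility rules; the directory model
# (plain dicts) is a hypothetical stand-in for server data structures.

def nfs4_rename(src_dir, oldname, dst_dir, newname):
    """Return 'NFS4_OK' or an NFS4ERR_* name, applying the rules:
    same-object rename is a no-op; an existing target must be
    compatible (both directories with the target empty, or both
    non-directories), and a compatible target is removed first."""
    if not oldname or not newname:
        return "NFS4ERR_INVAL"          # zero-length names are invalid
    if oldname not in src_dir:
        return "NFS4ERR_NOENT"
    src = src_dir[oldname]
    if newname in dst_dir:
        dst = dst_dir[newname]
        if src["id"] == dst["id"]:
            return "NFS4_OK"            # same file (e.g. hard links): no action
        if (src["type"] == "dir") != (dst["type"] == "dir"):
            return "NFS4ERR_EXIST"      # directory vs. non-directory
        if dst["type"] == "dir" and dst["entries"]:
            return "NFS4ERR_EXIST"      # non-empty target directory
        del dst_dir[newname]            # compatible target removed first
    dst_dir[newname] = src_dir.pop(oldname)
    return "NFS4_OK"
```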
   For both directories involved in the RENAME, the server returns
   change_info4 information.  With the atomic field of the change_info4
   struct, the server will indicate if the before and after change
   attributes were obtained atomically with respect to the rename.

   If the oldname refers to a named attribute and the saved and current
   filehandles refer to different file system objects, the server will
   return NFS4ERR_XDEV, just as if the saved and current filehandles
   represented directories on different file systems.

   If the oldname or newname has a length of 0 (zero), or if oldname or
   newname does not obey the UTF-8 definition, the error NFS4ERR_INVAL
   will be returned.

17.26.5.  IMPLEMENTATION

   The RENAME operation must be atomic to the client.  The statement
   "source and target directories must reside on the same file system on
   the server" means that the fsid fields in the attributes for the
   directories are the same.  If they reside on different file systems,
   the error NFS4ERR_XDEV is returned.

   Based on the value of the fh_expire_type attribute for the object,
   the filehandle may or may not expire on a RENAME.  However, server
   implementors are strongly encouraged to attempt to keep filehandles
   from expiring in this fashion.

   On some servers, the file names "." and ".." are illegal as either
   oldname or newname, and will result in the error NFS4ERR_BADNAME.  In
   addition, on many servers the case of oldname or newname being an
   alias for the source directory will be checked for.  Such servers
   will return the error NFS4ERR_INVAL in these cases.

   If either of the source or target filehandles are not directories,
   the server will return NFS4ERR_NOTDIR.

17.27.  Operation 31: RESTOREFH - Restore Saved Filehandle

17.27.1.  SYNOPSIS

   (sfh) -> (cfh)

17.27.2.
ARGUMENTS

   /* SAVED_FH: */
   void;

17.27.3.  RESULTS

   /*
    * RESTOREFH: Restore saved filehandle
    */
   struct RESTOREFH4res {
           /* CURRENT_FH: value of saved fh */
           nfsstat4        status;
   };

17.27.4.  DESCRIPTION

   Set the current filehandle to the value in the saved filehandle.  If
   there is no saved filehandle, then return the error NFS4ERR_RESTOREFH.

17.27.5.  IMPLEMENTATION

   Operations like OPEN and LOOKUP use the current filehandle to
   represent a directory and replace it with a new filehandle.  Assuming
   the previous filehandle was saved with a SAVEFH operator, the
   previous filehandle can be restored as the current filehandle.  This
   is commonly used to obtain post-operation attributes for the
   directory, e.g.

           PUTFH (directory filehandle)
           SAVEFH
           GETATTR attrbits     (pre-op dir attrs)
           CREATE optbits "foo" attrs
           GETATTR attrbits     (file attributes)
           RESTOREFH
           GETATTR attrbits     (post-op dir attrs)

17.27.6.  ERRORS

17.28.  Operation 32: SAVEFH - Save Current Filehandle

17.28.1.  SYNOPSIS

   (cfh) -> (sfh)

17.28.2.  ARGUMENTS

   /* CURRENT_FH: */
   void;

17.28.3.  RESULTS

   /*
    * SAVEFH: Save current filehandle
    */
   struct SAVEFH4res {
           /* SAVED_FH: value of current fh */
           nfsstat4        status;
   };

17.28.4.  DESCRIPTION

   Save the current filehandle.  If a previous filehandle was saved,
   then it is no longer accessible.  The saved filehandle can be
   restored as the current filehandle with the RESTOREFH operator.

   On success, the current filehandle retains its value.

17.28.5.  IMPLEMENTATION

17.29.  Operation 33: SECINFO - Obtain Available Security

17.29.1.  SYNOPSIS

   (cfh), name -> { secinfo }

17.29.2.
ARGUMENTS

   /*
    * SECINFO: Obtain Available Security Mechanisms
    */
   struct SECINFO4args {
           /* CURRENT_FH: directory */
           component4      name;
   };

17.29.3.  RESULTS

   /*
    * From RFC 2203
    */
   enum rpc_gss_svc_t {
           RPC_GSS_SVC_NONE        = 1,
           RPC_GSS_SVC_INTEGRITY   = 2,
           RPC_GSS_SVC_PRIVACY     = 3
   };

   struct rpcsec_gss_info {
           sec_oid4        oid;
           qop4            qop;
           rpc_gss_svc_t   service;
   };

   /* RPCSEC_GSS has a value of '6' - See RFC 2203 */
   union secinfo4 switch (uint32_t flavor) {
    case RPCSEC_GSS:
           rpcsec_gss_info flavor_info;
    default:
           void;
   };

   typedef secinfo4 SECINFO4resok<>;

   union SECINFO4res switch (nfsstat4 status) {
    case NFS4_OK:
           SECINFO4resok   resok4;
    default:
           void;
   };

17.29.4.  DESCRIPTION

   The SECINFO operation is used by the client to obtain a list of valid
   RPC authentication flavors for a specific directory filehandle, file
   name pair.  SECINFO should apply the same access methodology used for
   LOOKUP when evaluating the name.  Therefore, if the requester does
   not have the appropriate access to LOOKUP the name, then SECINFO must
   behave the same way and return NFS4ERR_ACCESS.

   The result will contain an array that represents the security
   mechanisms available, with an order corresponding to the server's
   preferences, the most preferred being first in the array.  The client
   is free to pick whatever security mechanism it both desires and
   supports, or to pick in the server's preference order the first one
   it supports.  The array entries are represented by the secinfo4
   structure.  The field 'flavor' will contain a value of AUTH_NONE,
   AUTH_SYS (as defined in RFC1831 [4]), or RPCSEC_GSS (as defined in
   RFC2203 [5]).  The field flavor can also be any other security flavor
   registered with IANA.
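   A client processing a SECINFO reply can simply walk the array in the
   server's preference order and take the first mechanism it supports.
   The sketch below is illustrative: the tuple layout and the "krb5"
   string are stand-ins for the XDR secinfo4 array and a real GSS
   mechanism OID.

```python
# Sketch of client-side flavor selection from a SECINFO reply.
# Each reply entry is modeled as (flavor, flavor_info_or_None),
# already in the server's preference order.
AUTH_NONE, AUTH_SYS, RPCSEC_GSS = 0, 1, 6  # flavor numbers, RFC 1831/2203

def choose_flavor(secinfo_reply, client_supported):
    """Return the first (flavor, info) the client supports, honoring the
    server's ordering; for RPCSEC_GSS the mechanism OID must also be
    supported.  Returns None if there is no common mechanism."""
    for flavor, info in secinfo_reply:
        if flavor == RPCSEC_GSS:
            if info is not None and \
               info["oid"] in client_supported.get(RPCSEC_GSS, ()):
                return flavor, info
        elif flavor in client_supported:
            return flavor, None
    return None

# Hypothetical server reply: prefers an RPCSEC_GSS triple, then AUTH_SYS.
reply = [(RPCSEC_GSS, {"oid": "krb5", "qop": 0, "service": "integrity"}),
         (AUTH_SYS, None)]
```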
   For the flavors AUTH_NONE and AUTH_SYS, no additional security
   information is returned.  The same is true of many (if not most)
   other security flavors, including AUTH_DH.  For a return value of
   RPCSEC_GSS, a security triple is returned that contains the mechanism
   object id (as defined in RFC2743 [8]), the quality of protection (as
   defined in RFC2743 [8]), and the service type (as defined in RFC2203
   [5]).  It is possible for SECINFO to return multiple entries with
   flavor equal to RPCSEC_GSS with different security triple values.

   On success, the current filehandle retains its value.

   If the name has a length of 0 (zero), or if name does not obey the
   UTF-8 definition, the error NFS4ERR_INVAL will be returned.

17.29.5.  IMPLEMENTATION

   The SECINFO operation is expected to be used by the NFS client when
   the error value of NFS4ERR_WRONGSEC is returned from another NFS
   operation.  This signifies to the client that the server's security
   policy is different from what the client is currently using.  At this
   point, the client is expected to obtain a list of possible security
   flavors and choose what best suits its policies.

   As mentioned, the server's security policies will determine when a
   client request receives NFS4ERR_WRONGSEC.  The operations that may
   receive this error are: LINK, LOOKUP, LOOKUPP, OPEN, PUTFH, PUTPUBFH,
   PUTROOTFH, RESTOREFH, RENAME, and indirectly READDIR.  LINK and
   RENAME will only receive this error if the security used for the
   operation is inappropriate for the saved filehandle.  With the
   exception of READDIR, these operations represent the point at which
   the client can instantiate a filehandle into the "current filehandle"
   at the server.  The filehandle is either provided by the client
   (PUTFH, PUTPUBFH, PUTROOTFH) or generated as a result of a name to
   filehandle translation (LOOKUP and OPEN).
   RESTOREFH is different because the filehandle is a result of a
   previous SAVEFH.  Even though the filehandle, for RESTOREFH, might
   have previously passed the server's inspection for a security match,
   the server will check it again on RESTOREFH to ensure that the
   security policy has not changed.

   If the client wants to resolve an error return of NFS4ERR_WRONGSEC,
   the following will occur:

   o  For LOOKUP and OPEN, the client will use SECINFO with the same
      current filehandle and name as provided in the original LOOKUP or
      OPEN to enumerate the available security triples.

   o  For LINK, PUTFH, PUTROOTFH, PUTPUBFH, RENAME, and RESTOREFH, the
      client will use SECINFO_NO_NAME { style =
      SECINFO_STYLE4_CURRENT_FH }.  The client will prefix the
      SECINFO_NO_NAME operation with the appropriate PUTFH, PUTPUBFH, or
      PUTROOTFH operation that provides the filehandle originally
      provided by the PUTFH, PUTPUBFH, PUTROOTFH, or RESTOREFH, or, for
      the failed LINK or RENAME, the SAVEFH.

   o  NOTE: In NFSv4.0, the client was required to use SECINFO, and had
      to reconstruct the parent of the original filehandle and the
      component name of the original filehandle.

   o  For LOOKUPP, the client will use SECINFO_NO_NAME { style =
      SECINFO_STYLE4_PARENT } and provide the filehandle that equals the
      filehandle originally provided to LOOKUPP.

   The READDIR operation will not directly return the NFS4ERR_WRONGSEC
   error.  However, if the READDIR request included a request for
   attributes, it is possible that the READDIR request's security triple
   did not match that of a directory entry.  If this is the case and the
   client has requested the rdattr_error attribute, the server will
   return the NFS4ERR_WRONGSEC error in rdattr_error for the entry.
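   The recovery choices in the list above amount to a simple mapping
   from the failed operation to the query the client issues next.  The
   sketch below is illustrative only; the returned strings are
   descriptive, not protocol constants.

```python
# Sketch of the NFS4ERR_WRONGSEC recovery mapping described above.
def wrongsec_recovery(failed_op):
    """Map a failed operation to the SECINFO variant used to recover."""
    if failed_op in ("LOOKUP", "OPEN"):
        # Reuse the same current filehandle and name.
        return "SECINFO(cfh, name)"
    if failed_op in ("LINK", "PUTFH", "PUTROOTFH", "PUTPUBFH",
                     "RENAME", "RESTOREFH"):
        # Query the filehandle itself, re-established by a PUTFH-class op.
        return "SECINFO_NO_NAME(style=SECINFO_STYLE4_CURRENT_FH)"
    if failed_op == "LOOKUPP":
        # Query relative to the parent of the provided filehandle.
        return "SECINFO_NO_NAME(style=SECINFO_STYLE4_PARENT)"
    # READDIR, for instance, reports the problem per entry via
    # rdattr_error rather than failing with NFS4ERR_WRONGSEC.
    raise ValueError("operation does not return NFS4ERR_WRONGSEC directly")
```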
   See the section "Security Considerations" for a discussion of the
   recommendations for the security flavor used by SECINFO and
   SECINFO_NO_NAME.

17.30.  Operation 34: SETATTR - Set Attributes

17.30.1.  SYNOPSIS

   (cfh), stateid, attrmask, attr_vals -> attrsset

17.30.2.  ARGUMENTS

   /*
    * SETATTR: Set attributes
    */
   struct SETATTR4args {
           /* CURRENT_FH: target object */
           stateid4        stateid;
           fattr4          obj_attributes;
   };

17.30.3.  RESULTS

   struct SETATTR4res {
           nfsstat4        status;
           bitmap4         attrsset;
   };

17.30.4.  DESCRIPTION

   The SETATTR operation changes one or more of the attributes of a file
   system object.  The new attributes are specified with a bitmap and
   the attributes that follow the bitmap in bit order.

   The stateid argument for SETATTR is used to provide file locking
   context that is necessary for SETATTR requests that set the size
   attribute.  Since setting the size attribute modifies the file's
   data, it has the same locking requirements as a corresponding WRITE.
   Any SETATTR that sets the size attribute is incompatible with a share
   reservation that specifies DENY_WRITE.  The area between the old end-
   of-file and the new end-of-file is considered to be modified just as
   would have been the case had the area in question been specified as
   the target of WRITE, for the purpose of checking conflicts with
   record locks, for those cases in which a server is implementing
   mandatory record locking behavior.  A valid stateid should always be
   specified.  When the file size attribute is not set, the special
   stateid consisting of all bits zero should be passed.

   On either success or failure of the operation, the server will return
   the attrsset bitmask to represent what (if any) attributes were
   successfully set.
   The attrsset in the response is a subset of the bitmap4 that is part
   of the obj_attributes in the argument.

   On success, the current filehandle retains its value.

17.30.5.  IMPLEMENTATION

   If the request specifies the owner attribute to be set, the server
   should allow the operation to succeed if the current owner of the
   object matches the value specified in the request.  Some servers may
   be implemented in such a way as to prohibit the setting of the owner
   attribute unless the requester has privilege to do so.  If the server
   is lenient in this one case of matching owner values, the client
   implementation may be simplified in cases of creation of an object
   followed by a SETATTR.

   The file size attribute is used to request changes to the size of a
   file.  A value of 0 (zero) causes the file to be truncated, a value
   less than the current size of the file causes data from the new size
   to the end of the file to be discarded, and a size greater than the
   current size of the file causes logically zeroed data bytes to be
   added to the end of the file.  Servers are free to implement this
   using holes or actual zero data bytes.  Clients should not make any
   assumptions regarding a server's implementation of this feature,
   beyond that the bytes returned will be zeroed.  Servers must support
   extending the file size via SETATTR.

   SETATTR is not guaranteed to be atomic.  A failed SETATTR may
   partially change a file's attributes.

   Changing the size of a file with SETATTR indirectly changes the
   time_modify.  A client must account for this, as size changes can
   result in data deletion.

   The attributes time_access_set and time_modify_set are write-only
   attributes constructed as a switched union so the client can direct
   the server in setting the time values.
   If the switched union specifies SET_TO_CLIENT_TIME4, the client has
   provided an nfstime4 to be used for the operation.  If the switched
   union does not specify SET_TO_CLIENT_TIME4, the server is to use its
   current time for the SETATTR operation.

   If server and client times differ, programs that compare client time
   to file times can break.  A time maintenance protocol should be used
   to limit client/server time skew.

   Use of a COMPOUND containing a VERIFY operation specifying only the
   change attribute, immediately followed by a SETATTR, provides a means
   whereby a client may specify a request that emulates the
   functionality of the SETATTR guard mechanism of NFS version 3.  Since
   the function of the guard mechanism is to avoid changes to the file
   attributes based on stale information, delays between the checking of
   the guard condition and the setting of the attributes have the
   potential to compromise this function, as would the corresponding
   delay in the NFS version 4 emulation.  Therefore, NFS version 4
   servers should take care to avoid such delays, to the degree
   possible, when executing such a request.

   If the server does not support an attribute as requested by the
   client, the server should return NFS4ERR_ATTRNOTSUPP.

   A mask of the attributes actually set is returned by SETATTR in all
   cases.  That mask must not include attribute bits not requested to be
   set by the client, and must be equal to the mask of attributes
   requested to be set only if the SETATTR completes without error.

17.31.  Operation 37: VERIFY - Verify Same Attributes

17.31.1.  SYNOPSIS

   (cfh), fattr -> -

17.31.2.  ARGUMENTS

   /*
    * VERIFY: Verify attributes same
    */
   struct VERIFY4args {
           /* CURRENT_FH: object */
           fattr4          obj_attributes;
   };

17.31.3.
RESULTS

   struct VERIFY4res {
           nfsstat4        status;
   };

17.31.4.  DESCRIPTION

   The VERIFY operation is used to verify that attributes have a value
   assumed by the client before proceeding with the following operations
   in the compound request.  If any of the attributes do not match, then
   the error NFS4ERR_NOT_SAME must be returned.  The current filehandle
   retains its value after successful completion of the operation.

17.31.5.  IMPLEMENTATION

   One possible use of the VERIFY operation is the following compound
   sequence.  With this, the client is attempting to verify that the
   file being removed will match what the client expects to be removed.
   This sequence can help prevent the unintended deletion of a file.

           PUTFH (directory filehandle)
           LOOKUP (file name)
           VERIFY (filehandle == fh)
           PUTFH (directory filehandle)
           REMOVE (file name)

   This sequence does not prevent a second client from removing and
   creating a new file in the middle of this sequence, but it does help
   avoid the unintended result.

   In the case that a recommended attribute is specified in the VERIFY
   operation and the server does not support that attribute for the file
   system object, the error NFS4ERR_ATTRNOTSUPP is returned to the
   client.

   When the attribute rdattr_error or any write-only attribute (e.g.
   time_modify_set) is specified, the error NFS4ERR_INVAL is returned to
   the client.

17.32.  Operation 38: WRITE - Write to File

17.32.1.  SYNOPSIS

   (cfh), stateid, offset, stable, data -> count, committed, writeverf

17.32.2.  ARGUMENTS

   /*
    * WRITE: Write to file
    */
   enum stable_how4 {
           UNSTABLE4       = 0,
           DATA_SYNC4      = 1,
           FILE_SYNC4      = 2
   };

   struct WRITE4args {
           /* CURRENT_FH: file */
           stateid4        stateid;
           offset4         offset;
           stable_how4     stable;
           opaque          data<>;
   };

17.32.3.
RESULTS

   struct WRITE4resok {
           count4          count;
           stable_how4     committed;
           verifier4       writeverf;
   };

   union WRITE4res switch (nfsstat4 status) {
    case NFS4_OK:
           WRITE4resok     resok4;
    default:
           void;
   };

17.32.4.  DESCRIPTION

   The WRITE operation is used to write data to a regular file.  The
   target file is specified by the current filehandle.  The offset
   specifies the offset where the data should be written.  An offset of
   0 (zero) specifies that the write should start at the beginning of
   the file.  The count, as encoded as part of the opaque data
   parameter, represents the number of bytes of data that are to be
   written.  If the count is 0 (zero), the WRITE will succeed and return
   a count of 0 (zero), subject to permissions checking.  The server may
   choose to write fewer bytes than requested by the client.

   Part of the write request is a specification of how the write is to
   be performed.  The client specifies with the stable parameter the
   method of how the data is to be processed by the server.  If stable
   is FILE_SYNC4, the server must commit the data written plus all file
   system metadata to stable storage before returning results.  This
   corresponds to the NFS version 2 protocol semantics.  Any other
   behavior constitutes a protocol violation.  If stable is DATA_SYNC4,
   then the server must commit all of the data to stable storage and
   enough of the metadata to retrieve the data before returning.  The
   server implementor is free to implement DATA_SYNC4 in the same
   fashion as FILE_SYNC4, but with a possible performance drop.  If
   stable is UNSTABLE4, the server is free to commit any part of the
   data and the metadata to stable storage, including all or none,
   before returning a reply to the client.
   There is no guarantee whether or when any uncommitted data will
   subsequently be committed to stable storage.  The only guarantees
   made by the server are that it will not destroy any data without
   changing the value of verf and that it will not commit the data and
   metadata at a level less than that requested by the client.

   The stateid value for a WRITE request represents a value returned
   from a previous record lock or share reservation request.  The
   stateid is used by the server to verify that the associated share
   reservation and any record locks are still valid and to update lease
   timeouts for the client.

   Upon successful completion, the following results are returned.  The
   count result is the number of bytes of data written to the file.  The
   server may write fewer bytes than requested.  If so, the actual
   number of bytes written starting at location offset is returned.

   The server also returns an indication of the level of commitment of
   the data and metadata via committed.  If the server committed all
   data and metadata to stable storage, committed should be set to
   FILE_SYNC4.  If the level of commitment was at least as strong as
   DATA_SYNC4, then committed should be set to DATA_SYNC4.  Otherwise,
   committed must be returned as UNSTABLE4.  If stable was FILE_SYNC4,
   then committed must also be FILE_SYNC4: anything else constitutes a
   protocol violation.  If stable was DATA_SYNC4, then committed may be
   FILE_SYNC4 or DATA_SYNC4: anything else constitutes a protocol
   violation.  If stable was UNSTABLE4, then committed may be either
   FILE_SYNC4, DATA_SYNC4, or UNSTABLE4.

   The final portion of the result is the write verifier.  The write
   verifier is a cookie that the client can use to determine whether the
   server has changed instance (boot) state between a call to WRITE and
   a subsequent call to either WRITE or COMMIT.
This cookie must be 17545 consistent during a single instance of the NFS version 4 protocol 17546 service and must be unique between instances of the NFS version 4 17547 protocol server, where uncommitted data may be lost. 17549 If a client writes data to the server with the stable argument set to 17550 UNSTABLE4 and the reply yields a committed response of DATA_SYNC4 or 17551 UNSTABLE4, the client will follow up some time in the future with a 17552 COMMIT operation to synchronize outstanding asynchronous data and 17553 metadata with the server's stable storage, barring client error. It 17554 is possible that, due to client crash or other error, a subsequent 17555 COMMIT will not be received by the server. 17557 For a WRITE with a stateid value of all bits 0, the server MAY allow 17558 the WRITE to be serviced subject to mandatory file locks or the 17559 current share deny modes for the file. For a WRITE with a stateid 17560 value of all bits 1, the server MUST NOT allow the WRITE operation to 17561 bypass locking checks at the server; such a WRITE is treated exactly the same 17562 as if a stateid of all bits 0 were used. 17564 On success, the current filehandle retains its value. 17566 17.32.5. IMPLEMENTATION 17568 It is possible for the server to write fewer bytes of data than 17569 requested by the client. In this case, the server should not return 17570 an error unless no data was written at all. If the server writes 17571 less than the number of bytes specified, the client should issue 17572 another WRITE to write the remaining data. 17574 It is assumed that the act of writing data to a file will cause the 17575 time_modified of the file to be updated. However, the time_modified 17576 of the file should not be changed unless the contents of the file are 17577 changed. Thus, a WRITE request with count set to 0 should not cause 17578 the time_modified of the file to be updated. 17580 The definition of stable storage has historically been a point of 17581 contention.
The following expected properties of stable storage may 17582 help in resolving design issues in the implementation. Stable 17583 storage is persistent storage that survives: 17585 1. Repeated power failures. 17587 2. Hardware failures (of any board, power supply, etc.). 17589 3. Repeated software crashes, including reboot cycle. 17591 This definition does not address failure of the stable storage module 17592 itself. 17594 The verifier is defined to allow a client to detect different 17595 instances of an NFS version 4 protocol server over which cached, 17596 uncommitted data may be lost. In the most likely case, the verifier 17597 allows the client to detect server reboots. This information is 17598 required so that the client can safely determine whether the server 17599 could have lost cached data. If the server fails unexpectedly and 17600 the client has uncommitted data from previous WRITE requests (done 17601 with the stable argument set to UNSTABLE4 and in which the result 17602 committed was returned as UNSTABLE4 as well) it may not have flushed 17603 cached data to stable storage. The burden of recovery is on the 17604 client and the client will need to retransmit the data to the server. 17606 A suggested verifier would be to use the time that the server was 17607 booted or the time the server was last started (if restarting the 17608 server without a reboot results in lost buffers). 17610 The committed field in the results allows the client to do more 17611 effective caching. If the server is committing all WRITE requests to 17612 stable storage, then it should return with committed set to 17613 FILE_SYNC4, regardless of the value of the stable field in the 17614 arguments. A server that uses an NVRAM accelerator may choose to 17615 implement this policy. The client can use this to increase the 17616 effectiveness of the cache by discarding cached data that has already 17617 been committed on the server. 
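The verifier logic described above can be sketched as follows. The buffer-tracking list and the boot-time-derived verifier are illustrative assumptions (the text only suggests deriving the verifier from boot or start time); the comparison rule itself comes from the text:

```python
import struct

def make_write_verifier(boot_time_usec):
    # One suggested verifier: an 8-byte verifier4 derived from the
    # server's boot (or last start) time, here in microseconds.
    return struct.pack(">Q", boot_time_usec)

def data_to_retransmit(unstable_writes, commit_verf):
    # unstable_writes: list of (offset, data, verf_at_write_time)
    # for WRITEs done with stable = UNSTABLE4 whose committed result
    # was also UNSTABLE4.  A verifier that differs from the one
    # returned by COMMIT means the server instance changed and the
    # cached data may have been lost, so those writes must be
    # retransmitted by the client.
    return [(off, data) for (off, data, verf) in unstable_writes
            if verf != commit_verf]
```

This mirrors the recovery burden placed on the client: discard nothing until the verifier confirms the server instance is unchanged.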
17619 Some implementations may return NFS4ERR_NOSPC instead of 17620 NFS4ERR_DQUOT when a user's quota is exceeded. In the case that the 17621 current filehandle is a directory, the server will return 17622 NFS4ERR_ISDIR. If the current filehandle is not a regular file or a 17623 directory, the server will return NFS4ERR_INVAL. 17625 If mandatory file locking is on for the file, and the corresponding 17626 record of the file into which the data is to be written is read or write locked by an 17627 owner that is not associated with the stateid, the server will return 17628 NFS4ERR_LOCKED. If so, the client must check if the owner 17629 corresponding to the stateid used with the WRITE operation has a 17630 conflicting read lock that overlaps with the region that was to be 17631 written. If the stateid's owner has no conflicting read lock, then 17632 the client should try to get the appropriate write record lock via 17633 the LOCK operation before re-attempting the WRITE. When the WRITE 17634 completes, the client should release the record lock via LOCKU. 17636 If the stateid's owner had a conflicting read lock, then the client 17637 has no choice but to return an error to the application that 17638 attempted the WRITE. The reason is that since the stateid's owner 17639 had a read lock, the server either attempted to temporarily 17640 effectively upgrade this read lock to a write lock, or the server has 17641 no upgrade capability. If the server attempted to upgrade the read 17642 lock and failed, it is pointless for the client to re-attempt the 17643 upgrade via the LOCK operation, because there might be another client 17644 also trying to upgrade. If two clients are blocked trying to upgrade 17645 the same lock, the clients deadlock. If the server has no upgrade 17646 capability, then it is pointless to try a LOCK operation to upgrade. 17648 17.33. Operation 40: BACKCHANNEL_CTL - Backchannel control 17650 Control aspects of the backchannel 17652 17.33.1.
SYNOPSIS 17654 callback program number, credentials -> - 17656 17.33.2. ARGUMENT 17658 /* 17659 * NFSv4.1 arguments and results 17660 */ 17661 struct gss_cb_handles4 { 17662 rpc_gss_svc_t gcbp_service; /* RFC 2203 */ 17663 opaque gcbp_handle_from_server<>; 17664 opaque gcbp_handle_from_client<>; 17665 }; 17667 union callback_sec_parms4 switch (uint32_t cb_secflavor) { 17668 case AUTH_NONE: 17669 void; 17670 case AUTH_SYS: 17671 authsys_parms cbsp_sys_cred; /* RFC 1831 */ 17672 case RPCSEC_GSS: 17673 gss_cb_handles4 cbsp_gss_handles; 17674 }; 17676 struct BACKCHANNEL_CTL4args { 17677 uint32_t bca_cb_program; 17678 callback_sec_parms4 bca_sec_parms<>; 17679 }; 17681 17.33.3. RESULT 17683 struct BACKCHANNEL_CTL4res { 17684 nfsstat4 bcr_status; 17685 }; 17687 17.33.4. DESCRIPTION 17689 The BACKCHANNEL_CTL operation replaces the backchannel's callback 17690 program number and adds (not replaces) RPCSEC_GSS contexts for use by 17691 the callback path. 17693 The arguments and results of the BACKCHANNEL_CTL call are a subset of 17694 the CREATE_SESSION parameters and have the same meaning. See the 17695 descriptions of csa_cb_program and csa_cb_sec_parms in 17696 Section 17.36.5. 17698 BACKCHANNEL_CTL MUST appear in a COMPOUND that starts with SEQUENCE. 17700 17.33.5. ERRORS 17702 TBD 17704 17.34. Operation 41: BIND_CONN_TO_SESSION 17706 17.34.1. SYNOPSIS 17708 sessionid, nonce, digest -> nonce, digest 17710 17.34.2. ARGUMENT 17712 struct bctsa_digest_input4 { 17713 sessionid4 bdai_sessid; 17714 uint64_t bdai_nonce1; 17715 uint64_t bdai_nonce2; 17716 }; 17718 enum channel_dir_from_client4 { 17719 CDFC4_FORE = 0x1, 17720 CDFC4_BACK = 0x2, 17721 CDFC4_FORE_OR_BOTH = 0x3, 17722 CDFC4_BACK_OR_BOTH = 0x7 17723 }; 17725 struct BIND_CONN_TO_SESSION4args { 17726 sessionid4 bctsa_sessid; 17727 bool bctsa_step1; 17728 channel_dir_from_client4 bctsa_dir; 17729 bool bctsa_use_conn_in_rdma_mode; 17730 uint64_t bctsa_nonce; 17731 opaque bctsa_digest<>; 17732 }; 17734 17.34.3. 
RESULT 17736 struct bctsr_digest_input4 { 17737 sessionid4 bdri_sessid; 17738 uint64_t bdri_nonce1; 17739 uint64_t bdri_nonce2; 17740 }; 17742 enum channel_dir_from_server4 { 17743 CDFS4_FORE = 0x1, 17744 CDFS4_BACK = 0x2, 17745 CDFS4_BOTH = 0x3 17746 }; 17748 struct BIND_CONN_TO_SESSION4resok { 17749 sessionid4 bctsr_sessid; 17750 bool bctsr_challenge; 17751 channel_dir_from_server4 bctsr_dir; 17752 bool bctsr_use_conn_in_rdma_mode; 17753 uint64_t bctsr_nonce; 17754 opaque bctsr_digest<>; 17755 }; 17757 union BIND_CONN_TO_SESSION4res switch (nfsstat4 bctsr_status) { 17758 case NFS4_OK: 17759 BIND_CONN_TO_SESSION4resok bctsr_resok4; 17760 default: 17761 void; 17762 }; 17764 17.34.4. DESCRIPTION 17766 BIND_CONN_TO_SESSION is used to bind additional connections to a 17767 session. It MUST be used on the connection being bound. It MUST be 17768 the only operation in the COMPOUND procedure. Any principal, 17769 security flavor, or RPCSEC_GSS context can invoke the operation. 17771 If, when the session was created, the client opted not to enable 17772 enforcement of connection binding (see Section 17.36), the client is 17773 not required to use BIND_CONN_TO_SESSION, unless the client wishes to 17774 bind the connection to the backchannel. In that case, because 17775 the client did not enable connection binding enforcement, it selected 17776 no hash algorithms for digest computation. Thus bctsa_digest and 17777 bctsr_digest will be zero length, and neither the client nor the 17778 server verifies either digest. 17780 If the client enabled enforcement of connection binding, then to 17781 prevent replay attacks, BIND_CONN_TO_SESSION implements a challenge- 17782 response protocol. This means that the client may be directed to 17783 issue BIND_CONN_TO_SESSION a second time on the same connection 17784 before the connection is bound to the session.
The client is first 17785 returned a challenge value in bctsr_nonce, and the client must then 17786 calculate a digest using SSV as the key, and the challenge value and 17787 other information as the input text. Since the server is free to 17788 generate nonce values that are unlikely to be re-used, this prevents 17789 attackers from engaging in replay attacks to bind rogue connections 17790 to the session. 17792 bctsa_sessid identifies the session the connection is to be bound to. 17794 If bctsa_step1 is TRUE, then the client is trying to initiate a 17795 binding of a connection to a session. 17797 bctsa_nonce is a nonce used to deter replay attacks on the server. 17798 If bctsa_step1 is FALSE, bctsa_nonce MUST be different from the 17799 bctsa_nonce value for a previous BIND_CONN_TO_SESSION operation that 17800 had bctsa_step1 set to TRUE. 17802 bctsa_digest is computed as the output of the HMAC (RFC 2104 [14]) using 17803 the current SSV as the key, and the XDR encoded value of data of type 17804 bctsa_digest_input4 as the input text. 17806 bdai_sessid is the same as bctsa_sessid. bdai_nonce1 is the same as 17807 bctsa_nonce. If bctsa_step1 was TRUE, then bdai_nonce2 is zero. 17808 Otherwise, bdai_nonce2 is the same as bctsr_nonce from the previous 17809 response to BIND_CONN_TO_SESSION on the same connection and 17810 sessionid. 17812 In the response, bctsr_challenge is set to TRUE if the server is 17813 challenging the client to prove it is not attempting a replay attack. 17814 If it is set to TRUE, the client MUST follow up with a 17815 BIND_CONN_TO_SESSION request with bctsa_step1 set to FALSE. If 17816 bctsr_challenge is set to FALSE, the server is either not 17817 challenging the client, or the response is in response to a 17818 challenge. 17820 bctsr_nonce MUST NOT be equal to bctsa_nonce and is a nonce used to 17821 deter replay attacks on the client and server.
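The digest computation above can be sketched with a generic keyed HMAC. In this sketch the hash algorithm (SHA-1) stands in for whichever algorithm the client negotiated at session creation, and the byte layout is an illustrative stand-in for the proper XDR encoding of bctsa_digest_input4; neither is mandated by the text:

```python
import hashlib
import hmac
import struct

def digest_input(sessid, nonce1, nonce2):
    # Stand-in for the XDR encoding of bctsa_digest_input4: the
    # 16-byte sessionid followed by two 64-bit nonces.
    assert len(sessid) == 16
    return sessid + struct.pack(">QQ", nonce1, nonce2)

def bctsa_digest(ssv, sessid, bctsa_nonce, bctsr_nonce_or_zero):
    # HMAC (RFC 2104) keyed with the current SSV.  In step 1 the
    # second nonce is zero; in step 2 it is the server's challenge
    # nonce from the step 1 reply.
    text = digest_input(sessid, bctsa_nonce, bctsr_nonce_or_zero)
    return hmac.new(ssv, text, hashlib.sha1).digest()

def server_verifies(ssv, sessid, nonce1, nonce2, client_digest):
    # Constant-time comparison; on mismatch the server returns
    # NFS4ERR_BAD_SESSION_DIGEST.
    expected = bctsa_digest(ssv, sessid, nonce1, nonce2)
    return hmac.compare_digest(expected, client_digest)
```

Because the SSV keys the HMAC and the server controls the challenge nonce, an attacker who cannot read the SSV cannot forge a valid step 2 digest.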
17823 bctsr_digest is the output of the HMAC using the SSV as the key, and 17824 the XDR encoded value of data of type bctsr_digest_input4 as the input 17825 text. 17827 bdri_sessid is the same as bctsr_sessid, which in turn should be the 17828 same as bctsa_sessid. bdri_nonce1 is the same as bctsr_nonce. 17830 If bctsr_challenge is TRUE, 17831 bdri_nonce2 is zero. Otherwise, bdri_nonce2 is equal to the value of 17832 bctsa_nonce as sent in the preceding BIND_CONN_TO_SESSION that had 17833 bctsa_step1 set to TRUE. 17835 If the server's computation of bctsa_digest does not match that in the 17836 arguments, the server MUST return NFS4ERR_BAD_SESSION_DIGEST. 17838 bctsa_dir indicates whether the client wants to bind the connection 17839 to the fore (operations) channel or back channel or both channels. 17840 The value CDFC4_FORE_OR_BOTH indicates the client wants to bind to 17841 both the fore and back channels, but will accept the connection 17842 being bound to just the fore channel. The value CDFC4_BACK_OR_BOTH 17843 indicates the client wants to bind to both the fore and back 17844 channels, but will accept the connection being bound to the back 17845 channel. The server replies in bctsr_dir which channel(s) the 17846 connection is bound to (but bctsr_dir is only meaningful if 17847 bctsr_challenge is FALSE). If the client specified CDFC4_FORE, the 17848 server MUST return CDFS4_FORE. If the client specified CDFC4_BACK, 17849 the server MUST return CDFS4_BACK. If the client specified 17850 CDFC4_FORE_OR_BOTH, the server MUST return CDFS4_FORE or CDFS4_BOTH. If the 17851 client specified CDFC4_BACK_OR_BOTH, the server MUST return 17852 CDFS4_BACK or CDFS4_BOTH. Note that if BIND_CONN_TO_SESSION has to 17853 be called in two steps, the server only processes the bctsa_dir value 17854 from the second step, and the client only processes the bctsr_dir 17855 from the second step.
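The channel-direction negotiation rules above reduce to a small table mapping each client request value to the replies a server is permitted to give. A sketch using the XDR constant values:

```python
# channel_dir_from_client4 values.
CDFC4_FORE, CDFC4_BACK = 0x1, 0x2
CDFC4_FORE_OR_BOTH, CDFC4_BACK_OR_BOTH = 0x3, 0x7

# channel_dir_from_server4 values.
CDFS4_FORE, CDFS4_BACK, CDFS4_BOTH = 0x1, 0x2, 0x3

# Replies the server is permitted to give for each client request,
# per the MUST rules above.
VALID_REPLIES = {
    CDFC4_FORE:         {CDFS4_FORE},
    CDFC4_BACK:         {CDFS4_BACK},
    CDFC4_FORE_OR_BOTH: {CDFS4_FORE, CDFS4_BOTH},
    CDFC4_BACK_OR_BOTH: {CDFS4_BACK, CDFS4_BOTH},
}

def bctsr_dir_is_valid(bctsa_dir, bctsr_dir):
    # A client could use this to reject a protocol-violating reply.
    return bctsr_dir in VALID_REPLIES[bctsa_dir]
```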
17857 See the CREATE_SESSION operation (Section 17.36), and the description 17858 of the argument csa_use_conn_in_rdma_mode to understand 17859 bctsa_use_conn_in_rdma_mode, and the description of 17860 csr_use_conn_in_rdma_mode to understand bctsr_use_conn_in_rdma_mode. 17862 17.34.5. IMPLEMENTATION 17864 If the client's computation of bctsr_digest does not match that in 17865 the results, the client SHOULD NOT accept successful 17866 BIND_CONN_TO_SESSION results, and SHOULD assume there has been an 17867 attack. Possibilities include: 17869 o The attacker has managed to change the SSV, by binding another 17870 connection. 17872 o The attacker has not managed to change the SSV. 17874 The client recovers from a possible attack as follows. 17876 The client can issue SET_SSV to attempt to change the SSV. If SSV is 17877 changed successfully, including verification of the digest in the 17878 response to SET_SSV, then this means the attacker did not change the 17879 SSV. Thus the attacker has managed to hijack the connection. The 17880 client's only recourse is to disconnect, and bind a new connection. 17881 Using IPsec to protect the connection will prevent connection 17882 hijacking. 17884 If SET_SSV fails, or the verification of the digest in the response 17885 fails, the attacker has changed the SSV. The client's only recourse 17886 is to recreate the session. 17888 If the client loses all connections, it needs to use 17889 BIND_CONN_TO_SESSION to bind a new connection. The server will not 17890 have the SSV if the server has rebooted and the server doesn't keep 17891 the replay cache in stable storage. In that event, the preceding 17892 SEQUENCE op in the same compound will have returned 17893 NFS4ERR_BADSESSION, so the client's state machine goes back to 17894 CREATE_SESSION. 17896 There is an issue if SET_SSV is sent, no response is returned, and 17897 the last bound connection disconnects. The client, per the sessions 17898 model, needs to retry the SET_SSV. 
But it needs a new connection to 17899 do so, and needs to bind that connection to the session. The problem 17900 is that the digest calculation for BIND_CONN_TO_SESSION uses the SSV 17901 as the key, and the SSV may have changed. While there are multiple 17902 recovery strategies, a single, general strategy is described here. 17903 First, the client reconnects. The client issues BIND_CONN_TO_SESSION 17904 with the new SSV used as the digest key. If the server returns 17905 NFS4ERR_BAD_SESSION_DIGEST then this means the server's current SSV 17906 was not changed, and the SET_SSV was not executed. The client then 17907 tries BIND_CONN_TO_SESSION with the old SSV as the digest key. This 17908 should not return NFS4ERR_BAD_SESSION_DIGEST. If it does, an 17909 implementation error has occurred on either the client or server, and 17910 the client has to create a new session. 17912 17.34.6. ERRORS 17914 error list 17916 17.35. Operation 42: EXCHANGE_ID - Instantiate Client ID 17918 Exchange long-hand client and server identifiers (owners), and create 17919 a client ID 17921 17.35.1. SYNOPSIS 17923 client owner -> client ID, server owner 17925 17.35.2. ARGUMENT 17927 const EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001; 17928 const EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002; 17930 const EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000; 17931 const EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000; 17932 const EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000; 17934 struct EXCHANGE_ID4args { 17935 client_owner4 eia_clientowner; 17936 uint32_t eia_flags; 17937 nfs_impl_id4 eia_client_impl_id<1>; 17938 }; 17940 17.35.3.
RESULT 17942 struct server_owner4 { 17943 uint64_t so_minor_id; 17944 opaque so_major_id<>; 17945 }; 17947 struct EXCHANGE_ID4resok { 17948 clientid4 eir_clientid; 17949 sequenceid4 eir_sequenceid; 17950 uint32_t eir_flags; 17951 server_owner4 eir_server_owner; 17952 opaque eir_server_scope<>; 17953 nfs_impl_id4 eir_server_impl_id<1>; 17954 }; 17956 union EXCHANGE_ID4res switch (nfsstat4 eir_status) { 17957 case NFS4_OK: 17958 EXCHANGE_ID4resok eir_resok4; 17959 default: 17960 void; 17961 }; 17963 17.35.4. DESCRIPTION 17965 The client uses the EXCHANGE_ID operation to register a particular 17966 client owner with the server. The client ID returned from this 17967 operation will be necessary for requests that create state on the 17968 server and will serve as a parent object to sessions created by the 17969 client. In order to confirm the client ID, it and the returned 17970 sequenceid must first be used as arguments to CREATE_SESSION. 17972 The flags passed as part of the arguments and results to the 17973 EXCHANGE_ID operation allow the client and server to inform each other 17974 of their capabilities as well as indicate how the client ID will be 17975 used. Whether a bit is set or cleared on the arguments' flags does 17976 not force the server to set or clear the same bit on the results' 17977 side. Bits not defined above should not be set in the eia_flags 17978 field. If they are, the server MUST reject the operation with 17979 NFS4ERR_INVAL. 17981 When the EXCHGID4_FLAG_SUPP_MOVED_REFER is set, the client indicates 17982 that it is capable of dealing with an NFS4ERR_MOVED error as part of 17983 a referral sequence. When this bit is not set, it is still legal for 17984 the server to perform a referral sequence. However, a server may use 17985 the fact that the client is incapable of correctly responding to a 17986 referral by avoiding it for that particular client.
It may, for 17987 instance, act as a proxy for that particular file system, at some 17988 cost in performance, although it is not obligated to do so. If the 17989 server will potentially perform a referral, it MUST set 17990 EXCHGID4_FLAG_SUPP_MOVED_REFER in eir_flags. 17992 When the EXCHGID4_FLAG_SUPP_MOVED_MIGR is set, the client indicates 17993 that it is capable of dealing with an NFS4ERR_MOVED error as part of 17994 a file system migration sequence. When this bit is not set, it is 17995 still legal for the server to indicate that a file system has moved, 17996 when this in fact happens. However, a server may use the fact that 17997 the client is incapable of correctly responding to a migration in its 17998 scheduling of file systems to migrate so as to avoid migration of 17999 file systems being actively used. It may also hide actual migrations 18000 from clients unable to deal with them by acting as a proxy for a 18001 migrated file system for particular clients, at some cost in 18002 performance, although it is not obligated to do so. If the server 18003 will potentially perform a migration, it MUST set 18004 EXCHGID4_FLAG_SUPP_MOVED_MIGR in eir_flags. 18006 When EXCHGID4_FLAG_USE_NON_PNFS is set in eia_flags, the client 18007 indicates it wants to use the server in a conventional, non-parallel 18008 NFS mode of operation. When EXCHGID4_FLAG_USE_NON_PNFS is set in 18009 eir_flags, the server is indicating it supports a conventional mode 18010 of operation. 18012 When EXCHGID4_FLAG_USE_PNFS_MDS is set in eia_flags, the client 18013 indicates it wants to use the server as a metadata server of a 18014 parallel NFS cluster. When EXCHGID4_FLAG_USE_PNFS_MDS is set in 18015 eir_flags, the server is indicating it supports a metadata server. 18017 When EXCHGID4_FLAG_USE_PNFS_DS is set in eia_flags, the client 18018 indicates it wants to use the server as a data server of a parallel 18019 NFS cluster. 
When EXCHGID4_FLAG_USE_PNFS_DS is set in eir_flags, the 18020 server is indicating it supports a data server. 18022 A client SHOULD indicate at least one of EXCHGID4_FLAG_USE_NON_PNFS, 18023 EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS so that a server 18024 willing to meet the client's desires can indicate it is doing so. A 18025 server MUST return at least one of the three bits, even if the bit is 18026 not among the flag bits sent from the client. 18028 The capabilities indicated in the flags word apply to all sessions 18029 created for the resulting client ID and are presumed by the server to 18030 remain valid until a new client instance with the same client 18031 instance string does an EXCHANGE_ID. The server may update its view 18032 of such capabilities when a new EXCHANGE_ID is done by the same 18033 client instance, but clients should not depend upon such an update 18034 being effective until the server receives an EXCHANGE_ID for a new 18035 client instance. 18037 The arguments include an array of up to one element in length called 18038 eia_client_impl_id. If eia_client_impl_id is present, it contains the 18039 information identifying the implementation of the client. Similarly, 18040 the results include an array of up to one element in length called 18041 eir_server_impl_id that identifies the implementation of the server. 18042 Servers MUST allow a zero-length eia_client_impl_id array, and 18043 clients MUST allow a zero-length eir_server_impl_id array. Being 18044 able to identify specific implementations can help in planning by 18045 administrators or implementors. For example, diagnostic software may 18046 extract this information in an attempt to identify interoperability 18047 problems, performance workload behaviors, or general usage statistics.
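The flag rules stated in this description (reject undefined eia_flags bits with NFS4ERR_INVAL; the server returns at least one of the three USE_ bits) can be condensed into a short sketch; the function names and the string return value are illustrative, not part of the protocol:

```python
# Flag constants from the EXCHANGE_ID XDR definition.
EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001
EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002
EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000

DEFINED_ARG_FLAGS = (EXCHGID4_FLAG_SUPP_MOVED_REFER |
                     EXCHGID4_FLAG_SUPP_MOVED_MIGR |
                     EXCHGID4_FLAG_USE_NON_PNFS |
                     EXCHGID4_FLAG_USE_PNFS_MDS |
                     EXCHGID4_FLAG_USE_PNFS_DS)
USE_FLAGS = (EXCHGID4_FLAG_USE_NON_PNFS |
             EXCHGID4_FLAG_USE_PNFS_MDS |
             EXCHGID4_FLAG_USE_PNFS_DS)

def check_eia_flags(eia_flags):
    # Bits not defined above MUST be rejected with NFS4ERR_INVAL.
    if eia_flags & ~DEFINED_ARG_FLAGS:
        return "NFS4ERR_INVAL"
    return None

def eir_flags_ok(eir_flags):
    # A server MUST return at least one of the three USE_ bits.
    return bool(eir_flags & USE_FLAGS)
```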
18048 Since the intent of having access to this information is for planning 18049 or general diagnosis only, the client and server MUST NOT interpret 18050 this implementation identity information in a way that affects 18051 interoperational behavior of the implementation. The reason is that 18052 if clients and servers did such a thing, they might use fewer 18053 capabilities of the protocol than the peer can support, or the client 18054 and server might refuse to interoperate. 18056 Because it is likely that some implementations will violate the protocol 18057 specification and interpret the identity information, implementations 18058 MUST allow the users of the NFSv4 client and server to set the 18059 contents of the sent nfs_impl_id structure to any value. 18061 17.35.5. IMPLEMENTATION 18063 A server's client record is a 5-tuple: 18065 1. co_ownerid: 18066 The client identifier string, from the eia_clientowner 18067 structure of the EXCHANGE_ID4args structure 18069 2. co_verifier: 18071 A client-specific value used to indicate reboots, from the 18072 eia_clientowner structure of the EXCHANGE_ID4args structure 18074 3. principal: 18076 The RPCSEC_GSS principal sent via the RPC headers 18078 4. client ID: 18080 The shorthand client identifier, generated by the server and 18081 returned via the eir_clientid field in the EXCHANGE_ID4resok 18082 structure 18084 5. confirmed: 18086 A private field on the server indicating whether or not a 18087 client record has been confirmed. A client record is 18088 confirmed if there has been a successful CREATE_SESSION 18089 operation to confirm it. Otherwise it is unconfirmed. An 18090 unconfirmed record is established by an EXCHANGE_ID call. Any 18091 unconfirmed record that is not confirmed within a lease period 18092 may be removed. 18094 The following identifiers represent special values for the fields in 18095 the records.
18097 ownerid_arg: 18099 The value of the eia_clientowner.co_ownerid subfield of the 18100 EXCHANGE_ID4args structure of the current request. 18102 verifier_arg: 18104 The value of the eia_clientowner.co_verifier subfield of the 18105 EXCHANGE_ID4args structure of the current request. 18107 old_verifier_arg: 18109 A value of the eia_clientowner.co_verifier field of a client 18110 record received in a previous request; this is distinct from 18111 verifier_arg. 18113 principal_arg: 18115 The value of the RPCSEC_GSS principal for the current request. 18117 old_principal_arg: 18119 A value of the RPCSEC_GSS principal received for a previous 18120 request. This is distinct from principal_arg. 18122 clientid_ret: 18124 The value of the eir_clientid field the server will return in the 18125 EXCHANGE_ID4resok structure for the current request. 18127 old_clientid_ret: 18129 The value of the eir_clientid field the server returned in the 18130 EXCHANGE_ID4resok structure for a previous request. This is 18131 distinct from clientid_ret. 18133 Since EXCHANGE_ID is a non-idempotent operation, we must consider the 18134 possibility that replays might occur as a result of a client reboot, 18135 network partition, malfunctioning router, etc. Replays are 18136 identified by the value of the eia_clientowner field of EXCHANGE_ID4args, and 18137 the method for dealing with them is outlined in the scenarios below. 18139 The scenarios are described in terms of which client records whose 18140 eia_clientowner.co_ownerid subfield has a value equal to ownerid_arg 18141 exist in the server's set of client records. Any cases in which 18142 there is more than one record with identical values for ownerid_arg 18143 represent a server implementation error. Operation in the potentially 18144 valid cases is summarized as follows. 18146 1.
Common case 18148 If no client records with eia_clientowner.co_ownerid matching 18149 ownerid_arg exist, a new shorthand client identifier 18150 clientid_ret is generated, and the following unconfirmed 18151 record is added to the server's state. 18153 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 18154 FALSE } 18156 Subsequently, the server returns clientid_ret. 18158 2. Router Replay 18159 If the server has the following confirmed record, then this 18160 request is likely the result of a replayed request due to a 18161 faulty router or lost connection. 18163 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, TRUE 18164 } 18166 Since the record has been confirmed, the client must have 18167 received the server's reply from the initial EXCHANGE_ID 18168 request. Since this is simply a spurious request, there is no 18169 modification to the server's state, and the server makes no 18170 reply to the client. 18172 3. Client Collision 18174 If the server has the following confirmed record, then this 18175 request is likely the result of a chance collision between the 18176 values of the eia_clientowner.co_ownerid subfield of 18177 EXCHANGE_ID4args for two different clients. 18179 { ownerid_arg, *, old_principal_arg, clientid_ret, TRUE } 18181 Since the value of the eia_clientowner.co_ownerid subfield of 18182 each client record must be unique, there is no modification of 18183 the server's state. The server either returns 18184 NFS4ERR_CLID_INUSE to indicate that the client should retry with 18185 a different value for the eia_clientowner.co_ownerid subfield 18186 of EXCHANGE_ID4args, or the server considers the principal and 18187 ownerid together as the client owner, and treats the 18188 EXCHANGE_ID as coming from a unique client owner. 18190 This scenario may also represent a malicious attempt to 18191 destroy a client's state on the server. For security reasons, 18192 the server MUST NOT remove the client's state when there is a 18193 principal mismatch.
18195 4. Replay 18197 If the server has the following unconfirmed record then this 18198 request is likely the result of a client replay due to a 18199 network partition or some other connection failure. 18201 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 18202 FALSE } 18204 Since the response to the EXCHANGE_ID request that created 18205 this record may have been lost, it is not acceptable to drop 18206 this replayed request. However, rather than processing it 18207 normally, the existing record is left unchanged and 18208 clientid_ret, which was generated for the previous request, is 18209 returned. 18211 5. Change of Principal 18213 If the server has the following unconfirmed record then this 18214 request is likely the result of a client which has for 18215 whatever reasons changed principals (possibly to change 18216 security flavor) after calling EXCHANGE_ID, but before calling 18217 CREATE_SESSION. 18219 { ownerid_arg, verifier_arg, old_principal_arg, clientid_ret, 18220 FALSE} 18222 Since the client has not changed, the principal field of the 18223 unconfirmed record is updated to principal_arg and 18224 clientid_ret is again returned. There is a small possibility 18225 that this is merely a collision on the client field of 18226 EXCHANGE_ID4args between unrelated clients, but since that is 18227 unlikely, and an unconfirmed record does not generally have 18228 any file system pertinent state, we can assume it is the same 18229 client without risking loss of any important state. 18231 After processing, the following record will exist on the 18232 server. 18234 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 18235 FALSE} 18237 6. Client Reboot 18239 If the server has the following confirmed client record, then 18240 this request is likely from a previously confirmed client 18241 which has rebooted. 
18243 { ownerid_arg, old_verifier_arg, principal_arg, clientid_ret, 18244 TRUE } 18246 Since the previous incarnation of the same client will no 18247 longer be making requests, lock and share reservations should 18248 be released immediately rather than forcing the new 18249 incarnation to wait for the lease time on the previous 18250 incarnation to expire. Furthermore, session state should be 18251 removed since if the client had maintained that information 18252 across reboot, this request would not have been issued. If 18253 the server does not support the CLAIM_DELEGATE_PREV claim 18254 type, associated delegations should be purged as well; 18255 otherwise, delegations are retained and recovery proceeds 18256 according to the section Delegation Recovery (Section 9.2.1). 18257 The client record is updated with the new verifier and its 18258 status is changed to unconfirmed. 18260 After processing, clientid_ret is returned to the client and 18261 the following record will exist on the server. 18263 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 18264 FALSE } 18266 7. Reboot before confirmation 18268 If the server has the following unconfirmed record, then this 18269 request is likely from a client which rebooted before sending 18270 a CREATE_SESSION request. 18272 { ownerid_arg, old_verifier_arg, *, clientid_ret, FALSE } 18274 Since this is believed to be a request from a new incarnation 18275 of the original client, the server updates the value of 18276 eia_clientowner.co_verifier and returns the original 18277 clientid_ret. After processing, the following state exists on 18278 the server. 18280 { ownerid_arg, verifier_arg, *, clientid_ret, FALSE } 18282 In addition to the client ID and sequenceid, the server returns a 18283 server owner (eir_server_owner) and eir_server_scope. The former 18284 field is used for network trunking as described in 18285 Section 2.10.3.4.1. 
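The seven scenarios above reduce to a match on the fields of the existing client record (if any) with the same co_ownerid. A server-side sketch of that dispatch follows; the ClientRec type and the returned labels are illustrative helpers, not protocol elements:

```python
from dataclasses import dataclass

@dataclass
class ClientRec:
    """One entry of the server's 5-tuple client record (illustrative)."""
    ownerid: str
    verifier: bytes
    principal: str
    clientid: int
    confirmed: bool

def exchange_id_case(rec, verifier_arg, principal_arg):
    # rec: the server's existing record with matching co_ownerid, or
    # None.  Returns a label naming the scenario that applies.
    if rec is None:
        return "1: new unconfirmed record"
    same_verf = rec.verifier == verifier_arg
    same_prin = rec.principal == principal_arg
    if rec.confirmed:
        if same_verf and same_prin:
            return "2: router replay, no state change"
        if not same_prin:
            return "3: client collision, state MUST NOT be removed"
        return "6: client reboot, release state, unconfirm record"
    if same_verf and same_prin:
        return "4: replay, return existing clientid"
    if same_verf:
        return "5: change of principal, update principal"
    return "7: reboot before confirmation, update verifier"
```

Note that cases 3 and 7 match regardless of the remaining field, mirroring the `*` wildcard in the record tuples above.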
The latter field is used to allow clients to 18286 determine when clientids issued by one server may be recognized by 18287 another in the event of file system migration (see Section 10.6.7). 18289 17.36. Operation 43: CREATE_SESSION - Create New Session and Confirm 18290 Client ID 18292 Start up session and confirm client ID. 18294 17.36.1. SYNOPSIS 18296 client ID, session_args -> sessionid, session_args 18298 17.36.2. ARGUMENT 18300 struct channel_attrs4 { 18301 count4 ca_maxrequestsize; 18302 count4 ca_maxresponsesize; 18303 count4 ca_maxresponsesize_cached; 18304 count4 ca_maxoperations; 18305 count4 ca_maxrequests; 18306 uint32_t ca_rdma_ird<1>; 18307 }; 18309 union conn_binding4args switch (bool cba_enforce) { 18310 case TRUE: 18311 sec_oid4 cba_hash_algs<>; 18312 case FALSE: 18313 void; 18314 }; 18316 const CREATE_SESSION4_FLAG_PERSIST = 0x00000001; 18317 const CREATE_SESSION4_FLAG_CONN_BACK_CHAN = 0x00000002; 18318 const CREATE_SESSION4_FLAG_CONN_RDMA = 0x00000004; 18320 struct CREATE_SESSION4args { 18321 clientid4 csa_clientid; 18322 sequenceid4 csa_sequence; 18323 uint32_t csa_flags; 18325 count4 csa_headerpadsize; 18327 conn_binding4args csa_conn_binding_opts; 18329 channel_attrs4 csa_fore_chan_attrs; 18330 channel_attrs4 csa_back_chan_attrs; 18332 uint32_t csa_cb_program; 18333 callback_sec_parms4 csa_cb_sec_parms<>; 18334 }; 18336 17.36.3. 
RESULT 18338 struct hash_alg_info4 { 18339 uint32_t hai_hash_alg; 18340 uint32_t hai_ssv_len; 18341 }; 18343 union conn_binding4res switch (bool cbr_enforce) { 18344 case TRUE: 18345 hash_alg_info4 cbr_hash_alg_info; 18346 case FALSE: 18347 void; 18348 }; 18350 struct CREATE_SESSION4resok { 18351 sessionid4 csr_sessionid; 18352 sequenceid4 csr_sequence; 18354 uint32_t csr_flags; 18355 count4 csr_headerpadsize; 18357 conn_binding4res csr_conn_binding_opts; 18359 channel_attrs4 csr_fore_chan_attrs; 18360 channel_attrs4 csr_back_chan_attrs; 18361 }; 18363 union CREATE_SESSION4res switch (nfsstat4 csr_status) { 18364 case NFS4_OK: 18365 CREATE_SESSION4resok csr_resok4; 18366 default: 18367 void; 18368 }; 18370 17.36.4. DESCRIPTION 18372 This operation is used by the client to create new session objects on 18373 the server. The server MUST accept a CREATE_SESSION operation with 18374 no preceding SEQUENCE operation in the COMPOUND procedure. A client 18375 MAY precede CREATE_SESSION with SEQUENCE in a COMPOUND procedure; if 18376 so, any session created by CREATE_SESSION has no direct relation to 18377 the session specified in the SEQUENCE operation. 18379 In addition to creating a session, CREATE_SESSION has the following 18380 effects: 18382 o The first session created with a new shorthand client identifier 18383 (client ID) serves to confirm the creation of that client's state 18384 on the server. The server returns the parameter values for the 18385 new session. 18387 o The connection CREATE_SESSION is issued over is bound to the 18388 session and to the session's forward channel. 18390 17.36.5. IMPLEMENTATION 18392 To describe the implementation, the same notation for client records 18393 introduced in the description of EXCHANGE_ID is used with the 18394 following addition: 18396 clientid_arg: The value of the csa_clientid field of the 18397 CREATE_SESSION4args structure of the current request. 
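As an informal illustration of the single-slot, wraparound-aware sequenceid handling that the implementation discussion of this section describes, the per-client-ID slot can be sketched in C. All names here are hypothetical; only the unsigned wraparound arithmetic mirrors the specification.

```c
#include <stdint.h>

/* Hypothetical sketch of the per-client-ID single-slot replay cache
 * for CREATE_SESSION; names are illustrative, not from the XDR. */

enum slot_outcome { SLOT_REPLAY, SLOT_MISORDERED, SLOT_NEW_REQUEST };

struct cs_slot {
    uint32_t seqid;   /* sequenceid currently recorded in the slot */
};

/* After EXCHANGE_ID, the slot is initialized to eir_sequenceid - 1,
 * letting unsigned arithmetic handle the underflow case. */
void cs_slot_init(struct cs_slot *s, uint32_t eir_sequenceid)
{
    s->seqid = eir_sequenceid - 1u;  /* wraps if eir_sequenceid == 0 */
}

/* Classify an incoming csa_sequence against the slot. */
enum slot_outcome cs_slot_check(struct cs_slot *s, uint32_t csa_sequence)
{
    if (csa_sequence == s->seqid)
        return SLOT_REPLAY;               /* return the cached result */
    if (csa_sequence == s->seqid + 1u) {  /* wraparound-safe successor */
        s->seqid = csa_sequence;
        return SLOT_NEW_REQUEST;          /* proceed to confirmation */
    }
    return SLOT_MISORDERED;               /* NFS4ERR_SEQ_MISORDERED */
}
```

Because the slot starts at eir_sequenceid - 1 with a contrived misordered result cached, the first CREATE_SESSION that uses csa_sequence equal to eir_sequenceid is classified as a new request.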
18399 Since CREATE_SESSION is a non-idempotent operation, we must consider 18400 the possibility that replays may occur as a result of a client 18401 reboot, network partition, malfunctioning router, etc. For each 18402 client ID created by EXCHANGE_ID, the server maintains a separate 18403 replay cache similar to the session replay cache used for SEQUENCE 18404 operations, with two distinctions. First, this is a replay cache just 18405 for detecting and processing CREATE_SESSION requests for a given 18406 client ID. Second, the size of the client ID replay cache is one 18407 slot (and as a result, the CREATE_SESSION request does not carry a 18408 slot number). This means that at most one CREATE_SESSION request for 18409 a given client ID can be outstanding. When a client issues a 18410 successful EXCHANGE_ID, it is returned eir_sequenceid, and the client 18411 is expected to set the value of csa_sequenceid in the next 18412 CREATE_SESSION it sends with that client ID to the value of 18413 eir_sequenceid. After EXCHANGE_ID, the server initializes the client 18414 ID slot to be equal to eir_sequenceid - 1 (accounting for underflow), 18415 and records a contrived CREATE_SESSION result with a "cached" result 18416 of NFS4ERR_SEQ_MISORDERED. With the slot thus initialized, the 18417 processing of the CREATE_SESSION operation is divided into four 18418 phases: 18420 1. Replay cache lookup. The server verifies it has a replay cache 18421 for the client ID. If the server contains no records with client 18422 ID equal to clientid_arg, then most likely the client's state has 18423 been purged during a period of inactivity, possibly due to a loss 18424 of connectivity. NFS4ERR_STALE_CLIENTID is returned, and no 18425 changes are made to any client records on the server. 18427 2. Sequenceid processing.
If csa_sequenceid is equal to the 18428 sequenceid in the client's slot, then this is a replay of the 18429 previous CREATE_SESSION request, and the server returns the 18430 cached result. If csa_sequenceid is neither equal to the sequenceid 18431 in the slot nor equal to the slot's sequenceid + 1 (accounting for 18432 wraparound), then the server returns the error 18433 NFS4ERR_SEQ_MISORDERED, and does not change the slot. If 18434 csa_sequenceid is equal to the slot's sequenceid + 1 (accounting 18435 for wraparound), then the slot's sequenceid is set to 18436 csa_sequenceid, and the CREATE_SESSION processing goes to the 18437 next phase. A subsequent new CREATE_SESSION call MUST use a 18438 csa_sequence that is one greater than that recorded in the slot. 18440 3. Client ID confirmation. If the state for the provided 18441 client ID has not yet been confirmed, it is confirmed before the 18442 session is created. Otherwise the client ID confirmation phase 18443 is skipped and only the session creation phase occurs. The 18444 operational cases are described in terms of what client records 18445 whose client ID field has a value equal to clientid_arg exist in 18446 the server's set of client records. Any case in which there is 18447 more than one record with identical values for client ID 18448 represents a server implementation error. Operation in the 18449 potentially valid cases is summarized as follows. 18451 * Common Case 18453 If the server has the following unconfirmed record, then 18454 this is the expected confirmation of an unconfirmed record. 18456 { *, *, principal_arg, clientid_arg, FALSE } 18458 The confirmed field of the record is set to TRUE. 18460 The processing of the operation continues to session 18461 creation.
18463 * Principal Change or Collision 18465 If the server has the following record, then the client has 18466 changed principals after the previous EXCHANGE_ID request, 18467 or there has been a chance collision between shorthand 18468 client identifiers. 18470 { *, *, old_principal_arg, clientid_arg, *, sequence_arg } 18472 Neither of these cases is permissible. Processing stops 18473 and NFS4ERR_CLID_INUSE is returned to the client. No 18474 changes are made to any client records on the server. 18476 4. Session creation. The server has confirmed the client ID, either 18477 in this CREATE_SESSION operation or in a previous CREATE_SESSION 18478 operation. The server examines the remaining fields of the 18479 arguments. For each argument field, if the value is acceptable 18480 to the server, it is recommended that the server use the provided 18481 value to create the new session. If it is not acceptable, the 18482 server may use a different value, but must return the value used 18483 to the client. These parameters have the following 18484 interpretation. 18486 csa_flags: 18488 The csa_flags field contains a list of the following flag 18489 bits: 18491 CREATE_SESSION4_FLAG_PERSIST: 18493 If CREATE_SESSION4_FLAG_PERSIST is set, the client desires 18494 server support for "reliable" semantics. For sessions in 18495 which only idempotent operations will be used (e.g. a read- 18496 only session), clients should not set 18497 CREATE_SESSION4_FLAG_PERSIST. If the server does not or 18498 cannot provide "reliable" semantics, the result field 18499 csr_flags MUST NOT have CREATE_SESSION4_FLAG_PERSIST set. 18501 If the server is a pNFS metadata server, for reasons 18502 described in Section 12.5.2 it MUST support 18503 CREATE_SESSION4_FLAG_PERSIST if it supports the layout_hint 18504 (Section 5.13.4) attribute.
18506 CREATE_SESSION4_FLAG_CONN_BACK_CHAN: 18508 If CREATE_SESSION4_FLAG_CONN_BACK_CHAN is set in csa_flags, 18509 the client is requesting that the server use the connection 18510 CREATE_SESSION is called over for the back channel as well 18511 as the forward channel. The server sets 18512 CREATE_SESSION4_FLAG_CONN_BACK_CHAN in the result field 18513 csr_flags if it agrees. If 18514 CREATE_SESSION4_FLAG_CONN_BACK_CHAN is not set in 18515 csa_flags, then CREATE_SESSION4_FLAG_CONN_BACK_CHAN MUST 18516 NOT be set in csr_flags. 18518 CREATE_SESSION4_FLAG_CONN_RDMA: 18520 If CREATE_SESSION4_FLAG_CONN_RDMA is set in csa_flags, the 18521 connection CREATE_SESSION is called over is currently in 18522 non-RDMA mode, but has the capability to operate in RDMA 18523 mode, and the client is requesting the server agree to 18524 "step up" to RDMA mode on the connection. The server sets 18525 CREATE_SESSION4_FLAG_CONN_RDMA in the result field 18526 csr_flags if it agrees. If CREATE_SESSION4_FLAG_CONN_RDMA 18527 is not set in csa_flags, then 18528 CREATE_SESSION4_FLAG_CONN_RDMA MUST NOT be set in 18529 csr_flags. Note that once the server agrees to step up, it 18530 and the client MUST exchange all future traffic on the 18531 connection with RPC RDMA framing and not Record Marking. 18532 [[Comment.18: add xref]] 18534 csa_headerpadsize: 18536 The maximum amount of padding the client is willing to apply 18537 to ensure that write payloads are aligned on some boundary at 18538 the server. The server should reply in csr_headerpadsize with 18539 its preferred value, or zero if padding is not in use. The 18540 server may decrease this value but MUST NOT increase it. 18542 csa_conn_binding_opts: 18544 This argument indicates whether the client wants the server to 18545 enforce connection binding (see Section 2.10.6.3) and, if so, 18546 which one-way hash algorithms to use. The corresponding 18547 result is csr_conn_binding_opts. The argument contains the 18548 following fields.
18550 cba_enforce: 18552 Clients SHOULD set cba_enforce to TRUE so that servers 18553 reject the use of connections that are not explicitly bound 18554 to the session. If TRUE, the server MUST require the 18555 client to issue BIND_CONN_TO_SESSION before using a 18556 connection on a channel. If FALSE, then the digests used 18557 in SET_SSV and BIND_CONN_TO_SESSION MUST be zero length. 18559 The corresponding result is cbr_enforce which MUST be equal 18560 to cba_enforce. 18562 cba_hash_algs: 18564 This is the set of algorithms the client supports for the 18565 purpose of computing the digests needed for the SET_SSV and 18566 BIND_CONN_TO_SESSION operations. Each algorithm is 18567 specified as an object identifier (OID). The REQUIRED 18568 algorithms for a server are id-sha1, id-sha224, id-sha256, 18569 id-sha384, and id-sha512 RFC4055 [15]. 18571 If the server does not support any of the offered hash 18572 algorithms, CREATE_SESSION fails with error status 18573 NFS4ERR_OP_HASH_ALG_UNSUPP. Otherwise, the corresponding 18574 result is cbr_hash_alg_info, which contains two fields, 18575 hai_hash_alg and hai_ssv_len. The former is the index into 18576 the algorithm list cba_hash_algs of the algorithm the server 18577 has selected and that the client MUST use for SET_SSV and 18578 BIND_CONN_TO_SESSION. The latter is the length in octets 18579 of the SSV the client MUST use in SET_SSV. The result 18580 hai_ssv_len MUST be greater than or equal to the length of 18581 the hash produced by the selected algorithm. 18583 csa_fore_chan_attrs 18585 csa_back_chan_attrs 18587 These two fields apply to attributes of the fore channel (aka 18588 the operations channel, which conveys requests originating 18589 from the client to the server), and the back channel (the 18590 channel that conveys callback requests originating from the 18591 server to the client). The results are in corresponding 18592 structures called csr_fore_chan_attrs and csr_back_chan_attrs.
18593 Each structure has the following fields: 18595 ca_maxrequestsize: 18597 The maximum size of a COMPOUND or CB_COMPOUND request that 18598 will be sent. This size represents the XDR encoded size of 18599 the request, including the RPC headers (including security 18600 flavor credentials and verifiers) but excludes any 32 bit 18601 Record Marking headers. If a request were preceded by a single 18602 Record Marking header, the maximum allowable 18603 count encoded in that header would be ca_maxrequestsize. If 18604 a sender sends a request that exceeds ca_maxrequestsize, 18605 the error NFS4ERR_REQ_TOO_BIG will be returned per the 18606 description in Section 2.10.4.4. 18608 ca_maxresponsesize: 18610 The maximum size of a COMPOUND or CB_COMPOUND reply that 18611 the receiver will accept from the sender including RPC 18612 headers (see the ca_maxrequestsize definition). The 18613 NFSv4.1 server MUST NOT increase the value of this 18614 parameter in the CREATE_SESSION results. If a sender sends 18615 a request for which the size of the reply would exceed this 18616 value, the receiver will return NFS4ERR_REP_TOO_BIG, per 18617 the description in Section 2.10.4.4. 18619 ca_maxresponsesize_cached: 18621 Like ca_maxresponsesize, but the maximum size of a reply 18622 that will be stored in the reply cache (Section 2.10.4.1). 18623 If ca_maxresponsesize_cached is less than 18624 ca_maxresponsesize, then this is an indication to the 18625 client that it needs to be selective about which replies it 18626 tells the server to cache; large replies (e.g. READ 18627 results) should not be cached. The client can decide 18628 which replies to cache via the SEQUENCE (Section 17.46) or 18629 CB_SEQUENCE (Section 19.9) operations. If a sender sends a 18630 request for which the size of the reply would exceed this 18631 value, the receiver will return 18632 NFS4ERR_REP_TOO_BIG_TO_CACHE, per the description in 18633 Section 2.10.4.4.
18635 ca_maxoperations: 18637 The maximum number of operations the receiver will 18638 accept in a COMPOUND or CB_COMPOUND. If the client or server 18639 does not have a limit, it will set ca_maxoperations to 18640 0xFFFFFFFF. The server MUST NOT increase ca_maxoperations 18641 in the reply to CREATE_SESSION. If the requester issues a 18642 COMPOUND or CB_COMPOUND with more operations than 18643 ca_maxoperations, the replier MUST return 18644 NFS4ERR_TOO_MANY_OPS. 18646 ca_maxrequests: 18648 The maximum number of concurrent COMPOUND or CB_COMPOUND 18649 requests the sender will issue on the session. Subsequent 18650 requests will each be assigned a slot identifier by the 18651 client within the range 0 to ca_maxrequests - 1 inclusive. 18653 ca_rdma_ird: 18655 This array has a maximum of one element. If this array has 18656 one element, then the element contains the inbound RDMA 18657 read queue depth (IRD). 18659 csa_cb_program 18661 This is the program number the server must use in any 18662 callbacks sent through the back channel to the client. 18664 csa_cb_sec_parms 18666 This is an array of acceptable security credentials. Three 18667 security flavors are supported: AUTH_NONE, AUTH_SYS, and 18668 RPCSEC_GSS. If AUTH_NONE is specified for a credential, then 18669 the client is allowed to use AUTH_NONE on all 18670 callbacks for the session. If AUTH_SYS is specified, then the 18671 client is allowed to use AUTH_SYS on all callbacks, using the 18672 credential specified in cbsp_sys_cred. If RPCSEC_GSS is 18673 specified, then the server is allowed to use the RPCSEC_GSS 18674 context specified in cbsp_gss_parms as the RPCSEC_GSS context 18675 in the credential of the RPC header of callbacks to the 18676 client. 18678 The RPCSEC_GSS context is specified with two RPCSEC_GSS 18679 handles. The first handle, gcbp_handle_from_server, is the 18680 fore handle the server returned to the client when the 18681 RPCSEC_GSS context was created on the server.
The second 18682 handle, gcbp_handle_from_client, is the back handle that the client 18683 will map to the RPCSEC_GSS context. The server can 18684 immediately use the RPCSEC_GSS context, using 18685 gcbp_handle_from_client as the value of the "handle" field in the 18686 rpc_gss_cred_vers_1_t structure, with 18687 gss_proc set to RPCSEC_GSS_DATA. Note that while the GSS 18688 context state is shared between the fore and back RPCSEC_GSS 18689 contexts, the fore and back RPCSEC_GSS context state are 18690 independent of each other as far as the RPCSEC_GSS sequence 18691 number is concerned. 18693 Implementing RPCSEC_GSS callback support requires that the client 18694 and server change their RPCSEC_GSS implementations. One 18695 possible set of changes includes: 18697 + Adding a data structure that wraps the GSS-API context with 18698 a reference count. 18700 + Adding functions to increment and decrement the reference 18701 count. If the reference count is decremented to zero, the 18702 wrapper data structure and the GSS-API context it refers to 18703 would be freed. 18705 + Changing RPCSEC_GSS to create the wrapper data structure upon 18706 receiving a GSS-API context from gss_accept_sec_context() and 18707 gss_init_sec_context(). The reference count would be 18708 initialized to 1. 18710 + Adding a function to map an existing RPCSEC_GSS handle to a 18711 pointer to the wrapper data structure. The reference count 18712 would be incremented. 18714 + Adding a function to create a new RPCSEC_GSS handle from a 18715 pointer to the wrapper data structure. The reference count 18716 would be incremented. 18718 + Replacing calls from RPCSEC_GSS that free GSS-API contexts 18719 with calls to decrement the reference count on the wrapper 18720 data structure. 18722 5.
The server creates the session by recording the parameter values 18723 used (including whether the CREATE_SESSION4_FLAG_PERSIST flag is 18724 set and has been accepted by the server) and allocating space for 18725 the session replay cache. For each slot in the replay cache, the 18726 server sets the sequenceid to zero (0), and records a result 18727 containing a result for a COMPOUND with a single SEQUENCE 18728 operation, with the cached error of NFS4ERR_SEQ_MISORDERED. 18729 Thus, the first SEQUENCE operation a client issues on a slot 18730 after the session is created MUST start with a sequenceid of one 18731 (1). The client initializes its replay cache for receiving 18732 callbacks in the same way, and similarly, the first CB_SEQUENCE 18733 operation on a slot after session creation must have a sequenceid 18734 of one. 18736 6. If the session state is created successfully, the server 18737 associates the session with the client ID provided by the client. 18739 17.37. Operation 44: DESTROY_SESSION - Destroy existing session 18741 Destroy existing session. 18743 17.37.1. SYNOPSIS 18745 sessionid -> status 18747 17.37.2. ARGUMENT 18749 struct DESTROY_SESSION4args { 18750 sessionid4 dsa_sessionid; 18751 }; 18753 17.37.3. RESULT 18755 struct DESTROY_SESSION4res { 18756 nfsstat4 dsr_status; 18757 }; 18759 17.37.4. DESCRIPTION 18761 The DESTROY_SESSION operation closes the session and discards the 18762 replay cache. Any remaining connections bound to the session are 18763 immediately unbound and may additionally be closed by the server. 18764 Locks, delegations, layouts, wants, and the lease, which are all tied 18765 to the client ID, are not affected by DESTROY_SESSION. 18767 If the COMPOUND request starts with SEQUENCE, then DESTROY_SESSION 18768 MUST be the final, or only operation, unless the sessionid specified 18769 in SEQUENCE is different from the sessionid specified in 18770 DESTROY_SESSION. 
DESTROY_SESSION MAY be the only operation in a 18771 COMPOUND request. Because the operation results in destruction of 18772 the session, any reply caching for this request, as well as 18773 previously completed requests, will be lost. For this reason, it is 18774 advisable not to place this operation in a COMPOUND request with 18775 other state-modifying operations (unless those operations are for a 18776 different session, as specified by SEQUENCE). 18778 Because the session is destroyed, a client that retransmits the 18779 request may receive an error in response, even though the original 18780 request was successful. 18782 If there is a backchannel on the session and the server has 18783 outstanding CB_SEQUENCE operations, then the server MAY refuse to 18784 destroy the session and return NFS4ERR_BACK_CHAN_BUSY. The client 18785 SHOULD respond to all outstanding CB_COMPOUNDs before re-issuing 18786 DESTROY_SESSION. 18788 17.37.5. IMPLEMENTATION 18790 No discussion at this time. 18792 17.38. Operation 45: FREE_STATEID - Free stateid with no locks 18794 Free a stateid which has no associated locks. 18796 17.38.1. SYNOPSIS 18798 stateid -> 18800 17.38.2. ARGUMENT 18802 struct FREE_STATEID4args { 18803 stateid4 fsa_stateid; 18804 }; 18806 17.38.3. RESULT 18808 struct FREE_STATEID4res { 18809 nfsstat4 fsr_status; 18810 }; 18812 17.38.4. DESCRIPTION 18814 The FREE_STATEID operation is used to free a stateid which no longer 18815 has any associated locks (including opens, record locks, delegations, 18816 and layouts). This may be because of client unlock operations or because 18817 of server revocation. If there are valid locks (of any kind) 18818 associated with the stateid in question, the error NFS4ERR_LOCKS_HELD 18819 will be returned, and the associated stateid will not be freed.
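The FREE_STATEID outcome described above can be sketched as a small decision function. The stateid bookkeeping structure and status names below are hypothetical, with comments noting the protocol errors they stand in for.

```c
#include <stdbool.h>

/* A minimal sketch of the FREE_STATEID decision; the bookkeeping
 * structure is hypothetical, not part of the protocol's XDR. */

enum fs_status {
    FS_OK,           /* NFS4_OK: stateid freed */
    FS_LOCKS_HELD,   /* NFS4ERR_LOCKS_HELD: valid locks remain */
    FS_BAD_STATEID   /* NFS4ERR_BAD_STATEID: stateid already freed */
};

struct stateid_entry {
    bool freed;       /* set once FREE_STATEID succeeds */
    int  lock_count;  /* opens, record locks, delegations, layouts */
};

enum fs_status free_stateid(struct stateid_entry *st)
{
    if (st->freed)
        return FS_BAD_STATEID;   /* any subsequent use of it fails */
    if (st->lock_count > 0)
        return FS_LOCKS_HELD;    /* the stateid is not freed */
    st->freed = true;
    return FS_OK;
}
```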
18821 When a stateid which had been associated with revoked locks is freed, 18822 the client, by doing the FREE_STATEID, acknowledges the loss of those 18823 locks. Once all such revoked state is 18824 acknowledged, the server can again allow that client to reclaim locks without 18825 encountering the edge conditions discussed in Section 8.6.2. 18827 Once a successful FREE_STATEID is done for a given stateid, any 18828 subsequent use of that stateid will result in an NFS4ERR_BAD_STATEID 18829 error. 18831 17.38.5. IMPLEMENTATION 18833 No discussion at this time. 18835 17.39. Operation 46: GET_DIR_DELEGATION - Get a directory delegation 18837 Obtain a directory delegation. 18839 17.39.1. SYNOPSIS 18841 (cfh), requested notification -> 18842 (cfh), cookieverf, stateid, supported notification 18844 17.39.2. ARGUMENT 18846 /* 18847 * Notification types. 18848 */ 18849 const DIR_NOTIFICATION4_NONE = 0x00000000; 18850 const DIR_NOTIFICATION4_CHANGE_CHILD_ATTRIBUTES = 0x00000001; 18851 const DIR_NOTIFICATION4_CHANGE_DIR_ATTRIBUTES = 0x00000002; 18852 const DIR_NOTIFICATION4_REMOVE_ENTRY = 0x00000004; 18853 const DIR_NOTIFICATION4_ADD_ENTRY = 0x00000008; 18854 const DIR_NOTIFICATION4_RENAME_ENTRY = 0x00000010; 18855 const DIR_NOTIFICATION4_CHANGE_COOKIE_VERIFIER = 0x00000020; 18857 typedef uint32_t dir_notification_type4; 18859 typedef nfstime4 attr_notice4; 18861 struct GET_DIR_DELEGATION4args { 18862 bool gdda_signal_deleg_avail; 18863 dir_notification_type4 gdda_notification_type; 18864 attr_notice4 gdda_child_attr_delay; 18865 attr_notice4 gdda_dir_attr_delay; 18866 bitmap4 gdda_child_attributes; 18867 bitmap4 gdda_dir_attributes; 18868 }; 18870 17.39.3.
RESULT 18872 struct GET_DIR_DELEGATION4resok { 18873 verifier4 gddr_cookieverf; 18874 /* Stateid for get_dir_delegation */ 18875 stateid4 gddr_stateid; 18876 /* Which notifications can the server support */ 18877 dir_notification_type4 gddr_notification; 18878 bitmap4 gddr_child_attributes; 18879 bitmap4 gddr_dir_attributes; 18880 }; 18882 enum gddrnf4_status { 18883 GDD4_OK = 0, 18884 GDD4_UNAVAIL = 1 18885 }; 18887 union GET_DIR_DELEGATION4res_non_fatal 18888 switch (gddrnf4_status gddrnf_status) { 18889 case GDD4_OK: 18890 GET_DIR_DELEGATION4resok gddrnf_resok4; 18891 case GDD4_UNAVAIL: 18892 bool gddrnf_will_signal_deleg_avail; 18893 }; 18895 union GET_DIR_DELEGATION4res 18896 switch (nfsstat4 gddr_status) { 18897 case NFS4_OK: 18898 /* CURRENT_FH: delegated dir */ 18899 GET_DIR_DELEGATION4res_non_fatal gddr_res_non_fatal4; 18900 default: 18901 void; 18902 }; 18904 17.39.4. DESCRIPTION 18906 The GET_DIR_DELEGATION operation is used by a client to request a 18907 directory delegation. The directory is represented by the current 18908 filehandle. The client also specifies whether it wants the server to 18909 notify it when the directory changes in certain ways by setting one 18910 or more bits in a bitmap. The server may also choose not to grant 18911 the delegation. In that case the server will return 18912 NFS4ERR_DIRDELEG_UNAVAIL. If the server decides to hand out the 18913 delegation, it will return a cookie verifier for that directory. If 18914 the cookie verifier changes when the client is holding the 18915 delegation, the delegation will be recalled unless the client has 18916 asked for notification for this event. In that case a notification 18917 will be sent to the client. 18919 The server will also return a directory delegation stateid in 18920 addition to the cookie verifier as a result of the GET_DIR_DELEGATION 18921 operation. This stateid will appear in callback messages related to 18922 the delegation, such as notifications and delegation recalls. 
The 18923 client will use this stateid to return the delegation voluntarily or 18924 upon recall. A delegation is returned by calling the DELEGRETURN 18925 operation. 18927 The server may not be able to support notifications of certain 18928 events. If the client asks for such notifications, the server must 18929 inform the client of its inability to do so as part of the 18930 GET_DIR_DELEGATION reply by not setting the appropriate bits in the 18931 supported notifications bitmask contained in the reply. 18933 The GET_DIR_DELEGATION operation can be used for both normal and 18934 named attribute directories. It covers all the entries in the 18935 directory except the ".." entry. That means if a directory and its 18936 parent both hold directory delegations, any changes to the parent 18937 will not cause a notification to be sent for the child even though 18938 the child's ".." entry points to the parent. 18940 If the client sets gdda_signal_deleg_avail to TRUE, then it is 18941 registering with the server a "want" for a directory delegation. If 18942 the server supports and will honor the "want", the results will have 18943 gddrnf_will_signal_deleg_avail set to TRUE. If so, the client should 18944 expect a future CB_RECALLABLE_OBJ_AVAIL operation to indicate that a 18945 directory delegation is available. 18947 17.39.5. IMPLEMENTATION 18949 Directory delegation provides the benefit of improving cache 18950 consistency of namespace information. This is done through 18951 synchronous callbacks. A server must support synchronous callbacks 18952 in order to support directory delegations. In addition to that, 18953 asynchronous notifications provide a way to reduce network traffic as 18954 well as improve client performance in certain conditions. 18955 Notifications would not be requested when the goal is just cache 18956 consistency. 18958 Notifications are specified in terms of potential changes to the 18959 directory.
A client can ask to be notified of events by setting one 18960 or more flags in gdda_notification_type. The client can ask for 18961 notifications on addition of entries to a directory 18962 (DIR_NOTIFICATION4_ADD_ENTRY), 18963 entry removal (DIR_NOTIFICATION4_REMOVE_ENTRY), renames 18964 (DIR_NOTIFICATION4_RENAME_ENTRY), directory attribute changes 18965 (DIR_NOTIFICATION4_CHANGE_DIR_ATTRIBUTES), and cookie verifier 18966 changes (DIR_NOTIFICATION4_CHANGE_COOKIE_VERIFIER) by setting one or 18967 more of the corresponding flags in the gdda_notification_type field. 18969 The client can also ask for notifications of changes to attributes of 18970 directory entries (DIR_NOTIFICATION4_CHANGE_CHILD_ATTRIBUTES) in 18971 order to keep its attribute cache up to date. However, any changes 18972 made to child attributes do not cause the delegation to be recalled. 18973 If a client is interested in directory entry caching, or negative 18974 name caching, it can set the gdda_notification_type appropriately and 18975 the server will notify it of all changes that would otherwise 18976 invalidate its name cache. The kind of notification a client asks 18977 for may depend on the directory size, its rate of change, and the 18978 applications being used to access that directory. However, the 18979 conditions under which a client might ask for a notification are 18980 outside the scope of this specification. 18982 For attribute notifications, the client will set bits in the 18983 gdda_dir_attributes bitmap to indicate which attributes it wants to 18984 be notified of. If the server does not support notifications for 18985 changes to a certain attribute, it should not set that attribute in 18986 the supported attribute bitmap specified in the reply 18987 (gddr_dir_attributes).
The client will also set in the 18988 gdda_child_attributes bitmap the attributes of directory entries it 18989 wants to be notified of, and the server will indicate in 18990 gddr_child_attributes which attributes of directory entries it will 18991 notify the client of. 18993 The client will also let the server know if it wants to get the 18994 notification as soon as the attribute change occurs or after a 18995 certain delay by setting a delay factor; gdda_child_attr_delay is for 18996 attribute changes to directory entries and gdda_dir_attr_delay is for 18997 attribute changes to the directory. If this delay factor is set to 18998 zero, that indicates to the server that the client wants to be 18999 notified of any attribute changes as soon as they occur. If the 19000 delay factor is set to N seconds, the server will make a best-effort 19001 guarantee that attribute updates are not out of sync by more than 19002 that. If the client asks for a delay factor that the server does not 19003 support or that may cause significant resource consumption on the 19004 server by causing the server to send a lot of notifications, the 19005 server should not commit to sending out notifications for attributes 19006 and therefore must not set the corresponding bits in the 19007 gddr_child_attributes and gddr_dir_attributes bitmaps in the 19008 response. 19010 The client should use a security flavor that the file system is 19011 exported with. If it uses a different flavor, the server should 19012 return NFS4ERR_WRONGSEC to the operation that precedes 19013 GET_DIR_DELEGATION and sets the current filehandle. 19015 17.40. Operation 47: GETDEVICEINFO - Get Device Information 19017 17.40.1. SYNOPSIS 19019 (cfh), device_id, layout_type, maxcount -> device_addr 19021 17.40.2. ARGUMENT 19023 struct GETDEVICEINFO4args { 19024 /* CURRENT_FH: file */ 19025 deviceid4 gdia_device_id; 19026 layouttype4 gdia_layout_type; 19027 count4 gdia_maxcount; 19028 }; 19030 17.40.3.
RESULT 19032 struct GETDEVICEINFO4resok { 19033 device_addr4 gdir_device_addr; 19034 }; 19036 union GETDEVICEINFO4res switch (nfsstat4 gdir_status) { 19037 case NFS4_OK: 19038 GETDEVICEINFO4resok gdir_resok4; 19039 default: 19040 void; 19041 }; 19043 17.40.4. DESCRIPTION 19045 Returns device address information for a specified device. The 19046 device address MUST correspond to the layout type specified by the 19047 GETDEVICEINFO4args. The current filehandle (cfh) is used to identify 19048 the file system; device IDs are unique per file system (FSID) and are 19049 qualified by the layout type. 19051 See Section 12.2.12 for more details on device ID assignment. 19053 If the size of the device address exceeds gdia_maxcount bytes, the 19054 metadata server will return the error NFS4ERR_TOOSMALL. If an 19055 invalid device ID is given, the metadata server will respond with 19056 NFS4ERR_INVAL. 19058 17.40.5. IMPLEMENTATION 19060 17.41. Operation 48: GETDEVICELIST 19062 17.41.1. SYNOPSIS 19064 (cfh), layout_type, maxcount, cookie, cookieverf -> 19065 cookie, cookieverf, device info list<> 19067 17.41.2. ARGUMENT 19069 struct GETDEVICELIST4args { 19070 /* CURRENT_FH: file */ 19071 layouttype4 gdla_layout_type; 19072 count4 gdla_maxcount; 19073 nfs_cookie4 gdla_cookie; 19074 verifier4 gdla_cookieverf; 19075 }; 19077 17.41.3. RESULT 19079 struct GETDEVICELIST4resok { 19080 nfs_cookie4 gdlr_cookie; 19081 verifier4 gdlr_cookieverf; 19082 devlist_item4 gdlr_devinfo_list<>; 19083 bool gdlr_eof; 19084 }; 19086 union GETDEVICELIST4res switch (nfsstat4 gdlr_status) { 19087 case NFS4_OK: 19088 GETDEVICELIST4resok gdlr_resok4; 19089 default: 19090 void; 19091 }; 19093 17.41.4. DESCRIPTION 19095 In some applications, especially SAN environments, it is convenient 19096 to find out about all the devices associated with a file system. 19097 This lets a client determine if it has access to these devices, e.g., 19098 at mount time. 
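As a sketch of how a client might use the READDIR-like, cookie-based iteration that GETDEVICELIST provides, the following loop drains the device list in batches. The RPC is replaced here by an in-memory stub, and all names are hypothetical illustrations, not protocol API.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t nfs_cookie4;

struct gdl_result {
    nfs_cookie4 cookie;   /* like gdlr_cookie: where to resume */
    int         count;    /* entries returned in this batch */
    bool        eof;      /* like gdlr_eof: no more entries */
};

/* Stub server with NDEV devices, returning at most `max` per call. */
#define NDEV 7
static struct gdl_result getdevicelist_stub(nfs_cookie4 cookie, int max)
{
    struct gdl_result r;
    int remaining = NDEV - (int)cookie;
    r.count = remaining < max ? remaining : max;
    r.cookie = cookie + (nfs_cookie4)r.count;
    r.eof = (r.cookie >= NDEV);
    return r;
}

/* Fetch the whole list in batches, as a client might at mount time. */
static int fetch_all_devices(int batch)
{
    int total = 0;
    nfs_cookie4 cookie = 0;   /* the initial call uses a zero cookie */
    struct gdl_result r;
    do {
        r = getdevicelist_stub(cookie, batch);
        total += r.count;
        cookie = r.cookie;    /* resume where the last reply stopped */
    } while (!r.eof);
    return total;
}
```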
This operation returns an array of items (devlist_item4) that establish the association between the short deviceid4 and the addressing information for that device, for a particular layout type. This operation may not be able to fetch all device information at once; thus it uses a cookie-based approach, similar to READDIR, to fetch additional device information (see Section 17.23). The "eof" flag has a value of TRUE if there are no more entries to fetch. As in GETDEVICEINFO, the current filehandle (cfh) is used to identify the file system.

As in GETDEVICEINFO, gdla_maxcount specifies the maximum number of bytes to return. If the metadata server is unable to return a single device address, it will return the error NFS4ERR_TOOSMALL. If an invalid device ID is given, the metadata server will respond with NFS4ERR_INVAL.

17.41.5.  IMPLEMENTATION

17.42.  Operation 49: LAYOUTCOMMIT - Commit writes made using a layout

17.42.1.  SYNOPSIS

   (client ID), (cfh), offset, length, reclaim, last_write_offset,
   time_modify, time_access, layoutupdate -> newsize

17.42.2.  ARGUMENT

   union newtime4 switch (bool nt_timechanged) {
   case TRUE:
           nfstime4        nt_time;
   case FALSE:
           void;
   };

   union newoffset4 switch (bool no_newoffset) {
   case TRUE:
           offset4         no_offset;
   case FALSE:
           void;
   };

   struct LAYOUTCOMMIT4args {
           /* CURRENT_FH: file */
           offset4         loca_offset;
           length4         loca_length;
           bool            loca_reclaim;
           newoffset4      loca_last_write_offset;
           newtime4        loca_time_modify;
           newtime4        loca_time_access;
           layoutupdate4   loca_layoutupdate;
   };

17.42.3.
RESULT

   union newsize4 switch (bool ns_sizechanged) {
   case TRUE:
           length4         ns_size;
   case FALSE:
           void;
   };

   struct LAYOUTCOMMIT4resok {
           newsize4        locr_newsize;
   };

   union LAYOUTCOMMIT4res switch (nfsstat4 locr_status) {
   case NFS4_OK:
           LAYOUTCOMMIT4resok      locr_resok4;
   default:
           void;
   };

17.42.4.  DESCRIPTION

Commits changes in the layout segment represented by the current filehandle, client ID (derived from the sessionid in the preceding SEQUENCE operation), and octet range. Since layout segments are subdividable, a smaller portion of a layout segment, retrieved via LAYOUTGET, may be committed. The region being committed is specified through the octet range (loca_offset and loca_length).

The LAYOUTCOMMIT operation indicates that the client has completed writes using a layout obtained by a previous LAYOUTGET. The client may have written only a subset of the data range it previously requested. LAYOUTCOMMIT allows it to commit or discard provisionally allocated space and to update the server with a new end of file. The layout segment referenced by LAYOUTCOMMIT is still valid after the operation completes and can continue to be referenced by the client ID, filehandle, octet range, and layout type.

If the loca_reclaim field is set to TRUE, the client is attempting to commit changes to a layout after a reboot of the metadata server, during the metadata server's recovery grace period. This type of request may be necessary when the client has uncommitted writes to provisionally allocated regions of a file which were sent to the storage devices before the reboot of the metadata server.
In this case, the layout provided by the client MUST be a subset of a writable layout that the client held immediately before the reboot of the metadata server. The metadata server is free to accept or reject this request based on its own internal metadata consistency checks. If the metadata server finds that the layout provided by the client does not pass its consistency checks, it MUST reject the request with the status NFS4ERR_RECLAIM_BAD. The successful completion of a LAYOUTCOMMIT request with loca_reclaim set to TRUE does NOT provide the client with a layout segment for the file. It simply commits the changes to the layout segment specified in the loca_layoutupdate field. To obtain a layout segment for the file, the client must issue a LAYOUTGET request to the server after the server's grace period has expired. If the metadata server receives a LAYOUTCOMMIT request with loca_reclaim set to TRUE when the metadata server is not in its recovery grace period, it MUST reject the request with the status NFS4ERR_NO_GRACE.

Setting the loca_reclaim field to TRUE is required if and only if the committed layout was acquired before the metadata server reboot. If the client is committing a layout segment that was acquired during the metadata server's grace period, it MUST set the loca_reclaim field to FALSE.

The loca_last_write_offset field specifies the offset of the last octet written by the client previous to the LAYOUTCOMMIT. Note that this value is never equal to the file's size (at most it is one octet less than the file's size). The metadata server may use this information to determine whether the file's size needs to be updated. If the metadata server updates the file's size as the result of the LAYOUTCOMMIT operation, it must return the new size (locr_newsize.ns_size) as part of the results.
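The size-update decision described above can be made concrete in a few lines. This is a sketch of one plausible server policy, not a normative algorithm: the spec only says the server "may use this information", so the function name and return convention here are illustrative.

```python
def updated_size(current_size, last_write_offset):
    """Decide whether loca_last_write_offset implies a file-size update.

    last_write_offset is the offset of the last octet written, so the
    file must extend to at least last_write_offset + 1 octets (which is
    why the offset is always at most one less than the resulting size).
    Returns the new size, or None if no update is needed
    (ns_sizechanged would be FALSE).
    """
    candidate = last_write_offset + 1
    if candidate > current_size:
        return candidate      # server returns this in locr_newsize.ns_size
    return None               # existing size already covers the write
```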
The loca_time_modify and loca_time_access [[Comment.19: If LAYOUTCOMMIT is only for writes, then why update access time?]] fields allow the client to suggest times it would like the metadata server to set. The metadata server may use these time values, or it may use the time of the LAYOUTCOMMIT operation to set these time values. If the metadata server uses the client-provided times, it should ensure that time does not flow backwards. If the client wants to force the metadata server to set an exact time, the client should use a SETATTR operation in a COMPOUND right after LAYOUTCOMMIT. See Section 12.5.3 for more details. If the client desires the resultant mtime or atime, it should construct the COMPOUND so that a GETATTR follows the LAYOUTCOMMIT.

The loca_layoutupdate argument to LAYOUTCOMMIT provides a mechanism for a client to provide layout-specific updates to the metadata server. For example, the layout update can describe what regions of the original layout have been used and what regions can be deallocated. There is no NFSv4.1 file layout-specific layoutupdate4 structure.

The layout information is more verbose for block devices than for objects and files because the latter two hide the details of block allocation behind their storage protocols. At a minimum, the client needs to communicate changes to the end-of-file location back to the server, and, if desired, its view of the file modify and access times. For block/volume layouts, it needs to specify precisely which blocks have been used.

If the layout segment identified in the arguments does not exist, the error NFS4ERR_BADLAYOUT is returned. The layout segment being committed may also be rejected if it does not correspond to an existing layout with an iomode of LAYOUTIOMODE4_RW.
On success, the current filehandle retains its value.

17.42.5.  IMPLEMENTATION

Optionally, the client can also use LAYOUTCOMMIT with the loca_reclaim field set to TRUE to convey hints about modified file attributes or to report layout-type specific information such as I/O errors for object-based storage layouts, as is done during normal operation. Doing so may help the metadata server to recover files more efficiently after reboot. For example, some file system implementations may require expansive recovery of file system objects if the metadata server does not get a positive indication from all clients holding a write layout that they have successfully completed all their writes. Sending a LAYOUTCOMMIT (if required) and then following with LAYOUTRETURN can provide such an indication and allow for graceful and efficient recovery.

17.43.  Operation 50: LAYOUTGET - Get Layout Information

17.43.1.  SYNOPSIS

   (cfh), signal_avail, layout_type, iomode, offset,
   length, minlength, maxcount -> layout

17.43.2.  ARGUMENT

   struct LAYOUTGET4args {
           /* CURRENT_FH: file */
           bool                    loga_signal_layout_avail;
           layouttype4             loga_layout_type;
           layoutiomode4           loga_iomode;
           offset4                 loga_offset;
           length4                 loga_length;
           length4                 loga_minlength;
           count4                  loga_maxcount;
   };

17.43.3.  RESULT

   struct LAYOUTGET4resok {
           bool            logr_return_on_close;
           layout4         logr_layout;
   };

   union LAYOUTGET4res switch (nfsstat4 logr_status) {
   case NFS4_OK:
           LAYOUTGET4resok logr_resok4;
   case NFS4ERR_LAYOUTTRYLATER:
           bool            logr_will_signal_layout_avail;
   default:
           void;
   };

17.43.4.
DESCRIPTION

Requests a layout segment from the metadata server for reading or writing (and reading) the file given by the filehandle at the octet range specified by offset and length. Layouts are identified by the client ID (derived from the sessionid in the preceding SEQUENCE operation), current filehandle, and layout type (loga_layout_type). The use of loga_iomode depends upon the layout type, but it should reflect the client's data access intent.

If the metadata server is in a grace period, and does not persist layout segments and device ID to device address mappings, then it MUST return NFS4ERR_GRACE (see Section 8.6.2.1).

The LAYOUTGET operation returns layout information for the specified octet range: a layout segment. To get a layout segment from a specific offset through the end-of-file, regardless of the file's length, a loga_length field with all bits set to 1 (one) should be used. If loga_length is zero, or if a loga_length that is not all bits set to one is specified and loga_length added to loga_offset exceeds the maximum 64-bit unsigned integer value, the error NFS4ERR_INVAL will result.

The loga_minlength field specifies the minimum size of the overlap with the requested offset and length that is to be returned. If this requirement cannot be met, no layout must be returned; the error NFS4ERR_LAYOUTTRYLATER can be returned.

The loga_maxcount field specifies the maximum layout size (in octets) that the client can handle. If the size of the layout structure exceeds the size specified by maxcount, the metadata server will return the NFS4ERR_TOOSMALL error.

As well, the metadata server may adjust the range of the returned layout segment based on striping patterns and usage implied by the loga_iomode.
The client must be prepared to get a layout segment that does not line up exactly with its request; there MUST be at least an overlap of loga_minlength between the layout returned by the server and the client's request, or the server SHOULD reject the request. See Section 12.5.2 for more details.

The metadata server may also return a layout segment with an lo_iomode other than that requested by the client. If it does so, it must ensure that the lo_iomode is more permissive than the loga_iomode requested. For example, this allows an implementation to upgrade read-only requests to read/write requests at its discretion, within the limits of the layout type specific protocol. A lo_iomode of either LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW must be returned.

The logr_return_on_close result field is a directive to return the layout before closing the file. When the server sets this return value to TRUE, it must be prepared to recall the layout in the case that the client fails to return the layout before close. For a server that knows a layout must be returned before a close of the file, this return value can be used to communicate the desired behavior to the client and thus remove one extra step from the client's and server's interaction.

The format of the returned layout (lo_content) is specific to the underlying file system. Layout types other than the NFSv4.1 file layout type are specified outside this document.

If layouts are not supported for the requested file or its containing file system, the server SHOULD return NFS4ERR_LAYOUTUNAVAILABLE. If the layout type is not supported, the metadata server should return NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout matches the client-provided layout identification, the server should return NFS4ERR_BADLAYOUT.
If an invalid loga_iomode is specified, or a loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should return NFS4ERR_BADIOMODE.

If the layout for the file is unavailable due to transient conditions, e.g., file sharing prohibits layouts, the server must return NFS4ERR_LAYOUTTRYLATER.

If the layout request is rejected due to an overlapping layout recall, the server must return NFS4ERR_RECALLCONFLICT. See Section 12.5.4.2 for details.

If the layout conflicts with a mandatory octet range lock held on the file, and if the storage devices have no method of enforcing mandatory locks other than through the restriction of layouts, the metadata server should return NFS4ERR_LOCKED.

If the client sets loga_signal_layout_avail to TRUE, then it is registering with the server a "want" for a layout in the event the layout cannot be obtained due to resource exhaustion. If the server supports and will honor the "want", the results will have logr_will_signal_layout_avail set to TRUE. If so, the client should expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a layout is available.

On success, the current filehandle retains its value.

17.43.5.  IMPLEMENTATION

Typically, LAYOUTGET will be called as part of a COMPOUND RPC after an OPEN operation, and it results in the client having location information for the file; a client may also hold a layout across multiple OPENs. The client specifies a layout type that limits what kind of layout the server will return. This prevents servers from issuing layouts that are unusable by the client.

17.44.  Operation 51: LAYOUTRETURN - Release Layout Information

17.44.1.  SYNOPSIS

   (cfh), layout_type, iomode, layoutreturn, reclaim -> -

17.44.2.
ARGUMENT

   struct LAYOUTRETURN4args {
           /* CURRENT_FH: file */
           bool            lora_reclaim;
           layouttype4     lora_layout_type;
           layoutiomode4   lora_iomode;
           layoutreturn4   lora_layoutreturn;
   };

17.44.3.  RESULT

   struct LAYOUTRETURN4res {
           nfsstat4        lorr_status;
   };

17.44.4.  DESCRIPTION

Returns one or more layouts or layout segments represented by the client ID (derived from the sessionid in the preceding SEQUENCE operation), lora_layout_type, and lora_iomode. When layoutreturn is LAYOUTRETURN4_FILE, the returned layout segment is further identified by the current filehandle, lrf_offset, and lrf_length. When layoutreturn is LAYOUTRETURN4_FSID, the current filehandle is used to identify the file system, and all layouts or layout segments matching the client ID, lora_layout_type, and lora_iomode are returned. When layoutreturn is LAYOUTRETURN4_ALL, all layouts or layout segments matching the client ID, lora_layout_type, and lora_iomode are returned, and the current filehandle is not used. After this call, the client MUST NOT use the returned layout segment(s) or layout(s) and the associated storage protocol to access the file data. A layout segment being returned may be a subdivision of a layout segment previously fetched through LAYOUTGET. As well, it may be a subset or superset of a layout segment specified by CB_LAYOUTRECALL. However, if it is a subset, the recall is not complete until the full recalled scope (LAYOUTRETURN4_FILE octet range, LAYOUTRETURN4_FSID, or LAYOUTRETURN4_ALL) has been returned. It is also permissible, and no error should result, for a client to return an octet range covering a layout it does not hold. If the lrf_length is all 1s, the layout covers the range from lrf_offset to EOF.
An iomode of LAYOUTIOMODE4_ANY specifies that all layouts that match the other arguments to LAYOUTRETURN (i.e., client ID, lora_layout_type, and one of: current filehandle and range; fsid derived from the current filehandle; or LAYOUTRETURN4_ALL) are being returned.

When lr_returntype is set to LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL, the client also invalidates all the storage device ID to storage device address mappings in the affected file system(s). Any device ID returned by a subsequent LAYOUTGET in the affected file system(s) will have to be resolved using either GETDEVICEINFO or GETDEVICELIST.

The lora_reclaim field set to TRUE in a LAYOUTRETURN request specifies that the client is attempting to return a layout that was acquired before the reboot of the metadata server, during the metadata server's grace period. When returning layouts that were acquired during the metadata server's grace period, the client MUST set the lora_reclaim field to FALSE. The lora_reclaim field MUST also be set to FALSE when lr_returntype is LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL. See LAYOUTCOMMIT (Section 17.42) for more details.

Layouts may be returned when recalled or voluntarily (i.e., before the server has recalled them). In either case, the client must properly propagate state changed under the context of the layout to the storage device(s) or to the metadata server before returning the layout.

If a client fails to return a layout in a timely manner, then the metadata server should use its control protocol with the storage devices to fence the client from accessing the data referenced by the layout. See Section 12.5.4 for more details.

If the layout identified in the arguments does not exist, the error NFS4ERR_BADLAYOUT is returned. If a layout exists, but the iomode does not match, NFS4ERR_BADIOMODE is returned.
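The matching rules above (returntype, layout type, iomode with the LAYOUTIOMODE4_ANY wildcard, and filehandle versus fsid scoping) can be sketched as a predicate over held layout segments. This is an illustrative model only: the segment record and its field names (fh, fsid, iomode, layout_type) are invented for the example, the enum values are arbitrary stand-ins, and octet-range narrowing for LAYOUTRETURN4_FILE is omitted.

```python
# Stand-in enum values, not the on-the-wire XDR values.
LAYOUTRETURN4_FILE, LAYOUTRETURN4_FSID, LAYOUTRETURN4_ALL = 1, 2, 3
LAYOUTIOMODE4_READ, LAYOUTIOMODE4_RW, LAYOUTIOMODE4_ANY = 1, 2, 3

def matches(seg, returntype, layout_type, iomode, fh=None, fsid=None):
    """Does held segment `seg` fall within a LAYOUTRETURN's scope?"""
    if seg["layout_type"] != layout_type:
        return False
    # LAYOUTIOMODE4_ANY matches every iomode; otherwise require equality.
    if iomode != LAYOUTIOMODE4_ANY and seg["iomode"] != iomode:
        return False
    if returntype == LAYOUTRETURN4_FILE:
        return seg["fh"] == fh      # lrf_offset/lrf_length check omitted
    if returntype == LAYOUTRETURN4_FSID:
        return seg["fsid"] == fsid  # cfh identifies the file system
    return True                     # LAYOUTRETURN4_ALL: no fh involved
```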
If the LAYOUTRETURN request sets the lora_reclaim field to TRUE after the metadata server's grace period, NFS4ERR_NO_GRACE is returned.

If the LAYOUTRETURN request sets the lora_reclaim field to TRUE and lr_returntype is set to LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL, NFS4ERR_INVAL is returned.

On success, the current filehandle retains its value.

[[Comment.20: Should LAYOUTRETURN be modified to handle FSID callbacks?]]

17.44.5.  IMPLEMENTATION

The final LAYOUTRETURN operation in response to a CB_LAYOUTRECALL callback MUST be serialized with any outstanding, intersecting LAYOUTRETURN operations. Note that it is possible that while a client is returning the layout for some recalled range, the server may recall a superset of that range (e.g., LAYOUTRECALL4_ALL); the final return operation for the latter must block until the former recall is complete, i.e., until its corresponding final return operation has been replied to.

Returning all layouts in a file system using LAYOUTRETURN4_FSID is typically done in response to a CB_LAYOUTRECALL for that file system as the final return operation. Similarly, LAYOUTRETURN4_ALL is used in response to a recall callback for all layouts. It is possible that the client already returned some outstanding layouts via individual LAYOUTRETURN calls, and the call for LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL marks the end of the LAYOUTRETURN sequence. See Section 12.5.4.1 for more details.

17.45.  Operation 52: SECINFO_NO_NAME - Get Security on Unnamed Object

Obtain available security mechanisms using either the parent of an object or the current filehandle.

17.45.1.  SYNOPSIS

   (cfh), secinfo_style -> { secinfo }

17.45.2.
ARGUMENT

   enum secinfo_style4 {
           SECINFO_STYLE4_CURRENT_FH       = 0,
           SECINFO_STYLE4_PARENT           = 1
   };

   typedef secinfo_style4 SECINFO_NO_NAME4args;

17.45.3.  RESULT

   typedef SECINFO4res SECINFO_NO_NAME4res;

17.45.4.  DESCRIPTION

Like the SECINFO operation, SECINFO_NO_NAME is used by the client to obtain a list of valid RPC authentication flavors for a specific file object. Unlike SECINFO, SECINFO_NO_NAME only works with objects that are accessed by filehandle.

There are two styles of SECINFO_NO_NAME, as determined by the value of the secinfo_style4 enumeration. If SECINFO_STYLE4_CURRENT_FH is passed, then SECINFO_NO_NAME is querying for the required security for the current filehandle. If SECINFO_STYLE4_PARENT is passed, then SECINFO_NO_NAME is querying for the required security of the current filehandle's parent. If the style selected is SECINFO_STYLE4_PARENT, then SECINFO should apply the same access methodology used for LOOKUPP when evaluating the traversal to the parent directory. Therefore, if the requester does not have the appropriate access to LOOKUPP the parent, then SECINFO_NO_NAME must behave the same way and return NFS4ERR_ACCESS.

Note that if PUTFH, PUTPUBFH, or PUTROOTFH return NFS4ERR_WRONGSEC, this is tantamount to the server asserting that the client will have to guess what the required security is, because there is no way to query. Therefore, the client must iterate through the security triples available at the client and reattempt the PUTFH, PUTROOTFH, or PUTPUBFH operation. In the unfortunate event none of the MANDATORY security triples are supported by the client and server, the client SHOULD try using others that support integrity. Failing that, the client can try using other forms (e.g.,
AUTH_SYS and AUTH_NONE), but because such forms lack integrity checks, this puts the client at risk.

The server implementor should pay particular attention to Section 2.6 for instructions on avoiding NFS4ERR_WRONGSEC error returns from PUTFH, PUTROOTFH, PUTPUBFH, or RESTOREFH.

Everything else about SECINFO_NO_NAME is the same as SECINFO. See the discussion on SECINFO (Section 17.29.4).

17.45.5.  IMPLEMENTATION

See the discussion on SECINFO (Section 17.29.5).

17.46.  Operation 53: SEQUENCE - Supply per-procedure sequencing and control

Supply per-procedure sequencing and control.

17.46.1.  SYNOPSIS

   control -> control

17.46.2.  ARGUMENT

   struct SEQUENCE4args {
           sessionid4      sa_sessionid;
           sequenceid4     sa_sequenceid;
           slotid4         sa_slotid;
           slotid4         sa_highest_slotid;
           bool            sa_cachethis;
   };

17.46.3.  RESULT

   const SEQ4_STATUS_CB_PATH_DOWN                  = 0x00000001;
   const SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING      = 0x00000002;
   const SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED       = 0x00000004;
   const SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED     = 0x00000008;
   const SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED    = 0x00000010;
   const SEQ4_STATUS_ADMIN_STATE_REVOKED           = 0x00000020;
   const SEQ4_STATUS_RECALLABLE_STATE_REVOKED      = 0x00000040;
   const SEQ4_STATUS_LEASE_MOVED                   = 0x00000080;
   const SEQ4_STATUS_RESTART_RECLAIM_NEEDED        = 0x00000100;

   struct SEQUENCE4resok {
           sessionid4      sr_sessionid;
           sequenceid4     sr_sequenceid;
           slotid4         sr_slotid;
           slotid4         sr_highest_slotid;
           slotid4         sr_target_highest_slotid;
           uint32_t        sr_status_flags;
   };

   union SEQUENCE4res switch (nfsstat4 sr_status) {
   case NFS4_OK:
           SEQUENCE4resok  sr_resok4;
   default:
           void;
   };

17.46.4.  DESCRIPTION

The SEQUENCE operation is used to manage operational accounting for the session on which the operation is sent.
The contents include the client and session to which this request belongs, the slotid and sequenceid used by the server to implement session request control and the duplicate reply cache semantics, and exchanged slot counts which are used to adjust these values.

This operation MUST appear as the first operation of any COMPOUND in which it appears. The error NFS4ERR_SEQUENCE_POS will be returned if it is found in any position in a COMPOUND beyond the first. Operations other than SEQUENCE, BIND_CONN_TO_SESSION, EXCHANGE_ID, CREATE_SESSION, and DESTROY_SESSION may not appear as the first operation in a COMPOUND. Such operations will get the error NFS4ERR_OP_NOT_IN_SESSION if they do appear at the start of a COMPOUND.

If SEQUENCE is received on a connection not bound to the session via CREATE_SESSION or BIND_CONN_TO_SESSION, and the client specified connection binding enforcement when the session was created (see Section 17.36), then the server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION.

If sa_cachethis is TRUE, then the client is requesting that the server cache the reply in the server's reply cache. The server MUST cache the reply (see Section 2.10.4.1.2).

The response to the SEQUENCE operation contains a word of status flags (sr_status_flags) that can provide to the client information related to the status of the client's lock state and communications paths. Note that any status bits relating to lock state MAY be reset when lock state is lost due to a server reboot or the establishment of a new client instance.
Note that if the client ID implied by sa_sessionid was established with

   (eir_flags &
    (EXCHGID4_FLAG_USE_PNFS_DS |
     EXCHGID4_FLAG_USE_PNFS_MDS |
     EXCHGID4_FLAG_USE_NON_PNFS)) == EXCHGID4_FLAG_USE_PNFS_DS

in the EXCHANGE_ID results (i.e., the client ID is only for data servers), then sr_status_flags MUST always be zero.

SEQ4_STATUS_CB_PATH_DOWN
   When set, indicates that the client has no operational callback path, making it necessary for the client to re-establish one, return its recallable locks, or both. This bit remains set until the callback path is again available.

SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING
   When set, indicates that the GSS contexts to be used for callbacks are expected to expire within a period equal to the lease time. This bit remains set until the expiration time of the contexts is beyond the lease period from the current time.

SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED
   When set, indicates that the GSS contexts to be used for callbacks have expired. This bit remains set until new non-expired contexts are provided.

SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED
   When set, indicates that the lease has expired and as a result the server released all of the client's locking state. This status bit remains set until the loss of all such locks has been acknowledged by use of FREE_BADLOCK, or by establishing a new client instance by destroying all sessions (via DESTROY_SESSION) and the client ID (via DESTROY_CLIENT), and then invoking EXCHANGE_ID and CREATE_SESSION to establish a new client ID.

SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED
   When set, indicates that some subset of the client's locks have been revoked due to expiration of the lease period followed by another client's conflicting lock request.
This status bit remains set until the loss of all such locks has been acknowledged by use of FREE_BADLOCK.

SEQ4_STATUS_ADMIN_STATE_REVOKED
   When set, indicates that one or more locks have been revoked without expiration of the lease period, due to administrative action. This status bit remains set until the loss of all such locks has been acknowledged by use of FREE_BADLOCK.

SEQ4_STATUS_RECALLABLE_STATE_REVOKED
   When set, indicates that one or more recallable locks have been revoked without expiration of the lease period, due to the client's failure to return them when recalled. This status bit remains set until the loss of all such locks has been acknowledged by use of FREE_BADLOCK.

SEQ4_STATUS_LEASE_MOVED
   When set, indicates that responsibility for lease renewal has been transferred to one or more new servers. This condition will continue until the client receives an NFS4ERR_MOVED error and the server receives the subsequent GETATTR for the fs_locations or fs_locations_info attribute for an access to each file system for which a lease has been moved to a new server.

SEQ4_STATUS_RESTART_RECLAIM_NEEDED
   When set, indicates that after a server restart or reboot, although the session and client ID have persisted (usually due to the CREATE_SESSION result having returned the CREATE_SESSION4_FLAG_PERSIST flag in csr_flags), all other leased state has been lost; this is why the flag is not reset by the restart itself. The client must reclaim the lost state via the procedure described in Section 8.6.2, although re-establishing a client ID and session is neither necessary nor recommended.

If the difference between sa_sequenceid and the sequenceid the server has for the slot is two (2) or more, then the server MUST return NFS4ERR_SEQ_MISORDERED.
If sa_sequenceid is less than the server's cached sequenceid (accounting for wraparound of the unsigned sequenceid value), then the server MUST return NFS4ERR_SEQ_MISORDERED. If sa_sequenceid and the cached sequenceid are the same, this is a replay, and the server returns the cached response to the COMPOUND. Otherwise, if sa_sequenceid is one greater (accounting for wraparound) than the cached sequenceid, then this is a new request, and the slot's sequenceid is incremented. The operations subsequent to SEQUENCE, if any, are processed. If there are no other operations, the only other effects are to cache the SEQUENCE reply in the slot, maintain the session's activity, and renew the lease of state related to the client ID.

If SEQUENCE returns an error, then the state of the slot (sequenceid, cached reply) is not changed, nor is the associated lease renewed.

If SEQUENCE returns NFS4_OK, then the associated lease is renewed, except if SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in the status word.

The server returns two "highest_slotid" values: sr_highest_slotid and sr_target_highest_slotid. The former is the highest slotid the server will accept in a future SEQUENCE operation, and it must not be less than the value of sa_highest_slotid. The latter is the highest slotid the server would prefer the client use on a future SEQUENCE operation.

17.46.5.  IMPLEMENTATION

The server MUST maintain a mapping of sessionid to client ID in order to validate any operations that follow SEQUENCE that take a stateid as an argument and/or result.

If the client establishes a persistent session, then the server MUST also persist the client ID, such that it is valid through server reboot or restart.
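The per-slot sequenceid rules in the DESCRIPTION (replay on equality, new request on exactly one greater, NFS4ERR_SEQ_MISORDERED otherwise, all modulo 32-bit wraparound) reduce to a single modular difference. The following is a hedged sketch of that check only; the slot dictionary is a simplified stand-in for a real duplicate reply cache entry, and the string dispositions are illustrative, not protocol values.

```python
MOD = 2**32   # sequenceid4 is an unsigned 32-bit value

def sequence_check(slot, sa_sequenceid):
    """slot: dict with 'seqid' (cached sequenceid) and 'reply' (cached
    COMPOUND reply). Returns a (disposition, cached_reply) pair."""
    diff = (sa_sequenceid - slot["seqid"]) % MOD
    if diff == 0:
        # Same sequenceid: a replay; return the cached response.
        return ("replay", slot["reply"])
    if diff == 1:
        # One greater (with wraparound): a new request; bump the slot.
        slot["seqid"] = sa_sequenceid
        slot["reply"] = None      # replaced once the new reply is built
        return ("new", None)
    # Two or more ahead, or behind the cached value: misordered.
    return ("NFS4ERR_SEQ_MISORDERED", None)
```

Note that "less than the cached sequenceid" and "two or more greater" collapse into the same branch: both leave the modular difference outside {0, 1}.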
If the session and client ID are not persistent, 19792 then after a server reboot or restart the client ID is no 19793 longer valid; upon encountering an sa_sessionid that maps to a stale 19794 client ID, the server SHOULD return NFS4ERR_STALE_CLIENTID, which 19795 indicates that both the client ID and sessionid are stale. 19797 The server's implementation constraints may require constructing a 19798 sessionid such that it is impossible to discern a sessionid that is 19799 invalid due to malformation from one that is invalid due to server 19800 restart. In that event, when the client receives NFS4ERR_BADSESSION, 19801 it may check for a stale client ID by issuing a CREATE_SESSION with the 19802 client ID. If CREATE_SESSION succeeds, the client has a session to 19803 use, and it MAY retry the original COMPOUND with the new sessionid 19804 (unless SEQ4_STATUS_RESTART_RECLAIM_NEEDED is returned in 19805 sr_status_flags; in which case the client MUST first reclaim state as 19806 described in Section 8.6.2.1). 19808 17.47. Operation 54: SET_SSV 19810 17.47.1. SYNOPSIS 19812 ssv, digest -> digest 19814 17.47.2. ARGUMENT 19816 struct ssa_digest_input4 { 19817 SEQUENCE4args sdi_seqargs; 19818 }; 19820 struct SET_SSV4args { 19821 opaque ssa_ssv<>; 19822 opaque ssa_digest<>; 19823 }; 19825 17.47.3. RESULT 19827 struct ssr_digest_input4 { 19828 SEQUENCE4res sdi_seqres; 19829 }; 19831 struct SET_SSV4resok { 19832 opaque ssr_digest<>; 19833 }; 19835 union SET_SSV4res switch (nfsstat4 ssr_status) { 19836 case NFS4_OK: 19837 SET_SSV4resok ssr_resok4; 19838 default: 19839 void; 19840 }; 19842 17.47.4. DESCRIPTION 19844 This operation is used to set or update the SSV for a session. It 19845 MUST be preceded by SEQUENCE in the same COMPOUND. It MUST be 19846 invoked only on a connection bound to the session.
It MUST NOT be 19847 used if the client did not enable connection binding enforcement when 19848 the session was created (see Section 17.36); the server returns 19849 NFS4ERR_OP_CONN_BINDING_NOT_ENFORCED in that case. If the client 19850 enabled connection binding enforcement, then SET_SSV MUST be invoked 19851 at least once prior to a BIND_CONN_TO_SESSION operation. 19853 ssa_digest is computed as the output of the HMAC RFC2104 [14] using 19854 the current SSV as the key, and an XDR encoded value of data type 19855 ssa_digest_input4. The field sdi_seqargs is equal to the arguments 19856 of the SEQUENCE operation for the COMPOUND procedure that SET_SSV is 19857 within. 19859 The ssa_ssv is XORed with the current SSV to produce the new SSV. 19861 In the response, ssr_digest is the output of the HMAC using the new 19862 SSV as the key, and an XDR encoded value of data type 19863 ssr_digest_input4. The field sdi_seqres is equal to the results of 19864 the SEQUENCE operation for the COMPOUND procedure that SET_SSV is 19865 within. 19867 17.47.5. IMPLEMENTATION 19869 When the server receives ssa_digest, it MUST verify the digest by 19870 computing the digest the same way the client did and comparing it 19871 with ssa_digest. If the server gets a different result, this is an 19872 error, NFS4ERR_BAD_SESSION_DIGEST. Generally, when that error is 19873 returned, the client has no recourse for changing the SSV or binding 19874 new connections to the session but to recreate the session with CREATE_SESSION. However, 19875 the IMPLEMENTATION section of BIND_CONN_TO_SESSION describes a scenario 19876 where a client can legitimately get NFS4ERR_BAD_SESSION_DIGEST for a 19877 SET_SSV, and how to recover from it. 19879 Clients SHOULD NOT send an ssa_ssv that is equal to a previous 19880 ssa_ssv, nor equal to a previous SSV. 19882 Clients SHOULD issue SET_SSV with RPCSEC_GSS privacy. Servers MUST 19883 support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE, 19884 SET_SSV }. 19886 17.48.
Operation 55: TEST_STATEID - Test stateids for validity 19888 Test a series of stateids for validity. 19890 17.48.1. SYNOPSIS 19892 stateids<> -> error_codes<> 19894 17.48.2. ARGUMENT 19896 struct TEST_STATEID4args { 19897 stateid4 ts_stateids<>; 19898 }; 19900 17.48.3. RESULT 19902 struct TEST_STATEID4resok { 19903 nfsstat4 tsr_status_codes<>; 19904 }; 19906 union TEST_STATEID4res switch (nfsstat4 tsr_status) { 19907 case NFS4_OK: 19908 TEST_STATEID4resok tsr_resok4; 19909 default: 19910 void; 19911 }; 19913 17.48.4. DESCRIPTION 19915 The TEST_STATEID operation is used to check the validity of a set of 19916 stateids. It is intended to be used when the client receives an 19917 indication that one or more of its stateids have been invalidated due 19918 to lock revocation. TEST_STATEID allows a large set of such stateids 19919 to be tested and allows problems with earlier stateids not to 19920 interfere with checking of subsequent ones, as would happen if 19921 individual stateids were tested by separate operations in a COMPOUND. 19923 For each stateid, the server provides the status code that would be 19924 returned if that stateid were to be used in normal operation. 19925 Returning such a status indication is not an error and does not 19926 cause processing to terminate. Checks for the validity of the 19927 stateid proceed as they would for normal operations, with two 19928 exceptions: there is no check for the type of stateid object, as 19929 would be the case for normal operations, and there is no reference to the current 19930 filehandle. 19932 The status codes that are validly returned within the tsr_status_codes array 19933 are: NFS4_OK, NFS4ERR_BAD_STATEID, NFS4ERR_EXPIRED, 19934 NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED. 19936 17.48.5. IMPLEMENTATION 19938 No discussion at this time. 19940 17.49. Operation 56: WANT_DELEGATION 19942 17.49.1. SYNOPSIS 19944 (cfh), (client ID) -> stateid, delegation 19946 17.49.2.
ARGUMENT 19948 union deleg_claim4 switch (open_claim_type4 dc_claim) { 19949 /* 19950 * No special rights to object. Ordinary delegation 19951 * request of the specified object. Object identified 19952 * by filehandle. 19953 */ 19954 case CLAIM_FH: /* new to v4.1 */ 19955 void; 19957 /* 19958 * Right to file based on a delegation granted to a previous boot 19959 * instance of the client. File is specified by filehandle. 19960 */ 19961 case CLAIM_DELEG_PREV_FH: /* new to v4.1 */ 19962 /* CURRENT_FH: file being opened */ 19963 void; 19965 /* 19966 * Right to the file established by an open previous to server 19967 * reboot. File identified by filehandle. 19968 * Used during server reclaim grace period. 19969 */ 19970 case CLAIM_PREVIOUS: 19971 /* CURRENT_FH: file being reclaimed */ 19972 open_delegation_type4 dc_delegate_type; 19973 }; 19975 struct WANT_DELEGATION4args { 19976 uint32_t wda_want; 19977 deleg_claim4 wda_claim; 19978 }; 19980 17.49.3. RESULT 19982 union WANT_DELEGATION4res switch (nfsstat4 wdr_status) { 19983 case NFS4_OK: 19984 open_delegation4 wdr_resok4; 19985 default: 19986 void; 19987 }; 19989 17.49.4. DESCRIPTION 19991 Where this description mandates the return of a specific error code 19992 for a specific condition, and where multiple conditions apply, the 19993 server MAY return any of the mandated error codes. 19995 This operation allows a client to get a delegation on all types of 19996 files except directories. The server MAY support this operation. If 19997 the server does not support this operation, it MUST return 19998 NFS4ERR_NOTSUPP. This operation also allows the client to register a 19999 "want" for a delegation for the specified file object, and be 20000 notified via a callback when the delegation is available. The server 20001 MAY support notifications of availability via callbacks. If the 20002 server does not support registration of wants it MUST NOT return an 20003 error to indicate that. 
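The support choices just described (operation unsupported versus wants unsupported, which is not an error) can be sketched as follows. All names here (the dict keys and helper) are hypothetical, and the encoding of the successful result (wdr_resok4) is elided.

```python
# Illustrative sketch of the WANT_DELEGATION support/want choices.
# Hypothetical names; not from the specification's XDR.
NFS4_OK = "NFS4_OK"
NFS4ERR_NOTSUPP = "NFS4ERR_NOTSUPP"

def want_delegation(server, wda_want, wda_claim):
    if not server.get("supports_want_delegation"):
        return NFS4ERR_NOTSUPP       # the operation itself is unsupported
    if server["can_grant"](wda_claim):
        return NFS4_OK               # delegation returned to the client
    if server.get("supports_wants"):
        # Register the want; availability is later signaled via callback.
        server.setdefault("wants", []).append((wda_want, wda_claim))
    # Lack of want-registration support is not itself reported as an error.
    return NFS4_OK
```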
20005 The client SHOULD NOT set OPEN4_SHARE_ACCESS_READ and SHOULD NOT set 20006 OPEN4_SHARE_ACCESS_WRITE in wda_want. If it does, the server MUST 20007 ignore them. 20009 The meanings of the following flags in wda_want are the same as they 20010 are in OPEN: 20012 OPEN4_SHARE_ACCESS_WANT_READ_DELEG 20014 OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG 20016 OPEN4_SHARE_ACCESS_WANT_ANY_DELEG 20018 OPEN4_SHARE_ACCESS_WANT_NO_DELEG 20020 OPEN4_SHARE_ACCESS_WANT_CANCEL 20022 OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL 20024 OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED 20026 The handling of the above flags in WANT_DELEGATION is the same as in 20027 OPEN. 20029 A request for a conflicting delegation MUST NOT trigger the recall of 20030 the existing delegation. 20032 The successful results of WANT_DELEGATION are of type open_delegation4, 20033 which is the same type as the "delegation" field in the results of 20034 the OPEN operation. The server constructs wdr_resok4 the same way it 20035 constructs OPEN's "delegation", with one difference: WANT_DELEGATION 20036 MUST NOT return a delegation type of OPEN_DELEGATE_NONE. As with 20037 OPEN, if (wda_want & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) is zero, then 20038 the client is indicating no desire for a delegation, and the server 20039 MAY (but is not required to) return a delegation in the WANT_DELEGATION response. 20041 17.49.5. IMPLEMENTATION 20043 TBD 20045 17.50. Operation 57: DESTROY_CLIENTID - Destroy existing client ID 20047 Destroy existing client ID. 20049 17.50.1. SYNOPSIS 20051 client ID -> - 20053 17.50.2. ARGUMENT 20055 struct DESTROY_CLIENTID4args { 20056 clientid4 dca_clientid; 20057 }; 20059 17.50.3. RESULT 20061 struct DESTROY_CLIENTID4res { 20062 nfsstat4 dcr_status; 20063 }; 20065 17.50.4. DESCRIPTION 20067 The DESTROY_CLIENTID operation destroys the client ID if there are no 20068 sessions, opens, locks, delegations, layouts, or wants 20069 associated with the client ID.
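The precondition above amounts to a check that no associated state of any kind remains. A minimal sketch, with the caveat that the error name used for the failure case is an assumption:

```python
# Sketch of the DESTROY_CLIENTID precondition: destruction is allowed
# only when no associated state of any kind remains. The error name
# for the failure case is an assumption.
NFS4_OK = "NFS4_OK"
NFS4ERR_CLIENTID_BUSY = "NFS4ERR_CLIENTID_BUSY"  # assumed error name

def destroy_clientid(client_state):
    kinds = ("sessions", "opens", "locks",
             "delegations", "layouts", "wants")
    if any(client_state.get(kind) for kind in kinds):
        return NFS4ERR_CLIENTID_BUSY
    # Safe to destroy; the server may later reuse this client ID.
    client_state["destroyed"] = True
    return NFS4_OK
```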
20071 If the COMPOUND request starts with SEQUENCE, then the session 20072 identified in SEQUENCE must not be one bound to the client ID 20073 identified in DESTROY_CLIENTID, or the DESTROY_CLIENTID operation will 20074 fail because there is still a session bound to the client ID. 20075 DESTROY_CLIENTID MAY be the only operation in a COMPOUND request. 20077 Note that because the operation can be sent outside of a session, a 20078 client that retransmits the request may receive an error in response, 20079 even though the original request resulted in the successful 20080 destruction of the client ID. 20082 17.50.5. IMPLEMENTATION 20084 DESTROY_CLIENTID allows a server to immediately reclaim the resources 20085 consumed by an unused client ID, and also to forget that it ever 20086 generated the client ID. By forgetting that it ever generated the 20087 client ID, the server can safely reuse the client ID on a future 20088 EXCHANGE_ID operation. 20090 17.51. Operation 10044: ILLEGAL - Illegal operation 20092 17.51.1. SYNOPSIS 20094 -> () 20096 17.51.2. ARGUMENTS 20098 void; 20100 17.51.3. RESULTS 20102 /* 20103 * ILLEGAL: Response for illegal operation numbers 20104 */ 20105 struct ILLEGAL4res { 20106 nfsstat4 status; 20107 }; 20109 17.51.4. DESCRIPTION 20111 This operation is a placeholder for encoding a result to handle the 20112 case of the client sending an operation code within COMPOUND that is 20113 not supported. See the COMPOUND procedure description for more 20114 details. 20116 The status field of ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. 20118 17.51.5. IMPLEMENTATION 20120 A client will probably not send an operation with code OP_ILLEGAL, but 20121 if it does, the response will be ILLEGAL4res just as it would be with 20122 any other invalid operation code.
Note that if the server gets an 20123 illegal operation code that is not OP_ILLEGAL, and if the server 20124 checks for legal operation codes during the XDR decode phase, then 20125 the ILLEGAL4res would not be returned. 20127 18. NFS version 4.1 Callback Procedures 20129 The procedures used for callbacks are defined in the following 20130 sections. In the interest of clarity, the terms "client" and 20131 "server" refer to NFS clients and servers, despite the fact that for 20132 an individual callback RPC, the sense of these terms would be 20133 precisely the opposite. 20135 18.1. Procedure 0: CB_NULL - No Operation 20137 18.1.1. SYNOPSIS 20139 18.1.2. ARGUMENTS 20141 void; 20143 18.1.3. RESULTS 20145 void; 20147 18.1.4. DESCRIPTION 20149 Standard NULL procedure. Void argument, void response. Even though 20150 there is no direct functionality associated with this procedure, the 20151 server will use CB_NULL to confirm the existence of a path for RPCs 20152 from server to client. 20154 18.1.5. ERRORS 20156 None. 20158 18.2. Procedure 1: CB_COMPOUND - Compound Operations 20160 18.2.1. SYNOPSIS 20162 compoundargs -> compoundres 20164 18.2.2. ARGUMENTS 20166 enum nfs_cb_opnum4 { 20167 OP_CB_GETATTR = 3, 20168 OP_CB_RECALL = 4, 20169 OP_CB_ILLEGAL = 10044 20170 }; 20172 union nfs_cb_argop4 switch (unsigned argop) { 20173 case OP_CB_GETATTR: CB_GETATTR4args opcbgetattr; 20174 case OP_CB_RECALL: CB_RECALL4args opcbrecall; 20175 case OP_CB_ILLEGAL: void opcbillegal; 20176 }; 20178 struct CB_COMPOUND4args { 20179 utf8str_cs tag; 20180 uint32_t minorversion; 20181 nfs_cb_argop4 argarray<>; 20182 }; 20184 18.2.3. RESULTS 20186 union nfs_cb_resop4 switch (unsigned resop){ 20187 case OP_CB_GETATTR: CB_GETATTR4res opcbgetattr; 20188 case OP_CB_RECALL: CB_RECALL4res opcbrecall; 20189 }; 20191 struct CB_COMPOUND4res { 20192 nfsstat4 status; 20193 utf8str_cs tag; 20194 nfs_cb_resop4 resarray<>; 20195 }; 20197 18.2.4. 
DESCRIPTION 20199 The CB_COMPOUND procedure is used to combine one or more of the 20200 callback procedures into a single RPC request. The callback RPC 20201 program has two main procedures: CB_NULL and CB_COMPOUND. All other 20202 operations use the CB_COMPOUND procedure as a wrapper. 20204 In the processing of the CB_COMPOUND procedure, the client may find 20205 that it does not have the available resources to execute any or all 20206 of the operations within the CB_COMPOUND sequence. This is discussed 20207 in Section 2.10.4.4. 20209 The minorversion field of the arguments MUST be the same as the 20210 minorversion of the COMPOUND procedure used to create the client ID 20211 and session. For NFSv4.1, minorversion MUST be set to 1. 20213 Contained within the CB_COMPOUND results is a 'status' field. This 20214 status must be equivalent to the status of the last operation that 20215 was executed within the CB_COMPOUND procedure. Therefore, if an 20216 operation incurred an error, then the 'status' value will be the same 20217 error value as is being returned for the operation that failed. 20219 For the definition of the "tag" field, see the section "Procedure 1: 20220 COMPOUND - Compound Operations". [[Comment.21: Need an xref.]] 20222 Illegal operation codes are handled in the same way as they are 20223 handled for the COMPOUND procedure. 20225 18.2.5. IMPLEMENTATION 20227 The CB_COMPOUND procedure is used to combine individual operations 20228 into a single RPC request. The client interprets each of the 20229 operations in turn. If an operation is executed by the client and 20230 the status of that operation is NFS4_OK, then the next operation in 20231 the CB_COMPOUND procedure is executed. The client continues this 20232 process until there are no more operations to be executed or one of 20233 the operations has a status value other than NFS4_OK. 20235 18.2.6.
ERRORS 20237 NFS4ERR_BADHANDLE NFS4ERR_BAD_STATEID NFS4ERR_BADXDR 20238 NFS4ERR_OP_ILLEGAL NFS4ERR_RESOURCE NFS4ERR_SERVERFAULT 20240 19. NFS version 4.1 Callback Operations 20242 19.1. Operation 3: CB_GETATTR - Get Attributes 20244 19.1.1. SYNOPSIS 20246 fh, attr_request -> attrmask, attr_vals 20248 19.1.2. ARGUMENT 20250 /* 20251 * NFS4 Callback Procedure Definitions and Program 20252 */ 20254 /* 20255 * CB_GETATTR: Get Current Attributes 20256 */ 20257 struct CB_GETATTR4args { 20258 nfs_fh4 fh; 20259 bitmap4 attr_request; 20260 }; 20262 19.1.3. RESULT 20264 struct CB_GETATTR4resok { 20265 fattr4 obj_attributes; 20266 }; 20268 union CB_GETATTR4res switch (nfsstat4 status) { 20269 case NFS4_OK: 20270 CB_GETATTR4resok resok4; 20271 default: 20272 void; 20273 }; 20275 19.1.4. DESCRIPTION 20277 The CB_GETATTR operation is used by the server to obtain the current 20278 modified state of a file that has been write delegated. The 20279 attributes size and change are the only ones guaranteed to be 20280 serviced by the client. See the section "Handling of CB_GETATTR" for 20281 a full description of how the client and server are to interact with 20282 the use of CB_GETATTR. 20284 If the filehandle specified is not one for which the client holds a 20285 write open delegation, an NFS4ERR_BADHANDLE error is returned. 20287 19.1.5. IMPLEMENTATION 20289 The client returns attrmask bits and the associated attribute values 20290 only for the change attribute, and attributes that it may change 20291 (time_modify, and size). 20293 19.2. Operation 4: CB_RECALL - Recall an Open Delegation 20295 19.2.1. SYNOPSIS 20297 stateid, truncate, fh -> () 20299 19.2.2. ARGUMENT 20301 /* 20302 * CB_RECALL: Recall an Open Delegation 20303 */ 20304 struct CB_RECALL4args { 20305 stateid4 stateid; 20306 bool truncate; 20307 nfs_fh4 fh; 20308 }; 20310 19.2.3. RESULT 20312 struct CB_RECALL4res { 20313 nfsstat4 status; 20314 }; 20316 19.2.4. 
DESCRIPTION 20318 The CB_RECALL operation is used to begin the process of recalling an 20319 open delegation and returning it to the server. 20321 The truncate flag is used to optimize recall for a file which is 20322 about to be truncated to zero. When it is set, the client is freed 20323 of obligation to propagate modified data for the file to the server, 20324 since this data is irrelevant. 20326 If the handle specified is not one for which the client holds an open 20327 delegation, an NFS4ERR_BADHANDLE error is returned. 20329 If the stateid specified is not one corresponding to an open 20330 delegation for the file specified by the filehandle, an 20331 NFS4ERR_BAD_STATEID is returned. 20333 19.2.5. IMPLEMENTATION 20335 The client should reply to the callback immediately. Replying does 20336 not complete the recall except when an error was returned. The 20337 recall is not complete until the delegation is returned using a 20338 DELEGRETURN. 20340 19.3. Operation 5: CB_LAYOUTRECALL 20342 19.3.1. SYNOPSIS 20344 layout_type, iomode, layoutchanged, layoutrecall -> - 20346 19.3.2. ARGUMENT 20348 /* 20349 * NFSv4.1 callback arguments and results 20350 */ 20352 enum layoutrecall_type4 { 20353 LAYOUTRECALL4_FILE = 1, 20354 LAYOUTRECALL4_FSID = 2, 20355 LAYOUTRECALL4_ALL = 3 20356 }; 20358 struct layoutrecall_file4 { 20359 nfs_fh4 lor_fh; 20360 offset4 lor_offset; 20361 length4 lor_length; 20362 }; 20364 union layoutrecall4 switch(layoutrecall_type4 recalltype) { 20365 case LAYOUTRECALL4_FILE: 20366 layoutrecall_file4 lor_layout; 20367 case LAYOUTRECALL4_FSID: 20368 fsid4 lor_fsid; 20369 case LAYOUTRECALL4_ALL: 20370 void; 20371 }; 20373 struct CB_LAYOUTRECALL4args { 20374 layouttype4 clora_type; 20375 layoutiomode4 clora_iomode; 20376 bool clora_changed; 20377 layoutrecall4 clora_recall; 20378 }; 20380 19.3.3. RESULT 20382 struct CB_LAYOUTRECALL4res { 20383 nfsstat4 clorr_status; 20384 }; 20386 19.3.4. 
DESCRIPTION 20388 The CB_LAYOUTRECALL operation is used to begin the process of 20389 recalling layout segments, a layout, all layouts pertaining to a 20390 particular file system (FSID), or layouts in all file systems (ALL). 20391 If LAYOUTRECALL4_FILE is specified, the lor_offset and lor_length 20392 fields specify the layout segments. If a lor_length of all ones is 20393 specified, then all layout segments identified by the current file 20394 handle, clora_type, clora_iomode, and corresponding to the octet 20395 range from lor_offset to the end-of-file MUST be returned (via 20396 LAYOUTRETURN, see Section 17.44). The clora_iomode specifies the set 20397 of layouts to be returned. A clora_iomode of LAYOUTIOMODE4_ANY 20398 specifies that all matching layout segments, regardless of iomode, 20399 must be returned; otherwise, only layout segments that exactly match 20400 the iomode must be returned. If clora_iomode is LAYOUTIOMODE4_ANY, 20401 lor_offset is zero, and lor_length is all ones, then the entire layout 20402 is to be returned. 20404 If the clora_changed field is TRUE, then the client SHOULD NOT write 20405 and commit its modified data to the storage devices specified by the 20406 layout being recalled. Instead, it is preferable for the client to 20407 write and commit the modified data through the metadata server. 20408 Alternatively, the client may attempt to obtain a new layout. Note: 20409 in order to obtain a new layout the client must first return the old 20410 layout. Since obtaining a new layout is not guaranteed to succeed, 20411 the client must be ready to write and commit its modified data 20412 through the metadata server. 20414 If the client does not hold any layout segment either matching or 20415 overlapping with the requested layout, it returns 20416 NFS4ERR_NOMATCHING_LAYOUT. 20418 If LAYOUTRECALL4_FSID is specified, the fsid specifies the file 20419 system for which any outstanding layouts MUST be returned.
If 20420 LAYOUTRECALL4_ALL is specified, all outstanding layouts MUST be 20421 returned. In addition, LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL 20422 specify that all the storage device ID to storage device address 20423 mappings in the affected file system(s) are also recalled. The 20424 respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or 20425 LAYOUTRETURN4_ALL acknowledges to the server that the client 20426 invalidated those device mappings. Device mappings are also 20427 invalidated when no layouts are found for LAYOUTRECALL4_FSID or 20428 LAYOUTRECALL4_ALL and NFS4ERR_NOMATCHING_LAYOUT is returned. 20430 19.3.5. IMPLEMENTATION 20432 The client should reply to the callback immediately. Replying does 20433 not complete the recall except when an error is returned; otherwise 20434 the recall is not complete until the layout(s) are returned using a 20435 LAYOUTRETURN operation. 20437 The client should complete any in-flight I/O operations using the 20438 recalled layout(s) before returning them via LAYOUTRETURN. If the 20439 client has buffered modified data, there are a number of options for 20440 writing and committing that data. If clora_changed is false, the 20441 client may choose to write modified data directly to storage before 20442 calling LAYOUTRETURN. However, if clora_changed is true, the client 20443 may either choose to write it later using normal NFSv4 WRITE 20444 operations to the metadata server, or it may attempt to obtain a new 20445 layout after first returning the recalled layout, using the new 20446 layout to write the modified data. Regardless of whether the client 20447 is holding a layout, it may always write data through the metadata 20448 server. 20450 If modified data is written while the layout is held, the client must 20451 still issue LAYOUTCOMMIT operations at the appropriate time, 20452 especially before issuing the LAYOUTRETURN.
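The segment-matching rules in the DESCRIPTION above (exact iomode match unless the recall uses LAYOUTIOMODE4_ANY; a lor_length of all ones meaning "from lor_offset to end of file") can be sketched as follows. Names other than the lor_* and clora_* fields are illustrative.

```python
# Sketch of matching a held layout segment against a
# LAYOUTRECALL4_FILE recall. A length of all ones (64 bits) means
# "to end of file" for both the recall range and the held segment.
ALL_ONES = 2**64 - 1
LAYOUTIOMODE4_ANY = "ANY"

def segment_recalled(seg_iomode, seg_offset, seg_length,
                     clora_iomode, lor_offset, lor_length):
    if clora_iomode != LAYOUTIOMODE4_ANY and seg_iomode != clora_iomode:
        return False
    recall_end = ALL_ONES if lor_length == ALL_ONES else lor_offset + lor_length
    seg_end = ALL_ONES if seg_length == ALL_ONES else seg_offset + seg_length
    # The segment is recalled if its octet range overlaps the recall's.
    return seg_offset < recall_end and lor_offset < seg_end
```

If no held segment matches or overlaps, the client would respond with NFS4ERR_NOMATCHING_LAYOUT, per the DESCRIPTION.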
If a large amount of 20453 modified data is outstanding, the client may issue LAYOUTRETURNs for 20454 portions of the layout being recalled; this allows the server to 20455 monitor the client's progress and adherence to the callback. 20456 However, the last LAYOUTRETURN in a sequence of returns, MUST specify 20457 the full range being recalled (see Section 12.5.4.1 for details). 20459 19.4. Operation 6: CB_NOTIFY - Notify directory changes 20461 Tell the client of directory changes. 20463 19.4.1. SYNOPSIS 20465 stateid, notification -> {} 20467 19.4.2. ARGUMENT 20469 /* Changed entry information. */ 20470 struct notify_entry4 { 20471 component4 ne_file; 20472 fattr4 ne_attrs; 20473 }; 20475 /* Previous entry information */ 20476 struct prev_entry4 { 20477 notify_entry4 pe_prev_entry; 20478 /* what READDIR returned for this entry */ 20479 nfs_cookie4 pe_prev_entry_cookie; 20480 }; 20481 struct notify_add4 { 20482 notify_entry4 nad_new_entry; 20483 /* what READDIR would have returned for this entry */ 20484 nfs_cookie4 nad_new_entry_cookie<1>; 20485 prev_entry4 nad_prev_entry<1>; 20486 bool nad_last_entry; 20487 }; 20489 struct notify_attr4 { 20490 notify_entry4 na_changed_entry; 20491 }; 20493 struct notify_remove4 { 20494 notify_entry4 nrm_old_entry; 20495 nfs_cookie4 nrm_old_entry_cookie; 20496 }; 20498 struct notify_rename4 { 20499 notify_entry4 nrn_old_entry; 20500 notify_add4 nrn_new_entry; 20501 }; 20503 struct notify_verifier4 { 20504 verifier4 nv_old_cookieverf; 20505 verifier4 nv_new_cookieverf; 20506 }; 20508 enum notify_type4 { 20509 NOTIFY4_CHANGE_CHILD_ATTRS = 0, 20510 NOTIFY4_CHANGE_DIR_ATTRS = 1, 20511 NOTIFY4_REMOVE_ENTRY = 2, 20512 NOTIFY4_ADD_ENTRY = 3, 20513 NOTIFY4_RENAME_ENTRY = 4, 20514 NOTIFY4_CHANGE_COOKIE_VERIFIER = 5 20515 }; 20517 /* 20518 * Notification information sent to the client. 
20519 */ 20520 union notify4 switch (notify_type4 n_type) { 20521 case NOTIFY4_CHANGE_CHILD_ATTRS: 20522 notify_attr4 n_change_child_attrs; 20523 case NOTIFY4_CHANGE_DIR_ATTRS: 20524 fattr4 n_change_dir_attrs; 20525 case NOTIFY4_REMOVE_ENTRY: 20526 notify_remove4 n_remove_notify; 20527 case NOTIFY4_ADD_ENTRY: 20528 notify_add4 n_add_notify; 20530 case NOTIFY4_RENAME_ENTRY: 20531 notify_rename4 n_rename_notify; 20532 case NOTIFY4_CHANGE_COOKIE_VERIFIER: 20533 notify_verifier4 n_verf_notify; 20534 }; 20536 struct CB_NOTIFY4args { 20537 stateid4 cna_stateid; 20538 nfs_fh4 cna_fh; 20539 notify4 cna_changes<>; 20540 }; 20542 19.4.3. RESULT 20544 struct CB_NOTIFY4res { 20545 nfsstat4 cnr_status; 20546 }; 20548 19.4.4. DESCRIPTION 20550 The CB_NOTIFY operation is used by the server to send notifications 20551 to clients about changes in a delegated directory. These 20552 notifications are sent over the callback path. The notification is 20553 sent once the original request has been processed on the server. The 20554 server will send an array of notifications for all changes that might 20555 have occurred in the directory. The notify_type4 can only have one 20556 bit set for each notification in the array. If the client holding 20557 the delegation makes any changes in the directory that cause files or 20558 subdirectories to be added or removed, the server will notify that 20559 client of the resulting change(s). If the client holding the 20560 delegation is making attribute or cookie verifier changes only, the 20561 server does not need to send notifications to that client. The 20562 server will send the following information for each operation: 20564 ADDING A FILE The server will send information about the new entry 20565 being created along with the cookie for that entry. The entry 20566 information (data type notify_add4) includes the component name of 20567 the entry and attributes.
If this entry is added to the end of 20568 the directory, the server will set the nad_last_entry flag to 20569 true. If the file is added such that there is at least one entry 20570 before it, the server will also return the previous entry 20571 information (nad_prev_entry, a variable-length array of up to one 20572 element; a zero-length array indicates there is no previous 20573 entry), along with its cookie. This is to help clients find the 20574 right location in their DNLC or directory caches where this entry 20575 should be cached. If the new entry's cookie is available, it will 20576 be in nad_new_entry_cookie (another variable-length array of up to 20577 one element). 20579 REMOVING A FILE The server will send information about the directory 20580 entry being deleted. The server will also send the cookie value 20581 for the deleted entry so that clients can get to the cached 20582 information for this entry. 20584 RENAMING A FILE The server will send information about both the old 20585 entry and the new entry. This includes name and attributes for 20586 each entry. This notification is only sent if both entries are in 20587 the same directory. If the rename is across directories, the 20588 server will send a remove notification to one directory and an add 20589 notification to the other directory, assuming both have a 20590 directory delegation. 20592 FILE/DIR ATTRIBUTE CHANGE The client will use the attribute mask to 20593 inform the server of attributes for which it wants to receive 20594 notifications. This change notification can be requested for both 20595 changes to the attributes of the directory as well as changes to 20596 any file attributes in the directory by using two separate 20597 attribute masks. The client cannot ask for change attribute 20598 notification per file. One attribute mask covers all the files in 20599 the directory. Upon any attribute change, the server will send 20600 back the values of changed attributes.
Notifications might not 20601 make sense for some file-system-wide attributes, and it is up to 20602 the server to decide which subset it wants to support. The client 20603 can negotiate the frequency of attribute notifications by letting 20604 the server know how often it wants to be notified of an attribute 20605 change. The server will return supported notification frequencies 20606 or an indication that no notification is permitted for directory 20607 or child attributes by setting the dir_notif_delay and 20608 dir_entry_notif_delay attributes respectively. 20610 COOKIE VERIFIER CHANGE If the cookie verifier changes while a client 20611 is holding a delegation, the server will notify the client so that 20612 it can invalidate its cookies and reissue a READDIR to get the new 20613 set of cookies. 20615 19.4.5. IMPLEMENTATION 20617 19.5. Operation 7: CB_PUSH_DELEG 20619 19.5.1. SYNOPSIS 20621 fh, stateid -> { } 20623 19.5.2. ARGUMENT 20625 struct CB_PUSH_DELEG4args { 20626 stateid4 cpda_stateid; 20627 nfs_fh4 cpda_fh; 20628 open_delegation4 cpda_delegation; 20630 }; 20632 19.5.3. RESULT 20634 struct CB_PUSH_DELEG4res { 20635 nfsstat4 cpdr_status; 20636 }; 20638 19.5.4. DESCRIPTION 20640 CB_PUSH_DELEG is used by the server to both signal to the client that 20641 the delegation it wants is available and to simultaneously offer the 20642 delegation to the client. The client has the choice of accepting the 20643 delegation by returning NFS4_OK to the server, delaying the decision 20644 to accept the offered delegation by returning NFS4ERR_DELAY, or 20645 permanently rejecting the offer of the delegation via any other error 20646 status. 20648 The server MUST send in cpda_delegation a delegation corresponding to 20649 the type the client requested in the OPEN, WANT_DELEGATION, 20650 or GET_DIR_DELEGATION request.
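The client-side decision just described (accept, defer, or permanently reject) can be sketched as follows. The dict keys are hypothetical, and the specific rejection status chosen here is an illustrative assumption since any error other than NFS4ERR_DELAY rejects the offer.

```python
# Sketch of the client's choice on receiving CB_PUSH_DELEG:
# accept (NFS4_OK), defer (NFS4ERR_DELAY), or permanently reject
# (any other error status; the name below is an assumption).
NFS4_OK = "NFS4_OK"
NFS4ERR_DELAY = "NFS4ERR_DELAY"
NFS4ERR_REJECT_DELEG = "NFS4ERR_REJECT_DELEG"  # assumed rejection status

def cb_push_deleg(client, cpda_fh, cpda_delegation):
    if client["wants_now"]:
        client.setdefault("delegations", {})[cpda_fh] = cpda_delegation
        return NFS4_OK
    if client["may_want_later"]:
        # Defer; the server MAY offer the delegation elsewhere meanwhile.
        return NFS4ERR_DELAY
    return NFS4ERR_REJECT_DELEG  # permanently rejects the offer
```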
20652 If the client does return NFS4ERR_DELAY and there is a conflicting 20653 delegation request, the server MAY process it at the expense of the 20654 client that returned NFS4ERR_DELAY. The client's want will not be 20655 cancelled, but MAY be processed behind other delegation requests or 20656 registered wants. 20658 19.5.5. IMPLEMENTATION 20660 TBD 20662 19.6. Operation 8: CB_RECALL_ANY - Keep any N delegations 20664 Notify client to return delegations and keep N of them. 20666 19.6.1. SYNOPSIS 20668 N, type_mask -> {} 20670 19.6.2. ARGUMENT 20672 const RCA4_TYPE_MASK_RDATA_DLG = 0; 20673 const RCA4_TYPE_MASK_WDATA_DLG = 1; 20674 const RCA4_TYPE_MASK_DIR_DLG = 2; 20675 const RCA4_TYPE_MASK_FILE_LAYOUT = 3; 20676 const RCA4_TYPE_MASK_BLK_LAYOUT_MIN = 4; 20677 const RCA4_TYPE_MASK_BLK_LAYOUT_MAX = 7; 20678 const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; 20679 const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 11; 20680 const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12; 20681 const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15; 20683 struct CB_RECALL_ANY4args { 20684 uint32_t craa_objects_to_keep; 20685 bitmap4 craa_type_mask; 20686 }; 20688 19.6.3. RESULT 20690 struct CB_RECALL_ANY4res { 20691 nfsstat4 crar_status; 20692 }; 20694 19.6.4. DESCRIPTION 20696 The server may decide that it cannot hold all of the state for 20697 recallable objects, such as delegations and layouts, without running 20698 out of resources. In such a case, it is free to recall individual 20699 objects to reduce the load, but this would be far from optimal. 20701 Because the general purpose of such recallable objects as delegations 20702 is to eliminate client interaction with the server, the server cannot 20703 interpret lack of recent use as indicating that the object is no 20704 longer useful. The absence of visible use may be the result of a 20705 large number of potential operations having been eliminated.
In the case of 20706 layouts, the layout will be used explicitly but the meta-data server 20707 does not have direct knowledge of such use. 20709 In order to implement an effective reclaim scheme for such objects, 20710 the server's knowledge of available resources must be used to 20711 determine when objects must be recalled with the clients selecting 20712 the actual objects to be returned. 20714 Server implementations may differ in their resource allocation 20715 requirements. For example, one server may share resources among all 20716 classes of recallable objects whereas another may use separate 20717 resource pools for layouts and for delegations, or further separate 20718 resources by types of delegations. 20720 When a given resource pool is over-utilized, the server can issue a 20721 CB_RECALL_ANY to clients holding recallable objects of the types 20722 involved, allowing it to keep a certain number of such objects and 20723 return any excess. A mask specifies which types of objects are to be 20724 limited. The client chooses, based on its own knowledge of current 20725 usefulness, which of the objects in that class should be returned. 20727 For NFSv4.1, sixteen bits are defined. For some of these, ranges are 20728 defined and it is up to the definition of the storage protocol to 20729 specify how these are to be used. There are ranges for blocks-based 20730 storage protocols, for object-based storage protocols and a reserved 20731 range for other experimental storage protocols. The RFC defining 20732 such a storage protocol needs to specify how particular bits within 20733 its range are to be used. For example, it may specify a mapping 20734 between attributes of the layout (read vs. write, size of area) and 20735 the bit to be used or it may define a field in the layout where the 20736 associated bit position is made available by the server to the 20737 client. 20739 When an undefined bit is set in the type mask, NFS4ERR_INVAL should 20740 be returned. 
However, even if a client does not support an object of 20741 the specified type, NFS4ERR_INVAL should not 20742 be returned if the bit is defined. Future minor versions of NFSv4 may expand the set of 20743 valid type mask bits. 20745 CB_RECALL_ANY specifies a count of objects that the client may keep 20746 as opposed to a count that the client must return. This is to avoid 20747 a potential race between a CB_RECALL_ANY that had a count of objects to 20748 free and a set of client-originated operations to return layouts or 20749 delegations. As a result of the race, the client and server would 20750 have differing ideas as to how many objects to return. Hence the 20751 client could mistakenly free too many. 20753 If resource demands prompt it, the server may send another 20754 CB_RECALL_ANY with a lower count, even if it has not yet received an 20755 acknowledgement from the client for a previous CB_RECALL_ANY with the 20756 same type mask. Although the possibility exists that these will be 20757 received by the client in an order different from the order in which 20758 they were sent, any such permutation of the callback stream is 20759 harmless. It is the job of the client to bring down the size of the 20760 recallable object set in line with each CB_RECALL_ANY received, and 20761 until that obligation is met, it cannot be canceled or modified by any 20762 subsequent CB_RECALL_ANY for the same type mask. Thus if the server 20763 sends two CB_RECALL_ANYs, the effect will be the same as if the 20764 lower count had been sent, whatever the order of recall receipt. Note 20765 that this means that a server cannot cancel the effect of a 20766 CB_RECALL_ANY by sending another recall with a higher count. When a 20767 CB_RECALL_ANY is received and the count is already within the limit 20768 set or is above a limit that the client is working to get down to, 20769 that callback has no effect. 20771 The client can choose to return any type of object specified by the 20772 mask.
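The "lowest count in force wins, a higher count has no effect" semantics described above can be sketched as a small piece of client-side state per type mask. The structure and function names here are hypothetical illustrations, not protocol elements.

```c
#include <stdint.h>

/* Hypothetical per-type-mask recall state a client might keep. */
struct recall_target {
    uint32_t current_held;   /* objects of the masked types currently held */
    uint32_t target;         /* lowest craa_objects_to_keep still in force */
    int      active;         /* nonzero while the client is still reducing */
};

/* Apply a newly received CB_RECALL_ANY count for this type mask.
 * A lower count tightens the obligation; a higher count cannot cancel
 * it, and a count at or above what is held has no effect. */
void apply_recall_any(struct recall_target *rt, uint32_t objects_to_keep)
{
    if (!rt->active || objects_to_keep < rt->target)
        rt->target = objects_to_keep;
    rt->active = rt->current_held > rt->target;
}
```

As the client returns objects it would decrement `current_held` and clear `active` once the obligation is met; which particular objects to return is left to the client's own notion of usefulness, as the text requires.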
If a server wishes to limit use of objects of a specific type, 20773 it should specify only that type in the mask sent. The client might 20774 not return the requested objects, and it is up to the server to handle 20775 this situation, typically by doing specific recalls to properly limit 20776 resource usage. The server should give the client enough time to 20777 return objects before proceeding to specific recalls. This time 20778 should not be less than the lease period. 20780 Servers are generally free not to give out recallable objects when 20781 insufficient resources are available. Note that the effect of such a 20782 policy is implicitly to give precedence to existing objects relative 20783 to requested ones, with the result that resources might not be 20784 optimally used. To prevent this, servers are well advised to make 20785 the point at which they start issuing CB_RECALL_ANY callbacks 20786 somewhat below that at which they cease to give out new delegations 20787 and layouts. This allows the client to purge its less-used objects 20788 whenever appropriate and so continue to have its subsequent requests 20789 granted from resources freed up by object returns. 20791 19.6.5. IMPLEMENTATION 20793 19.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL 20795 19.7.1. SYNOPSIS 20797 TBD 20799 19.7.2. ARGUMENT 20801 typedef CB_RECALL_ANY4args CB_RECALLABLE_OBJ_AVAIL4args; 20803 19.7.3. RESULT 20805 struct CB_RECALLABLE_OBJ_AVAIL4res { 20806 nfsstat4 croa_status; 20807 }; 20809 19.7.4. DESCRIPTION 20811 CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client 20812 that the server has resources to grant recallable objects that might 20813 previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEGATION, 20814 or LAYOUTGET.
20816 The argument objects_to_keep is the total number of recallable 20817 objects of the types indicated in the argument type_mask that the 20818 server believes it can allow the client to have, including the number 20819 of such objects the client already has. A client that tries to 20820 acquire more recallable objects than the server has informed it can have 20821 runs the risk of having objects recalled. 20823 19.7.5. IMPLEMENTATION 20825 TBD 20827 19.8. Operation 10: CB_RECALL_SLOT - change flow control limits 20829 Change flow control limits 20831 19.8.1. SYNOPSIS 20833 targetcount -> status 20835 19.8.2. ARGUMENT 20837 struct CB_RECALL_SLOT4args { 20838 uint32_t rsa_target_highest_slotid; 20839 }; 20841 19.8.3. RESULT 20843 struct CB_RECALL_SLOT4res { 20844 nfsstat4 rsr_status; 20845 }; 20847 19.8.4. DESCRIPTION 20849 The CB_RECALL_SLOT operation requests the client to return session 20850 slots, and, if applicable, transport credits (e.g., RDMA credits for 20851 connections bound to the operations channel) to the server. 20852 CB_RECALL_SLOT specifies rsa_target_highest_slotid, the target 20853 highest_slot the server wants for the session. The client should 20854 then work toward reducing the highest_slot to the target. 20856 If the session has only non-RDMA connections bound to its operations 20857 channel, then the client need only wait for all outstanding requests 20858 with a slotid > rsa_target_highest_slotid to complete, then issue a 20859 single COMPOUND consisting of a single SEQUENCE operation, with the 20860 sa_highslot field set to rsa_target_highest_slotid. If there are 20861 RDMA-based connections bound to the operations channel, then the client 20862 also needs to issue enough zero-length RDMA Sends to take the total 20863 RDMA credit count to rsa_target_highest_slotid + 1 or below. 20865 19.8.5. IMPLEMENTATION 20867 No discussion at this time. 20869 19.9.
Operation 11: CB_SEQUENCE - Supply callback channel sequencing 20870 and control 20872 Sequence and control 20874 19.9.1. SYNOPSIS 20876 control -> control 20878 19.9.2. ARGUMENT 20880 struct referring_call4 { 20881 sequenceid4 rc_sequenceid; 20882 slotid4 rc_slotid; 20883 }; 20885 struct referring_call_list4 { 20886 sessionid4 rcl_sessionid; 20887 referring_call4 rcl_referring_calls<>; 20888 }; 20890 struct CB_SEQUENCE4args { 20891 sessionid4 csa_sessionid; 20892 sequenceid4 csa_sequenceid; 20893 slotid4 csa_slotid; 20894 slotid4 csa_highest_slotid; 20895 bool csa_cachethis; 20896 referring_call_list4 csa_referring_call_lists<>; 20897 }; 20899 19.9.3. RESULT 20901 struct CB_SEQUENCE4resok { 20902 sessionid4 csr_sessionid; 20903 sequenceid4 csr_sequenceid; 20904 slotid4 csr_slotid; 20905 slotid4 csr_highest_slotid; 20906 slotid4 csr_target_highest_slotid; 20907 }; 20909 union CB_SEQUENCE4res switch (nfsstat4 csr_status) { 20910 case NFS4_OK: 20911 CB_SEQUENCE4resok csr_resok4; 20912 default: 20913 void; 20914 }; 20916 19.9.4. DESCRIPTION 20918 The CB_SEQUENCE operation is used to manage operational accounting 20919 for the callback channel of the session on which the operation is 20920 sent. The contents include the session to which this request 20921 belongs, the slotid and sequenceid used by the server to implement 20922 session request control and exactly once semantics, and exchanged 20923 slot maximums which are used to adjust the size of the replay cache. 20924 This operation MUST appear once as the first operation in each 20925 CB_COMPOUND procedure sent after the callback channel is successfully 20926 bound, or a protocol error must result. See Section 17.46.4 for a 20927 description of how slots are processed. 20929 If csa_cachethis is TRUE, then the server is requesting that the 20930 client cache the reply in the callback reply cache. The client MUST 20931 cache the reply (see Section 2.10.4.1.2).
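The per-slot sequencing rules for csa_sequenceid (replay on an equal value, new request when one greater, NFS4ERR_SEQ_MISORDERED otherwise, all modulo 32-bit wraparound) can be sketched as follows. The function and enum names are hypothetical; the comparison logic follows the rules this section states.

```c
#include <stdint.h>

typedef uint32_t sequenceid4;

enum seq_disposition { SEQ_NEW, SEQ_REPLAY, SEQ_MISORDERED };

/* Sketch of the check a client applies to an arriving csa_sequenceid
 * against the sequenceid cached for the slot, using unsigned
 * wraparound arithmetic on the 32-bit sequenceid. */
enum seq_disposition check_slot_seqid(sequenceid4 cached, sequenceid4 arrived)
{
    sequenceid4 diff = arrived - cached;   /* wraps modulo 2^32 */

    if (diff == 0)
        return SEQ_REPLAY;                 /* return the cached reply */
    if (diff == 1)
        return SEQ_NEW;                    /* increment slot, process ops */
    return SEQ_MISORDERED;                 /* NFS4ERR_SEQ_MISORDERED */
}
```

A difference of two or more and a value behind the cached sequenceid both fall into the final branch, matching the two NFS4ERR_SEQ_MISORDERED cases the text describes.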
20933 The csa_referring_call_lists array is the list of COMPOUND calls, 20934 identified by sessionid, slotid, and sequenceid, that the client 20935 previously sent to the server that could have triggered the callback. 20936 A sessionid is included because leased state is tied to a client ID, 20937 and a client ID can have multiple sessions. See Section 2.10.4.3, 20938 "Resolving server callback races with sessions". 20940 If the difference between csa_sequenceid and the sequenceid the 20941 client has for the slot is two (2) or more, then the client MUST return 20942 NFS4ERR_SEQ_MISORDERED. If csa_sequenceid is less than the client's 20943 cached sequenceid (accounting for wraparound of the unsigned 20944 sequenceid value), then the client MUST return 20945 NFS4ERR_SEQ_MISORDERED. If csa_sequenceid and the cached sequenceid 20946 are the same, this is a replay, and the client returns the response 20947 to the CB_COMPOUND that is cached. Otherwise, if csa_sequenceid is 20948 one greater (accounting for wraparound) than the cached sequenceid, 20949 then this is a new request, and the slot's sequenceid is incremented. 20950 The operations subsequent to CB_SEQUENCE, if any, are processed. If 20951 there are no other operations, the only other effect is to cache 20952 the CB_SEQUENCE reply in the slot. 20954 If CB_SEQUENCE returns an error, then the state of the slot 20955 (sequenceid, cached reply) is not changed. 20957 The client returns two "highest_slotid" values: csr_highest_slotid 20958 and csr_target_highest_slotid. The former is the highest slotid the 20959 client will accept in a future CB_SEQUENCE operation, and must not be 20960 less than the value of csa_highest_slotid. The latter is the 20961 highest slotid the client would prefer the server use on a future 20962 CB_SEQUENCE operation. 20964 19.9.5. IMPLEMENTATION 20966 19.10. Operation 12: CB_WANTS_CANCELLED 20968 19.10.1. SYNOPSIS 20970 fh, size -> - 20972 19.10.2.
ARGUMENT 20974 struct CB_WANTS_CANCELLED4args { 20975 bool cwca_contended_wants_cancelled; 20976 bool cwca_resourced_wants_cancelled; 20977 }; 20979 19.10.3. RESULT 20981 struct CB_WANTS_CANCELLED4res { 20982 nfsstat4 cwcr_status; 20983 }; 20985 19.10.4. DESCRIPTION 20987 The CB_WANTS_CANCELLED operation is used to notify the client that 20988 some or all of the wants it registered for recallable delegations and 20989 layouts have been canceled. 20991 If cwca_contended_wants_cancelled is TRUE, this indicates the server 20992 will not be pushing to the client any delegations that become 20993 available after contention passes. 20995 If cwca_resourced_wants_cancelled is TRUE, this indicates the server 20996 will not notify the client when there are resources on the server to 20997 grant delegations or layouts. 20999 After receiving a CB_WANTS_CANCELLED operation, the client is free to 21000 attempt to acquire the delegations or layouts it was waiting for, and 21001 possibly re-register wants. 21003 19.10.5. IMPLEMENTATION 21005 19.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible lock 21006 availability 21008 19.11.1. SYNOPSIS 21010 fh, lockowner -> () 21012 19.11.2. ARGUMENT 21014 struct CB_NOTIFY_LOCK4args { 21015 lock_owner4 cnla_lock_owner; 21016 nfs_fh4 cnla_fh; 21017 }; 21019 19.11.3. RESULT 21021 struct CB_NOTIFY_LOCK4res { 21022 nfsstat4 cnlr_status; 21023 }; 21025 19.11.4. DESCRIPTION 21027 The server may use this operation to indicate that a lock for the 21028 given file and lockowner may have become available. 21030 This callback is meant to be used by servers to help reduce the 21031 latency of blocking locks in the case where they recognize that a 21032 client which has been polling for a blocking lock may now be able to 21033 acquire the lock. The notification is purely a hint, provided as a 21034 possible performance optimization, and is not required for 21035 correctness. 21037 19.11.5.
IMPLEMENTATION 21039 The server must not grant the lock to the client unless and until it 21040 receives an actual lock request from the client. Similarly, the 21041 client receiving this callback cannot assume that it now has the 21042 lock, or that a subsequent request for the lock will be successful. 21044 The server is not required to implement this callback, and even if it 21045 does, it is not required to use it in any particular case. Therefore 21046 the client must still rely on polling for blocking locks, as 21047 described in the "Blocking Locks" section. 21049 Similarly, the client is not required to implement this callback, and 21050 even if it does, it is still free to ignore it. Therefore the server must 21051 not assume that the client will act based on the callback. 21053 If the server supports this callback for a given file, it should set 21054 the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when responding to successful 21055 opens for that file. This does not commit the server to use of 21056 CB_NOTIFY_LOCK, but the client may use this as a hint to decide how 21057 frequently to poll for locks derived from that open. 21059 19.12. Operation 10044: CB_ILLEGAL - Illegal Callback Operation 21061 19.12.1. SYNOPSIS 21063 -> () 21065 19.12.2. ARGUMENT 21067 void; 21069 19.12.3. RESULT 21071 /* 21072 * CB_ILLEGAL: Response for illegal operation numbers 21073 */ 21074 struct CB_ILLEGAL4res { 21075 nfsstat4 status; 21076 }; 21078 19.12.4. DESCRIPTION 21080 This operation is a placeholder for encoding a result to handle the 21081 case of the server sending an operation code within CB_COMPOUND that is 21082 not supported. See the COMPOUND procedure description for more 21083 details. 21085 The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. 21087 19.12.5.
IMPLEMENTATION 21089 A server will probably not send an operation with code OP_CB_ILLEGAL, 21090 but if it does, the response will be CB_ILLEGAL4res just as it would 21091 be with any other invalid operation code. Note that if the client 21092 gets an illegal operation code that is not OP_CB_ILLEGAL, and if the 21093 client checks for legal operation codes during the XDR decode phase, 21094 then the CB_ILLEGAL4res would not be returned. 21096 20. Security Considerations 21098 TBD 21100 21. IANA Considerations 21102 21.1. Defining new layout types 21104 New layout type numbers will be requested from IANA. IANA will only 21105 provide layout type numbers for Standards Track RFCs approved by the 21106 IESG, in accordance with the Standards Action policy defined in RFC 2434 21107 [16]. 21109 The author of a new pNFS layout specification must follow these steps 21110 to obtain acceptance of the layout type as a standard: 21112 1. The author devises the new layout specification. 21114 2. The new layout type specification MUST, at a minimum: 21116 * Define the contents of the layout-type-specific fields of the 21117 following data types: 21119 + the da_addr_body field of the device_addr4 data type; 21121 + the loh_body field of the layouthint4 data type; 21123 + the loc_body field of the layout_content4 data type (which in 21124 turn is the lo_content field of the layout4 data type); 21126 + the lou_body field of the layoutupdate4 data type; 21128 * Describe or define the storage access protocol used to access 21129 the data servers. 21131 * Describe the methods of recovery from storage device restart, 21132 and loss of layout state on the metadata server (see 21133 Section 12.7.3). 21135 * Include a security considerations section. 21137 3. The author documents the new layout specification as an Internet 21138 Draft. 21140 4. The author submits the Internet Draft for review through the IETF 21141 standards process as defined in "Internet Official Protocol 21142 Standards" (STD 1).
The new layout specification will be 21143 submitted for eventual publication as a standards track RFC. 21145 5. The layout specification progresses through the IETF standards 21146 process; the new option will be reviewed by the NFSv4 Working 21147 Group (if that group still exists), or as an Internet Draft not 21148 submitted by an IETF working group. 21150 22. References 21152 22.1. Normative References 21154 [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement 21155 Levels", March 1997. 21157 [2] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, 21158 C., Eisler, M., and D. Noveck, "Network File System (NFS) 21159 version 4 Protocol", RFC 3530, April 2003. 21161 [3] Eisler, M., "XDR: External Data Representation Standard", 21162 STD 67, RFC 4506, May 2006. 21164 [4] Srinivasan, R., "RPC: Remote Procedure Call Protocol 21165 Specification Version 2", RFC 1831, August 1995. 21167 [5] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol 21168 Specification", RFC 2203, September 1997. 21170 [6] Linn, J., "The Kerberos Version 5 GSS-API Mechanism", RFC 1964, 21171 June 1996. 21173 [7] Eisler, M., "LIPKEY - A Low Infrastructure Public Key Mechanism 21174 Using SPKM", RFC 2847, June 2000. 21176 [8] Linn, J., "Generic Security Service Application Program 21177 Interface Version 2, Update 1", RFC 2743, January 2000. 21179 [9] Hinden, R. and S. Deering, "IP Version 6 Addressing 21180 Architecture", RFC 1884, December 1995. 21182 [10] International Organization for Standardization, "Information 21183 Technology - Universal Multiple-octet coded Character Set (UCS) 21184 - Part 1: Architecture and Basic Multilingual Plane", 21185 ISO Standard 10646-1, May 1993. 21187 [11] Alvestrand, H., "IETF Policy on Character Sets and Languages", 21188 BCP 18, RFC 2277, January 1998. 21190 [12] Hoffman, P. and M. Blanchet, "Preparation of Internationalized 21191 Strings ("stringprep")", RFC 3454, December 2002. 21193 [13] Hoffman, P. and M. 
Blanchet, "Nameprep: A Stringprep Profile 21194 for Internationalized Domain Names (IDN)", RFC 3491, 21195 March 2003. 21197 [14] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing 21198 for Message Authentication", RFC 2104, February 1997. 21200 [15] Schaad, J., Kaliski, B., and R. Housley, "Additional Algorithms 21201 and Identifiers for RSA Cryptography for use in the Internet 21202 X.509 Public Key Infrastructure Certificate and Certificate 21203 Revocation List (CRL) Profile", RFC 4055, June 2005. 21205 [16] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA 21206 Considerations Section in RFCs", BCP 26, RFC 2434, 21207 October 1998. 21209 22.2. Informative References 21211 [17] Nowicki, B., "NFS: Network File System Protocol specification", 21212 RFC 1094, March 1989. 21214 [18] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 21215 Protocol Specification", RFC 1813, June 1995. 21217 [19] Eisler, M., "NFS Version 2 and Version 3 Security Issues and 21218 the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5", 21219 RFC 2623, June 1999. 21221 [20] Juszczak, C., "Improving the Performance and Correctness of an 21222 NFS Server", USENIX Conference Proceedings , June 1990. 21224 [21] Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by an On- 21225 line Database", RFC 3232, January 2002. 21227 [22] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", 21228 RFC 1833, August 1995. 21230 [23] Zelenka, J., Welch, B., and B. Halevy, "Object-based pNFS 21231 Operations", July 2005, . 21234 [24] Black, D., "pNFS Block/Volume Layout", July 2005, . 21237 [25] Callaghan, B., "WebNFS Client Specification", RFC 2054, 21238 October 1996. 21240 [26] Callaghan, B., "WebNFS Server Specification", RFC 2055, 21241 October 1996. 21243 [27] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, 21244 June 1999. 21246 [28] Simonsen, K., "Character Mnemonics and Character Sets", 21247 RFC 1345, June 1992. 
21249 [29] Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M., and E. 21250 Zeidner, "Internet Small Computer Systems Interface (iSCSI)", 21251 RFC 3720, April 2004. 21253 [30] Snively, R., "Fibre Channel Protocol for SCSI, 2nd Version 21254 (FCP-2)", ANSI/INCITS 350-2003, Oct 2003. 21256 [31] Weber, R., "Object-Based Storage Device Commands (OSD)", ANSI/ 21257 INCITS 400-2004, July 2004, 21258 . 21260 [32] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997. 21262 [33] Chiu, A., Eisler, M., and B. Callaghan, "Security Negotiation 21263 for WebNFS", RFC 2755, January 2000. 21265 Appendix A. Acknowledgments 21267 The initial drafts for the SECINFO extensions were edited by Mike 21268 Eisler with contributions from Peng Dai, Sergey Klyushin, and Carl 21269 Burnett. 21271 The initial drafts for the SESSIONS extensions were edited by Tom 21272 Talpey, Spencer Shepler, Jon Bauman with contributions from Charles 21273 Antonelli, Brent Callaghan, Mike Eisler, John Howard, Chet Juszczak, 21274 Trond Myklebust, Dave Noveck, John Scott, Mike Stolarchuk and Mark 21275 Wittle. [[Comment.22: global namespace stuff?]] 21277 The initial drafts for the Directory Delegations support were 21278 contributed by Saadia Khan with input from Dave Noveck, Mike Eisler, 21279 Carl Burnett, Ted Anderson and Tom Talpey. 21281 The initial drafts for the ACL explanations were contributed by Sam 21282 Falkner and Lisa Week. 21284 The initial drafts for the parallel NFS support were edited by Brent 21285 Welch and Garth Goodson. Additional authors for those documents were 21286 Benny Halevy, David Black, and Andy Adamson. Additional input came 21287 from the informal group which contributed to the construction of the 21288 initial pNFS drafts; specific acknowledgement goes to Gary Grider, 21289 Peter Corbett, Dave Noveck, and Peter Honeyman. The pNFS work was 21290 inspired by the NASD and OSD work done by Garth Gibson. 
Gary Grider 21291 of the national labs (LANL) has also been a champion of high- 21292 performance parallel I/O. 21294 Fredric Isaman found several errors in draft versions of the ONC RPC 21295 XDR description of the NFSv4.1 protocol. 21297 Authors' Addresses 21299 Spencer Shepler 21300 Sun Microsystems, Inc. 21301 7808 Moonflower Drive 21302 Austin, TX 78750 21303 USA 21305 Phone: +1-512-349-9376 21306 Email: spencer.shepler@sun.com 21308 Mike Eisler 21309 Network Appliance, Inc. 21310 5765 Chase Point Circle 21311 Colorado Springs, CO 80919 21312 USA 21314 Phone: +1-719-599-9026 21315 Email: email2mre-@yahoo.com 21316 URI: Insert ietf2 between the - and @ symbols in the above address 21317 David Noveck 21318 Network Appliance, Inc. 21319 1601 Trapelo Road, Suite 16 21320 Waltham, MA 02454 21321 USA 21323 Phone: +1-781-768-5347 21324 Email: dnoveck@netapp.com 21326 Full Copyright Statement 21328 Copyright (C) The IETF Trust (2007). 21330 This document is subject to the rights, licenses and restrictions 21331 contained in BCP 78, and except as set forth therein, the authors 21332 retain all their rights. 21334 This document and the information contained herein are provided on an 21335 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 21336 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 21337 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 21338 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 21339 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 21340 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
21342 Intellectual Property 21344 The IETF takes no position regarding the validity or scope of any 21345 Intellectual Property Rights or other rights that might be claimed to 21346 pertain to the implementation or use of the technology described in 21347 this document or the extent to which any license under such rights 21348 might or might not be available; nor does it represent that it has 21349 made any independent effort to identify any such rights. Information 21350 on the procedures with respect to rights in RFC documents can be 21351 found in BCP 78 and BCP 79. 21353 Copies of IPR disclosures made to the IETF Secretariat and any 21354 assurances of licenses to be made available, or the result of an 21355 attempt made to obtain a general license or permission for the use of 21356 such proprietary rights by implementers or users of this 21357 specification can be obtained from the IETF on-line IPR repository at 21358 http://www.ietf.org/ipr. 21360 The IETF invites any interested party to bring to its attention any 21361 copyrights, patents or patent applications, or other proprietary 21362 rights that may cover technology that may be required to implement 21363 this standard. Please address the information to the IETF at 21364 ietf-ipr@ietf.org. 21366 Acknowledgment 21368 Funding for the RFC Editor function is provided by the IETF 21369 Administrative Support Activity (IASA).