idnits 2.17.1 

draft-ietf-nfsv4-minorversion2-12.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 5 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  == There are 5 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     Furthermore, each DS MUST not report to a client a sparse ADB which
     belongs to another DS.  One implication of this requirement is that the
     app_data_block4's adb_block_size MUST be either be the stripe width or
     the stripe width must be an even multiple of it.  The second implication
     here is that the DS must be able to use the Control Protocol to determine
     from the MDS where the sparse ADBs occur.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     The second change is to provide a method for the server to notify
     the client that the attribute changed on an open file on the server.  If
     the file is closed, then during the open attempt, the client will gather
     the new attribute value.  The server MUST not communicate the new value
     of the attribute, the client MUST query it.  This requirement stems from
     the need for the client to provide sufficient access rights to the
     attribute.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     o  MUST not expose an object to either the client or server name
     space before its security information has been bound to it.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     With pNFS, the semantics of using READ_PLUS remains the same.  Any
     data server MAY return a hole or ADB result for a READ_PLUS request that
     it receives.  When a data server chooses to return such a result, it has
     the option of returning information for the data stored on that data
     server (as defined by the data layout), but it MUST not return results
     for a byte range that includes data managed by another data server.

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer. 
     Otherwise, the disclaimer is needed and you can ignore this comment. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 20, 2012) is 4321 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '0' is mentioned on line 3760, but not defined

  -- Looks like a reference, but probably isn't: '32K' on line 3760

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 5661 (ref. '2') (Obsoleted by RFC 8881)

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '6'

  == Outdated reference: A later version (-05) exists of
     draft-ietf-nfsv4-labreqs-00

  ** Downref: Normative reference to an Informational draft:
     draft-ietf-nfsv4-labreqs (ref. '7')

  == Outdated reference: A later version (-35) exists of
     draft-ietf-nfsv4-rfc3530bis-09

  -- Obsolete informational reference (is this intentional?): RFC 2616 (ref.
     '13') (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC
     7235)

  -- Obsolete informational reference (is this intentional?): RFC 5226 (ref.
     '25') (Obsoleted by RFC 8126)


     Summary: 2 errors (**), 0 flaws (~~), 11 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	NFSv4                                                          T. Haynes
3	Internet-Draft                                                    Editor
4	Intended status: Standards Track                           June 20, 2012
5	Expires: December 22, 2012

7	                     NFS Version 4 Minor Version 2
8	                 draft-ietf-nfsv4-minorversion2-12.txt

10	Abstract

12	   This Internet-Draft describes NFS version 4 minor version two,
13	   focusing mainly on the protocol extensions made from NFS version 4
14	   minor version 0 and NFS version 4 minor version 1.  Major extensions
15	   introduced in NFS version 4 minor version two include: Server-side
16	   Copy, Application I/O Advise, Space Reservations, Sparse Files,
17	   Application Data Blocks, and Labeled NFS.

19	Requirements Language

21	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
22	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
23	   document are to be interpreted as described in RFC 2119 [1].

25	Status of this Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on December 22, 2012.

42	Copyright Notice

44	   Copyright (c) 2012 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	   This document may contain material from IETF Documents or IETF
58	   Contributions published or made publicly available before November
59	   10, 2008.  The person(s) controlling the copyright in some of this
60	   material may not have granted the IETF Trust the right to allow
61	   modifications of such material outside the IETF Standards Process.
62	   Without obtaining an adequate license from the person(s) controlling
63	   the copyright in such materials, this document may not be modified
64	   outside the IETF Standards Process, and derivative works of it may
65	   not be created outside the IETF Standards Process, except to format
66	   it for publication as an RFC or to translate it into languages other
67	   than English.

69	Table of Contents

71	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  6
72	     1.1.   The NFS Version 4 Minor Version 2 Protocol  . . . . . . .  6
73	     1.2.   Scope of This Document  . . . . . . . . . . . . . . . . .  6
74	     1.3.   NFSv4.2 Goals . . . . . . . . . . . . . . . . . . . . . .  6
75	     1.4.   Overview of NFSv4.2 Features  . . . . . . . . . . . . . .  7
76	       1.4.1.  Sparse Files . . . . . . . . . . . . . . . . . . . . .  7
77	       1.4.2.  Application I/O Advise . . . . . . . . . . . . . . . .  7
78	     1.5.   Differences from NFSv4.1  . . . . . . . . . . . . . . . .  7
79	   2.  NFS Server-side Copy . . . . . . . . . . . . . . . . . . . . .  7
80	     2.1.   Introduction  . . . . . . . . . . . . . . . . . . . . . .  7
81	     2.2.   Protocol Overview . . . . . . . . . . . . . . . . . . . .  8
82	       2.2.1.  Overview of Copy Operations  . . . . . . . . . . . . .  9
83	       2.2.2.  Intra-Server Copy  . . . . . . . . . . . . . . . . . .  9
84	       2.2.3.  Inter-Server Copy  . . . . . . . . . . . . . . . . . . 10
85	       2.2.4.  Server-to-Server Copy Protocol . . . . . . . . . . . . 13
86	     2.3.   Requirements for Operations . . . . . . . . . . . . . . . 15
87	       2.3.1.  netloc4 - Network Locations  . . . . . . . . . . . . . 15
88	       2.3.2.  Copy Offload Stateids  . . . . . . . . . . . . . . . . 16
89	     2.4.   Security Considerations . . . . . . . . . . . . . . . . . 17
90	       2.4.1.  Inter-Server Copy Security . . . . . . . . . . . . . . 17
91	   3.  Support for Application IO Hints . . . . . . . . . . . . . . . 25
92	     3.1.   Introduction  . . . . . . . . . . . . . . . . . . . . . . 25
93	     3.2.   POSIX Requirements  . . . . . . . . . . . . . . . . . . . 26
94	     3.3.   Additional Requirements . . . . . . . . . . . . . . . . . 27
95	     3.4.   Security Considerations . . . . . . . . . . . . . . . . . 28
96	     3.5.   IANA Considerations . . . . . . . . . . . . . . . . . . . 28
97	   4.  Sparse Files . . . . . . . . . . . . . . . . . . . . . . . . . 28
98	     4.1.   Introduction  . . . . . . . . . . . . . . . . . . . . . . 28
99	     4.2.   Terminology . . . . . . . . . . . . . . . . . . . . . . . 29
100	   5.  Space Reservation  . . . . . . . . . . . . . . . . . . . . . . 29
101	     5.1.   Introduction  . . . . . . . . . . . . . . . . . . . . . . 29
102	   6.  Application Data Block Support . . . . . . . . . . . . . . . . 31
103	     6.1.   Generic Framework . . . . . . . . . . . . . . . . . . . . 32
104	       6.1.1.  Data Block Representation  . . . . . . . . . . . . . . 33
105	       6.1.2.  Data Content . . . . . . . . . . . . . . . . . . . . . 33
106	     6.2.   pNFS Considerations . . . . . . . . . . . . . . . . . . . 33
107	     6.3.   An Example of Detecting Corruption  . . . . . . . . . . . 34
108	     6.4.   Example of READ_PLUS  . . . . . . . . . . . . . . . . . . 35
109	     6.5.   Zero Filled Holes . . . . . . . . . . . . . . . . . . . . 36
110	   7.  Labeled NFS  . . . . . . . . . . . . . . . . . . . . . . . . . 36
111	     7.1.   Introduction  . . . . . . . . . . . . . . . . . . . . . . 36
112	     7.2.   Definitions . . . . . . . . . . . . . . . . . . . . . . . 37
113	     7.3.   MAC Security Attribute  . . . . . . . . . . . . . . . . . 38
114	       7.3.1.  Delegations  . . . . . . . . . . . . . . . . . . . . . 38
115	       7.3.2.  Permission Checking  . . . . . . . . . . . . . . . . . 39
116	       7.3.3.  Object Creation  . . . . . . . . . . . . . . . . . . . 39
117	       7.3.4.  Existing Objects . . . . . . . . . . . . . . . . . . . 39
118	       7.3.5.  Label Changes  . . . . . . . . . . . . . . . . . . . . 39
119	     7.4.   pNFS Considerations . . . . . . . . . . . . . . . . . . . 40
120	     7.5.   Discovery of Server Labeled NFS Support . . . . . . . . . 40
121	     7.6.   MAC Security NFS Modes of Operation . . . . . . . . . . . 41
122	       7.6.1.  Full Mode  . . . . . . . . . . . . . . . . . . . . . . 41
123	       7.6.2.  Guest Mode . . . . . . . . . . . . . . . . . . . . . . 42
124	     7.7.   Security Considerations . . . . . . . . . . . . . . . . . 43
125	   8.  Sharing change attribute implementation details with NFSv4
126	       clients  . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
127	     8.1.   Introduction  . . . . . . . . . . . . . . . . . . . . . . 43
128	   9.  Security Considerations  . . . . . . . . . . . . . . . . . . . 44
129	   10. Error Values . . . . . . . . . . . . . . . . . . . . . . . . . 44
130	     10.1.  Error Definitions . . . . . . . . . . . . . . . . . . . . 44
131	       10.1.1. General Errors . . . . . . . . . . . . . . . . . . . . 44
132	       10.1.2. Server to Server Copy Errors . . . . . . . . . . . . . 45
133	       10.1.3. Labeled NFS Errors . . . . . . . . . . . . . . . . . . 45
134	   11. New File Attributes  . . . . . . . . . . . . . . . . . . . . . 46
135	     11.1.  New RECOMMENDED Attributes - List and Definition
136	            References  . . . . . . . . . . . . . . . . . . . . . . . 46
137	     11.2.  Attribute Definitions . . . . . . . . . . . . . . . . . . 46
138	   12. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . . 49
139	   13. NFSv4.2 Operations . . . . . . . . . . . . . . . . . . . . . . 53
140	     13.1.  Operation 59: COPY - Initiate a server-side copy  . . . . 53
141	     13.2.  Operation 60: COPY_ABORT - Cancel a server-side copy  . . 61
142	     13.3.  Operation 61: COPY_NOTIFY - Notify a source server of
143	            a future copy . . . . . . . . . . . . . . . . . . . . . . 62
144	     13.4.  Operation 62: COPY_REVOKE - Revoke a destination
145	            server's copy privileges  . . . . . . . . . . . . . . . . 63
146	     13.5.  Operation 63: COPY_STATUS - Poll for status of a
147	            server-side copy  . . . . . . . . . . . . . . . . . . . . 64
148	     13.6.  Modification to Operation 42: EXCHANGE_ID -
149	            Instantiate Client ID . . . . . . . . . . . . . . . . . . 66
150	     13.7.  Operation 64: INITIALIZE  . . . . . . . . . . . . . . . . 67
151	     13.8.  Operation 67: IO_ADVISE - Application I/O access
152	            pattern hints . . . . . . . . . . . . . . . . . . . . . . 70
153	     13.9.  Changes to Operation 51: LAYOUTRETURN . . . . . . . . . . 76
154	     13.10. Operation 65: READ_PLUS . . . . . . . . . . . . . . . . . 79
155	     13.11. Operation 66: SEEK  . . . . . . . . . . . . . . . . . . . 84
156	   14. NFSv4.2 Callback Operations  . . . . . . . . . . . . . . . . . 85
157	     14.1.  Procedure 16: CB_ATTR_CHANGED - Notify Client that
158	            the File's Attributes Changed . . . . . . . . . . . . . . 85
159	     14.2.  Operation 15: CB_COPY - Report results of a
160	            server-side copy  . . . . . . . . . . . . . . . . . . . . 86
161	   15. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 88
162	   16. References . . . . . . . . . . . . . . . . . . . . . . . . . . 88
163	     16.1.  Normative References  . . . . . . . . . . . . . . . . . . 88
164	     16.2.  Informative References  . . . . . . . . . . . . . . . . . 89

166	   Appendix A.  Acknowledgments . . . . . . . . . . . . . . . . . . . 90
167	   Appendix B.  RFC Editor Notes  . . . . . . . . . . . . . . . . . . 91
168	   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 91

170	1.  Introduction

172	1.1.  The NFS Version 4 Minor Version 2 Protocol

174	   The NFS version 4 minor version 2 (NFSv4.2) protocol is the third
175	   minor version of the NFS version 4 (NFSv4) protocol.  The first minor
176	   version, NFSv4.0, is described in [10] and the second minor version,
177	   NFSv4.1, is described in [2].  NFSv4.2 follows the guidelines for
178	   minor versioning that are listed in Section 11 of [10].

180	   As a minor version, NFSv4.2 is consistent with the overall goals for
181	   NFSv4, but extends the protocol so as to better meet those goals,
182	   based on experience with NFSv4.1.  In addition, NFSv4.2 has adopted
183	   some new goals, which motivate some of the major extensions in
184	   NFSv4.2.

186	1.2.  Scope of This Document

188	   This document describes the NFSv4.2 protocol.  With respect to
189	   NFSv4.0 and NFSv4.1, this document does not:

191	   o  describe the NFSv4.0 or NFSv4.1 protocols, except where needed to
192	      contrast with NFSv4.2.

194	   o  modify the specification of the NFSv4.0 or NFSv4.1 protocols.

196	   o  clarify the NFSv4.0 or NFSv4.1 protocols.  That is, any
197	      clarifications made here apply only to NFSv4.2 and not to either
198	      of the prior protocols.

200	   The full XDR for NFSv4.2 is presented in [3].

202	1.3.  NFSv4.2 Goals

204	   The goal of the design of NFSv4.2 is to take common local file system
205	   features and offer them remotely.  These features might

207	   o  already be available on the servers, e.g., sparse files

209	   o  be under development as a new standard, e.g., SEEK_HOLE and
210	      SEEK_DATA

212	   o  be used by clients with the servers via some proprietary means,
213	      e.g., Labeled NFS

215	   but the clients are not able to leverage them on the server within
216	   the confines of the NFS protocol.

218	1.4.  Overview of NFSv4.2 Features

220	   [[Comment.1: This needs fleshing out! --TH]]

222	1.4.1.  Sparse Files

224	   Two new operations are defined to support the reading of sparse files
225	   (READ_PLUS) and the punching of holes to remove backing storage
226	   (INITIALIZE).
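On a local POSIX-style file system, the data/hole layout that READ_PLUS is meant to convey can be discovered with lseek() and the SEEK_DATA/SEEK_HOLE whence values mentioned in Section 1.3. The Python sketch below walks a local file and reports it as alternating data and hole segments. This is roughly the information a server would need to answer READ_PLUS. The function name `list_segments` is hypothetical and not part of the protocol. The sketch assumes a platform where Python exposes `os.SEEK_DATA` and `os.SEEK_HOLE` (e.g., Linux). A file system that does not track holes simply reports the whole file as data.

```python
import errno
import os


def list_segments(path):
    """Return [(kind, offset, length), ...] with kind "data" or "hole".

    Illustrative only: a server answering READ_PLUS would need similar
    information, but nothing here is defined by the protocol.
    """
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Next offset at or after `offset` that holds data.
                data_start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno != errno.ENXIO:
                    raise
                data_start = size  # ENXIO: only a hole remains before EOF
            if data_start > offset:
                segments.append(("hole", offset, data_start - offset))
            if data_start >= size:
                break
            # Next hole after the data; EOF counts as an implicit hole.
            hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
            segments.append(("data", data_start, hole_start - data_start))
            offset = hole_start
    finally:
        os.close(fd)
    return segments
```

Whether the middle of a file written at offsets 0 and 1 MiB shows up as a hole depends on the underlying file system. The segments always tile the file from offset 0 to EOF.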

228	1.4.2.  Application I/O Advise

230	   NFSv4.2 defines a new IO_ADVISE operation that clients can use
231	   to communicate expected I/O behavior to the server.  By communicating
232	   future I/O behavior such as whether a file will be accessed
233	   sequentially or randomly, and whether a file will or will not be
234	   accessed in the near future, servers can optimize future I/O requests
235	   for a file by, for example, prefetching or evicting data.  This
236	   operation can be used to support the posix_fadvise function as well
237	   as other applications such as databases and video editors.
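To make the prefetch/evict idea concrete, here is a toy Python model of a server-side block cache that reacts to advice. It is a sketch under stated assumptions. The class name, the 4 KiB block size, and the use of posix_fadvise advice names ("WILLNEED", "DONTNEED") as hint strings are all illustrative. The actual IO_ADVISE arguments and hint values are defined in Section 13.8. Because hints are advisory, any other advice is simply ignored.

```python
BLOCK_SIZE = 4096  # hypothetical cache granularity


class AdvisedCache:
    """Toy server-side block cache driven by I/O advice (illustrative)."""

    def __init__(self, backing):
        self.backing = backing  # file contents as bytes
        self.cache = {}         # block number -> cached bytes

    def _blocks(self, offset, length):
        # As with posix_fadvise(), a length of 0 means "to end of file".
        size = len(self.backing)
        end = size if length == 0 else min(offset + length, size)
        if offset >= end:
            return range(0)
        return range(offset // BLOCK_SIZE, (end - 1) // BLOCK_SIZE + 1)

    def advise(self, offset, length, hint):
        if hint == "WILLNEED":
            # Prefetch: pull the advised blocks into the cache now.
            for b in self._blocks(offset, length):
                start = b * BLOCK_SIZE
                self.cache[b] = self.backing[start:start + BLOCK_SIZE]
        elif hint == "DONTNEED":
            # Evict: the client does not expect to touch these blocks soon.
            for b in self._blocks(offset, length):
                self.cache.pop(b, None)
        # Any other hint is advisory and ignored by this sketch.
```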

239	1.5.  Differences from NFSv4.1

241	   In NFSv4.1, the only way to introduce new variants of an operation
242	   was to introduce a new operation; e.g., READ becomes either READ2 or
243	   READ_PLUS.  With the use of discriminated unions as parameters to
244	   such operations in NFSv4.2, it is possible to add a new arm in a
245	   subsequent minor version.  It is also possible to move such an
246	   operation from OPTIONAL/RECOMMENDED to REQUIRED.  Forcing an
247	   implementation to adopt each arm of a discriminated union at such a
248	   time does not meet the spirit of the minor versioning rules.  As
249	   such, new arms of a discriminated union MUST follow the same
250	   guidelines for minor versioning as operations in NFSv4.1 - i.e., they
251	   may not be made REQUIRED.  To support this, a new error code,
252	   NFS4ERR_UNION_NOTSUPP, is introduced which allows the server to
253	   communicate to the client that the operation is supported, but the
254	   specific arm of the discriminated union is not.
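The following minimal Python sketch illustrates the behaviour described above. It assumes a single operation whose argument is a discriminated union with two hypothetical arms. ARM_BASE has existed since the operation was introduced. ARM_EXTRA was added in a later minor version. A server that implements the operation but not the newer arm answers NFS4ERR_UNION_NOTSUPP, and the client then retries with the base arm. The class names, the arm names, and the fallback policy are illustrative assumptions. Status codes are modelled as symbolic names rather than their XDR values.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NFS4_OK = "NFS4_OK"
    NFS4ERR_UNION_NOTSUPP = "NFS4ERR_UNION_NOTSUPP"


# Hypothetical arms of one operation's argument union.
ARM_BASE = 0    # defined when the operation was introduced
ARM_EXTRA = 1   # added in a later minor version, so never REQUIRED


@dataclass
class OpArg:
    arm: int          # union discriminant
    payload: object   # arm-specific body


class Server:
    def __init__(self, supported_arms):
        self.supported_arms = frozenset(supported_arms)

    def handle(self, arg):
        # The operation is implemented, but possibly not every arm of it.
        if arg.arm not in self.supported_arms:
            return Status.NFS4ERR_UNION_NOTSUPP, None
        return Status.NFS4_OK, (arg.arm, arg.payload)


def call_with_fallback(server, payload, preferred=ARM_EXTRA, fallback=ARM_BASE):
    """Try the newer arm first; retry with the base arm if it is unsupported."""
    status, result = server.handle(OpArg(preferred, payload))
    if status is Status.NFS4ERR_UNION_NOTSUPP:
        status, result = server.handle(OpArg(fallback, payload))
    return status, result
```

Because the error reports that only the arm is unsupported, the client can keep using the operation itself instead of abandoning it.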

256	2.  NFS Server-side Copy

258	2.1.  Introduction

260	   This section describes a server-side copy feature for the NFS
261	   protocol.

263	   The server-side copy feature provides a mechanism for the NFS client
264	   to perform a file copy on the server without the data being
265	   transmitted back and forth over the network.

267	   Without this feature, an NFS client copies data from one location to
268	   another by reading the data from the server over the network, and
269	   then writing the data back over the network to the server.  Using
270	   this server-side copy operation, the client is able to instruct the
271	   server to copy the data locally without the data being sent back and
272	   forth over the network unnecessarily.

274	   If the source object and destination object are on different file
275	   servers, the file servers will communicate with one another to
276	   perform the copy operation.  The server-to-server protocol by which
277	   this is accomplished is not defined in this document.

279	2.2.  Protocol Overview

281	   The server-side copy offload operations support both intra-server and
282	   inter-server file copies.  An intra-server copy is a copy in which
283	   the source file and destination file reside on the same server.  In
284	   an inter-server copy, the source file and destination file are on
285	   different servers.  In both cases, the copy may be performed
286	   synchronously or asynchronously.

288	   Throughout the rest of this document, we refer to the NFS server
289	   containing the source file as the "source server" and the NFS server
290	   to which the file is transferred as the "destination server".  In the
291	   case of an intra-server copy, the source server and destination
292	   server are the same server.  Therefore in the context of an intra-
293	   server copy, the terms source server and destination server refer to
294	   the single server performing the copy.

296	   The operations described below are designed to copy files.  Other
297	   file system objects can be copied by building on these operations or
298	   using other techniques.  For example, if the user wishes to copy a
299	   directory, the client can synthesize a directory copy by first
300	   creating the destination directory and then copying the source
301	   directory's files to the new destination directory.  If the user
302	   wishes to copy a namespace junction [11] [12], the client can use the
303	   ONC RPC Federated Filesystem protocol [12] to perform the copy.
304	   Specifically, the client can determine the source junction's
305	   attributes using the FEDFS_LOOKUP_FSN procedure and create a
306	   duplicate junction using the FEDFS_CREATE_JUNCTION procedure.
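The directory case can be sketched in Python against a local directory tree. In this sketch, a hypothetical `copy_file(src, dst)` callback stands in for one COPY request per regular file. The client creates each destination directory itself. Symbolic links, special files, and junctions are skipped rather than handled. All names are illustrative.

```python
import os


def synthesize_dir_copy(src_dir, dst_dir, copy_file):
    """Recreate src_dir under dst_dir using one copy_file() call per file.

    copy_file(src_path, dst_path) stands in for a server-side COPY.
    Symlinks and non-regular files are skipped in this sketch.
    """
    os.mkdir(dst_dir)  # fails if dst_dir already exists
    for root, dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target = dst_dir if rel == os.curdir else os.path.join(dst_dir, rel)
        # Do not descend into (or recreate) symlinked directories.
        dirs[:] = [d for d in dirs if not os.path.islink(os.path.join(root, d))]
        for d in dirs:
            os.mkdir(os.path.join(target, d))
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                copy_file(path, os.path.join(target, name))
```

Because os.walk() is top-down, each destination directory exists before any file is copied into it.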

308	   For the inter-server copy, the operations are defined to be
309	   compatible with the traditional copy authentication approach.  The
310	   client and user are authorized at the source for reading.  Then they
311	   are authorized at the destination for writing.

313	2.2.1.  Overview of Copy Operations

315	   COPY_NOTIFY:  For inter-server copies, the client sends this
316	      operation to the source server to notify it of a future file copy
317	      from a given destination server for the given user.
318	      (Section 13.3)

320	   COPY_REVOKE:  Also for inter-server copies, the client sends this
321	      operation to the source server to revoke permission to copy a file
322	      for the given user.  (Section 13.4)

324	   COPY:  Used by the client to request a file copy.  (Section 13.1)

326	   COPY_ABORT:  Used by the client to abort an asynchronous file copy.
327	      (Section 13.2)

329	   COPY_STATUS:  Used by the client to poll the status of an
330	      asynchronous file copy.  (Section 13.5)

332	   CB_COPY:  Used by the destination server to report the results of an
333	      asynchronous file copy to the client.  (Section 14.2)
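The COPY_NOTIFY/COPY_REVOKE pair can be read as maintaining, at the source server, a set of grants of the form (file, destination, user). The Python sketch below models that idea under explicit assumptions. Filehandles, destination names, and users are plain strings; the draft identifies destinations with network locations (Section 2.3.1). The per-file reader sets stand in for whatever access control the source actually applies. The class and method names are hypothetical. The sketch also reflects the rule in Section 2.2 that the user is authorized at the source for reading.

```python
class SourceServer:
    """Toy source server that tracks COPY_NOTIFY grants (hypothetical)."""

    def __init__(self, files, readers):
        self.files = files      # filehandle -> file contents (bytes)
        self.readers = readers  # filehandle -> set of users allowed to read
        self.grants = set()     # {(filehandle, destination, user), ...}

    def copy_notify(self, fh, destination, user):
        """COPY_NOTIFY: allow `destination` to read `fh` for `user`."""
        # The user must be authorized for reading at the source.
        if user not in self.readers.get(fh, set()):
            raise PermissionError(f"{user} may not read {fh}")
        self.grants.add((fh, destination, user))

    def copy_revoke(self, fh, destination, user):
        """COPY_REVOKE: withdraw a previously granted copy permission."""
        self.grants.discard((fh, destination, user))

    def read(self, fh, destination, user, offset, count):
        """A read issued by a destination server during the copy."""
        if (fh, destination, user) not in self.grants:
            raise PermissionError("no copy grant for this destination and user")
        return self.files[fh][offset:offset + count]
```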

335	2.2.2.  Intra-Server Copy

337	   To copy a file on a single server, the client uses a COPY operation.
338	   The server may respond to the copy operation with the final results
339	   of the copy or it may perform the copy asynchronously and deliver the
340	   results using a CB_COPY operation callback.  If the copy is performed
341	   asynchronously, the client may poll the status of the copy using
342	   COPY_STATUS or cancel the copy using COPY_ABORT.

344	   A synchronous intra-server copy is shown in Figure 1.  In this
345	   example, the NFS server chooses to perform the copy synchronously.
346	   The copy operation is completed, either successfully or
347	   unsuccessfully, before the server replies to the client's request.
348	   The server's reply contains the final result of the operation.

350	     Client                                  Server
351	        +                                      +
352	        |                                      |
353	        |--- COPY ---------------------------->| Client requests
354	        |<------------------------------------/| a file copy
355	        |                                      |
356	        |                                      |

358	                Figure 1: A synchronous intra-server copy.

360	   An asynchronous intra-server copy is shown in Figure 2.  In this
361	   example, the NFS server performs the copy asynchronously.  The
362	   server's reply to the copy request indicates that the copy operation
363	   was initiated and the final result will be delivered at a later time.
364	   The server's reply also contains a copy stateid.  The client may use
365	   this copy stateid to poll for status information (as shown) or to
366	   cancel the copy using a COPY_ABORT.  When the server completes the
367	   copy, the server performs a callback to the client and reports the
368	   results.

370	     Client                                  Server
371	        +                                      +
372	        |                                      |
373	        |--- COPY ---------------------------->| Client requests
374	        |<------------------------------------/| a file copy
375	        |                                      |
376	        |                                      |
377	        |--- COPY_STATUS --------------------->| Client may poll
378	        |<------------------------------------/| for status
379	        |                                      |
380	        |                  .                   | Multiple COPY_STATUS
381	        |                  .                   | operations may be sent.
382	        |                  .                   |
383	        |                                      |
384	        |<-- CB_COPY --------------------------| Server reports results
385	        |\------------------------------------>|
386	        |                                      |

388	               Figure 2: An asynchronous intra-server copy.
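A small Python state machine can model the choice between synchronous and asynchronous copies and the use of the copy stateid. This is a sketch only. The size threshold for going asynchronous, the fixed chunk size, and the `step()` method that stands in for background progress are assumptions of the model. The same is true of the `callback` that stands in for CB_COPY. None of these are protocol requirements.

```python
import itertools
from dataclasses import dataclass


@dataclass
class CopyJob:
    src: bytes
    dst: bytearray
    done: int = 0           # bytes copied so far
    finished: bool = False


class CopyServer:
    """Toy intra-server copy engine (all names are hypothetical)."""

    def __init__(self, sync_limit, chunk, callback):
        self.sync_limit = sync_limit  # copies this small are done inline
        self.chunk = chunk            # bytes copied per background step
        self.callback = callback      # stands in for CB_COPY to the client
        self.jobs = {}                # copy stateid -> CopyJob
        self._next_id = itertools.count(1)

    def copy(self, src, dst):
        """COPY: finish now (synchronous) or return a copy stateid."""
        if len(src) <= self.sync_limit:
            dst[:] = src
            return {"sync": True, "bytes": len(src)}
        stateid = next(self._next_id)
        self.jobs[stateid] = CopyJob(src, dst)
        return {"sync": False, "stateid": stateid}

    def copy_status(self, stateid):
        """COPY_STATUS: bytes copied so far and whether the copy is done."""
        job = self.jobs[stateid]  # unknown stateid raises KeyError here
        return job.done, job.finished

    def copy_abort(self, stateid):
        """COPY_ABORT: forget the job; nothing further is copied."""
        self.jobs.pop(stateid, None)

    def step(self):
        """Advance each unfinished copy by one chunk."""
        for stateid, job in list(self.jobs.items()):
            if job.finished:
                continue
            end = min(job.done + self.chunk, len(job.src))
            job.dst[job.done:end] = job.src[job.done:end]
            job.done = end
            if job.done == len(job.src):
                job.finished = True
                self.callback(stateid, job.done)  # report via "CB_COPY"
```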

390	2.2.3.  Inter-Server Copy

392	   A copy may also be performed between two servers.  The copy protocol
393	   is designed to accommodate a variety of network topologies.  As shown
394	   in Figure 3, the client and servers may be connected by multiple
395	   networks.  In particular, the servers may be connected by a
396	   specialized, high speed network (network 192.168.33.0/24 in the
397	   diagram) that does not include the client.  The protocol allows the
398	   client to set up the copy between the servers (over network
399	   10.11.78.0/24 in the diagram) and for the servers to communicate on
400	   the high speed network if they choose to do so.

402	                             192.168.33.0/24
403	                 +-------------------------------------+
404	                 |                                     |
405	                 |                                     |
406	                 | 192.168.33.18                       | 192.168.33.56
407	         +-------+------+                       +------+------+
408	         |     Source   |                       | Destination |
409	         +-------+------+                       +------+------+
410	                 | 10.11.78.18                         | 10.11.78.56
411	                 |                                     |
412	                 |                                     |
413	                 |             10.11.78.0/24           |
414	                 +------------------+------------------+
415	                                    |
416	                                    |
417	                                    | 10.11.78.243
418	                              +-----+-----+
419	                              |   Client  |
420	                              +-----------+

422	            Figure 3: An example inter-server network topology.
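How the destination server picks its path to the source is left to the servers. One possible policy is sketched below in Python, using the standard `ipaddress` module and the addresses from Figure 3. The destination prefers a source address on a designated high-speed network. Otherwise it falls back to any source address on a network it is directly attached to. The function name and the policy itself are assumptions for illustration, and routed paths are ignored.

```python
import ipaddress


def pick_source_address(source_addrs, dest_ifaces, preferred_nets=()):
    """Pick the source-server address the destination should read from.

    source_addrs:   addresses at which the source server can be reached.
    dest_ifaces:    the destination's interfaces in CIDR form,
                    e.g. "192.168.33.56/24".
    preferred_nets: networks to favour, e.g. a dedicated high-speed link.

    Returns an address string, or None if no source address lies on a
    network the destination is directly attached to.
    """
    local_nets = [ipaddress.ip_interface(i).network for i in dest_ifaces]
    preferred = [ipaddress.ip_network(n) for n in preferred_nets]
    # Keep only source addresses reachable without routing.
    reachable = [a for a in map(ipaddress.ip_address, source_addrs)
                 if any(a in net for net in local_nets)]
    # First choice: an address on a preferred (e.g. high-speed) network.
    for addr in reachable:
        if any(addr in net for net in preferred):
            return str(addr)
    return str(reachable[0]) if reachable else None
```

With the Figure 3 addresses, naming 192.168.33.0/24 as preferred makes the destination read from 192.168.33.18 rather than 10.11.78.18.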

424	   For an inter-server copy, the client notifies the source server that
425	   a file will be copied by the destination server using a COPY_NOTIFY
426	   operation.  The client then initiates the copy by sending the COPY
427	   operation to the destination server.  The destination server may
428	   perform the copy synchronously or asynchronously.
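The client's side of this exchange can be sketched in Python as follows. The `source` and `destination` objects are hypothetical stubs. Their methods stand in for the COPY_NOTIFY, COPY, COPY_STATUS, and COPY_ABORT operations. The reply dictionaries, the polling limit, and the decision to abort after too many polls are assumptions of the sketch. A real client could wait for CB_COPY instead of polling.

```python
import time


def inter_server_copy(source, destination, fh_src, fh_dst, user,
                      poll_interval=0.0, max_polls=100):
    """Client-side sequence for an inter-server copy (hypothetical stubs).

    1. COPY_NOTIFY to the source, naming the destination and the user.
    2. COPY to the destination.
    3. If the destination replied asynchronously, poll with COPY_STATUS.
    Returns the number of bytes the destination reports as copied.
    """
    source.copy_notify(fh_src, destination.name, user)
    reply = destination.copy(fh_src, source.name, fh_dst)
    if reply["sync"]:
        return reply["bytes"]
    stateid = reply["stateid"]
    for _ in range(max_polls):
        status = destination.copy_status(stateid)
        if status["finished"]:
            return status["bytes"]
        time.sleep(poll_interval)
    # Give up: cancel the outstanding copy with COPY_ABORT.
    destination.copy_abort(stateid)
    raise TimeoutError("copy did not finish; sent COPY_ABORT")
```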

430	   A synchronous inter-server copy is shown in Figure 4.  In this case,
431	   the destination server chooses to perform the copy before responding
432	   to the client's COPY request.

434	   An asynchronous copy is shown in Figure 5.  In this case, the
435	   destination server chooses to respond to the client's COPY request
436	   immediately and then perform the copy asynchronously.

438	     Client                Source         Destination
439	        +                    +                 +
440	        |                    |                 |
441	        |--- COPY_NOTIFY --->|                 |
442	        |<------------------/|                 |
443	        |                    |                 |
444	        |                    |                 |
445	        |--- COPY ---------------------------->|
446	        |                    |                 |
447	        |                    |                 |
448	        |                    |<----- read -----|
449	        |                    |\--------------->|
450	        |                    |                 |
451	        |                    |        .        | Multiple reads may
452	        |                    |        .        | be necessary
453	        |                    |        .        |
454	        |                    |                 |
455	        |                    |                 |
456	        |<------------------------------------/| Destination replies
457	        |                    |                 | to COPY

459	                Figure 4: A synchronous inter-server copy.

461	     Client                Source         Destination
462	        +                    +                 +
463	        |                    |                 |
464	        |--- COPY_NOTIFY --->|                 |
465	        |<------------------/|                 |
466	        |                    |                 |
467	        |                    |                 |
468	        |--- COPY ---------------------------->|
469	        |<------------------------------------/|
470	        |                    |                 |
471	        |                    |                 |
472	        |                    |<----- read -----|
473	        |                    |\--------------->|
474	        |                    |                 |
475	        |                    |        .        | Multiple reads may
476	        |                    |        .        | be necessary
477	        |                    |        .        |
478	        |                    |                 |
479	        |                    |                 |
480	        |--- COPY_STATUS --------------------->| Client may poll
481	        |<------------------------------------/| for status
482	        |                    |                 |
483	        |                    |        .        | Multiple COPY_STATUS
484	        |                    |        .        | operations may be sent
485	        |                    |        .        |
486	        |                    |                 |
487	        |                    |                 |
488	        |                    |                 |
489	        |<-- CB_COPY --------------------------| Destination reports
490	        |\------------------------------------>| results
491	        |                    |                 |

493	               Figure 5: An asynchronous inter-server copy.

495	2.2.4.  Server-to-Server Copy Protocol

497	   The source server and destination server are not required to use a
498	   specific protocol to transfer the file data.  The choice of what
499	   protocol to use is ultimately the destination server's decision.

501	2.2.4.1.  Using NFSv4.x as a Server-to-Server Copy Protocol

503	   The destination server MAY use standard NFSv4.x (where x >= 1) to
504	   read the data from the source server.  If NFSv4.x is used for the
505	   server-to-server copy protocol, the destination server can use the
506	   filehandle contained in the COPY request with standard NFSv4.x
507	   operations to read data from the source server.  Specifically, the
508	   destination server may use the NFSv4.x OPEN operation's CLAIM_FH
509	   facility to open the file being copied and obtain an open stateid.
510	   Using the stateid, the destination server may then use NFSv4.x READ
511	   operations to read the file.

513	2.2.4.2.  Using an Alternative Server-to-Server Copy Protocol

515	   In a homogeneous environment, the source and destination servers
516	   might be able to perform the file copy extremely efficiently using
517	   specialized protocols.  For example, the source and destination
518	   servers might be two nodes sharing a common file system format for
519	   the source and destination file systems.  Thus the source and
520	   destination are in an ideal position to efficiently render the image
521	   of the source file to the destination file by replicating the file
522	   system formats at the block level.  Another possibility is that the
523	   source and destination might be two nodes sharing a common storage
524	   area network, and thus there is no need to copy any data at all, and
525	   instead ownership of the file and its contents might simply be re-
526	   assigned to the destination.  To allow for these possibilities, the
527	   destination server is allowed to use a server-to-server copy protocol
528	   of its choice.

530	   In a heterogeneous environment, using a protocol other than NFSv4.x
531	   (e.g., HTTP [13] or FTP [14]) presents some challenges.  In
532	   particular, the destination server is presented with the challenge of
533	   accessing the source file given only an NFSv4.x filehandle.

535	   One option for protocols that identify source files with path names
536	   is to use an ASCII hexadecimal representation of the source
537	   filehandle as the file name.

539	   Another option for the source server is to use URLs to direct the
540	   destination server to a specialized service.  For example, the
541	   response to COPY_NOTIFY could include the URL
542	   ftp://s1.example.com:9999/_FH/0x12345, where 0x12345 is the ASCII
543	   hexadecimal representation of the source filehandle.  When the
544	   destination server receives the source server's URL, it would use
545	   "_FH/0x12345" as the file name to pass to the FTP server listening on
546	   port 9999 of s1.example.com.  On port 9999 there would be a special
547	   instance of the FTP service that understands how to convert NFS
548	   filehandles to an open file descriptor (in many operating systems,
549	   this would require a new system call, one which is the inverse of the
550	   makefh() function that the pre-NFSv4 MOUNT service needs).
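   As a minimal sketch of the two naming options above, the C fragment
   below renders an opaque filehandle as an ASCII hexadecimal file name
   and embeds it in an FTP URL of the form used in the example.  The
   function names fh_to_hex_name() and make_fh_url() are hypothetical.
   The "0x" prefix followed by two lowercase digits per byte is an
   assumed encoding, because this section does not define one.  The
   128-byte bound is NFS4_FHSIZE.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define FH_MAX 128   /* NFS4_FHSIZE: maximum NFSv4 filehandle length */

/* Render an opaque filehandle as "0x" followed by two lowercase hex digits
 * per byte.  Returns the string length, or -1 if out is too small. */
static int fh_to_hex_name(const unsigned char *fh, size_t fh_len,
                          char *out, size_t out_size)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    if (out_size < 2 + 2 * fh_len + 1)      /* "0x" + digits + NUL */
        return -1;
    out[0] = '0';
    out[1] = 'x';
    for (i = 0; i < fh_len; i++) {
        out[2 + 2 * i] = digits[fh[i] >> 4];
        out[3 + 2 * i] = digits[fh[i] & 0x0f];
    }
    out[2 + 2 * fh_len] = '\0';
    return (int)(2 + 2 * fh_len);
}

/* Build "ftp://<host>:<port>/_FH/<hex filehandle>".  Returns 0 on success,
 * or -1 on a bad host, an oversized filehandle, or a truncated result. */
static int make_fh_url(const char *host, unsigned port,
                       const unsigned char *fh, size_t fh_len,
                       char *url, size_t url_size)
{
    char name[2 + 2 * FH_MAX + 1];
    int n;

    if (host[0] == '\0' || strchr(host, '/') != NULL || fh_len > FH_MAX)
        return -1;
    if (fh_to_hex_name(fh, fh_len, name, sizeof name) < 0)
        return -1;
    n = snprintf(url, url_size, "ftp://%s:%u/_FH/%s", host, port, name);
    return (n < 0 || (size_t)n >= url_size) ? -1 : 0;
}
```

   Given the filehandle bytes 01 23 45, the host s1.example.com, and the
   port 9999, make_fh_url() produces
   ftp://s1.example.com:9999/_FH/0x012345.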

552	   Authenticating and identifying the destination server to the source
553	   server is also a challenge.  Recommendations for how to accomplish
554	   this are given in Section 2.4.1.2.4 and Section 2.4.1.4.

556	2.3.  Requirements for Operations

558	   The implementation of server-side copy is OPTIONAL for the client and
559	   the server.  However, in order to successfully copy a file, some
560	   operations MUST be supported by the client and/or server.

562	   If a client desires an intra-server file copy, then it MUST support
563	   the COPY and CB_COPY operations.  If COPY returns a stateid, then the
564	   client MAY use the COPY_ABORT and COPY_STATUS operations.

566	   If a client desires an inter-server file copy, then it MUST support
567	   the COPY, COPY_NOTIFY, and CB_COPY operations, and MAY use the
568	   COPY_REVOKE operation.  If COPY returns a stateid, then the client
569	   MAY use the COPY_ABORT and COPY_STATUS operations.

571	   If a server supports intra-server copy, then the server MUST support
572	   the COPY operation.  If a server's COPY operation returns a stateid,
573	   then the server MUST also support these operations: CB_COPY,
574	   COPY_ABORT, and COPY_STATUS.

576	   If a source server supports inter-server copy, then the source server
577	   MUST support all these operations: COPY_NOTIFY and COPY_REVOKE.  If a
578	   destination server supports inter-server copy, then the destination
579	   server MUST support the COPY operation.  If a destination server's
580	   COPY operation returns a stateid, then the destination server MUST
581	   also support these operations: CB_COPY, COPY_ABORT, COPY_NOTIFY,
582	   COPY_REVOKE, and COPY_STATUS.
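   The server-side rules above can be restated as a small C sketch that
   computes which operations a server must support in a given role.  The
   bit flags and the functions required_ops() and missing_ops() are
   hypothetical and exist only to illustrate the conditional MUSTs.  The
   sketch does not model the client-side requirements.

```c
/* Hypothetical bit flags, one per server-side copy operation. */
enum copy_op {
    OP_COPY        = 1 << 0,
    OP_COPY_NOTIFY = 1 << 1,
    OP_COPY_REVOKE = 1 << 2,
    OP_COPY_ABORT  = 1 << 3,
    OP_COPY_STATUS = 1 << 4,
    OP_CB_COPY     = 1 << 5
};

enum copy_role { ROLE_INTRA_SERVER, ROLE_INTER_SOURCE, ROLE_INTER_DEST };

/* Operations a server MUST support in 'role', per the rules above. */
static unsigned required_ops(enum copy_role role, int copy_returns_stateid)
{
    const unsigned async_ops = OP_CB_COPY | OP_COPY_ABORT | OP_COPY_STATUS;

    switch (role) {
    case ROLE_INTRA_SERVER:
        return OP_COPY | (copy_returns_stateid ? async_ops : 0u);
    case ROLE_INTER_SOURCE:
        return OP_COPY_NOTIFY | OP_COPY_REVOKE;
    case ROLE_INTER_DEST:
        return OP_COPY | (copy_returns_stateid
                              ? async_ops | OP_COPY_NOTIFY | OP_COPY_REVOKE
                              : 0u);
    }
    return 0u;
}

/* Required operations absent from 'supported'; 0 means compliant. */
static unsigned missing_ops(enum copy_role role, int copy_returns_stateid,
                            unsigned supported)
{
    return required_ops(role, copy_returns_stateid) & ~supported;
}
```

   A server that acts in more than one role would combine the results
   of required_ops() for each of those roles with a bitwise OR.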

584	   Each operation is performed in the context of the user identified by
585	   the ONC RPC credential of its containing COMPOUND or CB_COMPOUND
586	   request.  For example, a COPY_ABORT operation issued by a given user
587	   indicates that a specified COPY operation initiated by the same user
588	   be canceled.  Therefore, a COPY_ABORT MUST NOT interfere with a copy
589	   of the same file initiated by another user.
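   The following minimal C sketch illustrates the per-user rule for
   COPY_ABORT.  It assumes a hypothetical in-memory table of asynchronous
   copies, keyed by the 12-byte "other" field of the copy offload stateid
   and recording the user that issued each COPY.  The structure, the
   function copy_abort(), and the choice to report another user's copy as
   not found are illustrative assumptions.  This section does not specify
   any of them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one asynchronous copy on a server. */
struct copy_record {
    unsigned char stateid_other[12]; /* "other" field of the copy stateid */
    char          owner[64];         /* user that issued the COPY */
    int           active;
};

enum abort_result { ABORT_OK, ABORT_NO_SUCH_COPY };

/* Cancel the copy named by stateid_other only if 'user' started it.
 * A copy started by another user is treated as not found, so one user's
 * COPY_ABORT never affects another user's copy of the same file. */
static enum abort_result copy_abort(struct copy_record *copies, size_t n,
                                    const unsigned char stateid_other[12],
                                    const char *user)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (copies[i].active &&
            memcmp(copies[i].stateid_other, stateid_other, 12) == 0 &&
            strcmp(copies[i].owner, user) == 0) {
            copies[i].active = 0;   /* a real server would also stop I/O */
            return ABORT_OK;
        }
    }
    return ABORT_NO_SUCH_COPY;
}
```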

591	   An NFS server MAY allow an administrative user to monitor or cancel
592	   copy operations using an implementation-specific interface.

594	2.3.1.  netloc4 - Network Locations

596	   The server-side copy operations specify network locations using the
597	   netloc4 data type shown below:

599	   enum netloc_type4 {
600	           NL4_NAME        = 0,
601	           NL4_URL         = 1,
602	           NL4_NETADDR     = 2
603	   };
604	   union netloc4 switch (netloc_type4 nl_type) {
605	           case NL4_NAME:          utf8str_cis nl_name;
606	           case NL4_URL:           utf8str_cis nl_url;
607	           case NL4_NETADDR:       netaddr4    nl_addr;
608	   };

610	   If the netloc4 is of type NL4_NAME, the nl_name field MUST be
611	   specified as a UTF-8 string.  The nl_name is expected to be resolved
612	   to a network address via DNS, LDAP, NIS, /etc/hosts, or some other
613	   means.  If the netloc4 is of type NL4_URL, a server URL [4]
614	   appropriate for the server-to-server copy operation is specified as a
615	   UTF-8 string.  If the netloc4 is of type NL4_NETADDR, the nl_addr
616	   field MUST contain a valid netaddr4 as defined in Section 3.3.9 of
617	   [2].
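   For illustration, the sketch below shows one way a C implementation
   might hold a decoded netloc4, together with a crude check of the
   active union arm.  The C types are assumptions, as are the use of
   NUL-terminated strings in place of counted XDR strings and the helper
   netloc4_is_plausible().  A real implementation would use XDR-generated
   types, validate UTF-8, and resolve or parse the value as described
   above.

```c
#include <stddef.h>
#include <string.h>

enum netloc_type4 { NL4_NAME = 0, NL4_URL = 1, NL4_NETADDR = 2 };

/* netaddr4 from NFSv4.1: a netid and a universal address string. */
struct netaddr4 {
    const char *na_r_netid;     /* e.g. "tcp" */
    const char *na_r_addr;      /* e.g. "192.0.2.1.8.1" (port 2049) */
};

/* The discriminated union, with strings shown as NUL-terminated C strings
 * rather than counted XDR strings. */
struct netloc4 {
    enum netloc_type4 nl_type;
    union {
        const char     *nl_name;
        const char     *nl_url;
        struct netaddr4 nl_addr;
    } u;
};

/* Crude check that the arm selected by nl_type is usable.
 * Returns 1 if plausible, 0 otherwise. */
static int netloc4_is_plausible(const struct netloc4 *nl)
{
    switch (nl->nl_type) {
    case NL4_NAME:
        return nl->u.nl_name != NULL && nl->u.nl_name[0] != '\0';
    case NL4_URL:
        /* expect at least "<scheme>://" somewhere in the string */
        return nl->u.nl_url != NULL && strstr(nl->u.nl_url, "://") != NULL;
    case NL4_NETADDR:
        return nl->u.nl_addr.na_r_netid != NULL &&
               nl->u.nl_addr.na_r_addr != NULL;
    }
    return 0;                   /* unknown discriminant */
}
```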

619	   When netloc4 values are used for an inter-server copy as shown in
620	   Figure 3, their values may be evaluated on the source server,
621	   destination server, and client.  The network environment in which
622	   these systems operate should be configured so that the netloc4 values
623	   are interpreted as intended on each system.

625	2.3.2.  Copy Offload Stateids

627	   A server may perform a copy offload operation asynchronously.  An
628	   asynchronous copy is tracked using a copy offload stateid.  Copy
629	   offload stateids are included in the COPY, COPY_ABORT, COPY_STATUS,
630	   and CB_COPY operations.

632	   Section 8.2.4 of [2] specifies that stateids are valid until either
633	   (A) the client or server restarts or (B) the client returns the
634	   resource.

636	   A copy offload stateid will be valid until either (A) the client or
637	   server restarts or (B) the client returns the resource by issuing a
638	   COPY_ABORT operation or the client replies to a CB_COPY operation.

640	   A copy offload stateid's seqid MUST NOT be 0.  In the context of a
641	   copy offload operation, using a stateid with a seqid of 0 to refer
642	   to the most recent copy offload operation is ambiguous.  Therefore,
643	   a copy offload stateid with a seqid of 0 MUST be considered invalid.
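   The seqid rule reduces to a single check on the decoded stateid.  The
   sketch below assumes the NFSv4.1 stateid4 layout, which is a 32-bit
   seqid plus a 12-byte "other" field.  The name
   copy_stateid_seqid_ok() is hypothetical.  This section does not say
   which error a server returns for a rejected stateid.

```c
#include <stdint.h>

#define NFS4_OTHER_SIZE 12

/* stateid4 as defined for NFSv4.1. */
struct stateid4 {
    uint32_t      seqid;
    unsigned char other[NFS4_OTHER_SIZE];
};

/* A copy offload stateid with seqid 0 MUST be treated as invalid: the
 * NFSv4.1 meaning of seqid 0 ("use the most recent seqid for this other
 * value") is ambiguous in the copy offload context.
 * Returns 1 if the seqid is acceptable, 0 if the stateid is invalid. */
static int copy_stateid_seqid_ok(const struct stateid4 *sid)
{
    return sid->seqid != 0;
}
```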

645	2.4.  Security Considerations

647	   The security considerations pertaining to NFSv4 [10] apply to this
648	   chapter.

650	   The standard security mechanisms provided by NFSv4 [10] may be used
651	   to secure the protocol described in this chapter.

653	   NFSv4 clients and servers supporting the inter-server copy operations
654	   described in this chapter are REQUIRED to implement [5], including
655	   the RPCSEC_GSSv3 privileges copy_from_auth and copy_to_auth.  If the
656	   server-to-server copy protocol is ONC RPC based, the servers are also
657	   REQUIRED to implement the RPCSEC_GSSv3 privilege copy_confirm_auth.
658	   These requirements to implement are not requirements to use.  NFSv4
659	   clients and servers are RECOMMENDED to use [5] to secure server-side
660	   copy operations.

662	2.4.1.  Inter-Server Copy Security

664	2.4.1.1.  Requirements for Secure Inter-Server Copy

666	   Inter-server copy is driven by several requirements:

668	   o  The specification MUST NOT mandate an inter-server copy protocol.
669	      There are many ways to copy data.  Some will be more efficient than
670	      others, depending on the identities of the source server and
671	      destination server.  For example, the source and destination
672	      servers might be two nodes sharing a common file system format for
673	      the source and destination file systems.  Thus the source and
674	      destination are in an ideal position to efficiently render the
675	      image of the source file to the destination file by replicating
676	      the file system formats at the block level.  In other cases, the
677	      source and destination might be two nodes sharing a common storage
678	      area network, and thus there is no need to copy any data at all,
679	      and instead ownership of the file and its contents simply gets re-
680	      assigned to the destination.

682	   o  The specification MUST provide guidance for using NFSv4.x as a
683	      copy protocol.  For those source and destination servers willing
684	      to use NFSv4.x there are specific security considerations that
685	      this specification can and does address.

687	   o  The specification MUST NOT mandate pre-configuration between the
688	      source and destination server.  Requiring that the source and
689	      destination first have a "copying relationship" increases the
690	      administrative burden.  However, the specification MUST NOT
691	      preclude implementations that require pre-configuration.

693	   o  The specification MUST NOT mandate a trust relationship between
694	      the source and destination server.  The NFSv4 security model
695	      requires mutual authentication between a principal on an NFS
696	      client and a principal on an NFS server.  This model MUST continue
697	      with the introduction of COPY.

699	2.4.1.2.  Inter-Server Copy with RPCSEC_GSSv3

701	   When the client sends a COPY_NOTIFY to the source server to announce
702	   that the destination will attempt to copy data from the source, it is
703	   expected that this copy is being done on behalf of the principal
704	   (called the "user principal") that sent the RPC request that encloses
705	   the COMPOUND procedure that contains the COPY_NOTIFY operation.  The
706	   user principal is identified by the RPC credentials.  A mechanism is
707	   necessary that allows the user principal to authorize the destination
708	   server to perform the copy in a manner that lets the source server
709	   properly authenticate the destination's copy without allowing the
710	   destination to exceed its authorization.

712	   An approach that sends delegated credentials of the client's user
713	   principal to the destination server is not used for the following
714	   reasons.  If the client's user delegated its credentials, the
715	   destination would authenticate as the user principal.  If the
716	   destination were using the NFSv4 protocol to perform the copy, then
717	   the source server would authenticate the destination server as the
718	   user principal, and the file copy would securely proceed.  However,
719	   this approach would allow the destination server to copy other files.
720	   The user principal would have to trust the destination server to not
721	   do so.  This is counter to the requirements, and therefore is not
722	   considered.  Instead, an approach using RPCSEC_GSSv3 [5] privileges
723	   is proposed.

725	   One of the stated applications of the proposed RPCSEC_GSSv3 protocol
726	   is compound client host and user authentication [+ privilege
727	   assertion].  For inter-server file copy, we require compound NFS
728	   server host and user authentication [+ privilege assertion].  The
729	   distinction between the two is one without meaning.

731	   RPCSEC_GSSv3 introduces the notion of privileges.  We define three
732	   privileges:

734	   copy_from_auth:  A user principal is authorizing a source principal
735	      ("nfs@<source>") to allow a destination principal ("nfs@
736	      <destination>") to copy a file from the source to the destination.
737	      This privilege is established on the source server before the user
738	      principal sends a COPY_NOTIFY operation to the source server.

740	   struct copy_from_auth_priv {
741	           secret4             cfap_shared_secret;
742	           netloc4             cfap_destination;
743	           /* the NFSv4 user name that the user principal maps to */
744	           utf8str_mixed       cfap_username;
745	           /* equal to seq_num of rpc_gss_cred_vers_3_t */
746	           unsigned int        cfap_seq_num;
747	   };

749	      cfap_shared_secret is a secret value the user principal generates.

751	   copy_to_auth:  A user principal is authorizing a destination
752	      principal ("nfs@<destination>") to allow it to copy a file from
753	      the source to the destination.  This privilege is established on
754	      the destination server before the user principal sends a COPY
755	      operation to the destination server.

757	   struct copy_to_auth_priv {
758	           /* equal to cfap_shared_secret */
759	           secret4              ctap_shared_secret;
760	           netloc4              ctap_source;
761	           /* the NFSv4 user name that the user principal maps to */
762	           utf8str_mixed        ctap_username;
763	           /* equal to seq_num of rpc_gss_cred_vers_3_t */
764	           unsigned int         ctap_seq_num;
765	   };

767	      ctap_shared_secret is a secret value the user principal generated
768	      and was used to establish the copy_from_auth privilege with the
769	      source principal.

771	   copy_confirm_auth:  A destination principal is confirming with the
772	      source principal that it is authorized to copy data from the
773	      source on behalf of the user principal.  When the inter-server
774	      copy protocol is NFSv4, or for that matter, any protocol capable
775	      of being secured via RPCSEC_GSSv3 (i.e., any ONC RPC protocol),
776	      this privilege is established before the file is copied from the
777	      source to the destination.

779	   struct copy_confirm_auth_priv {
780	           /* equal to GSS_GetMIC() of cfap_shared_secret */
781	           opaque              ccap_shared_secret_mic<>;
782	           /* the NFSv4 user name that the user principal maps to */
783	           utf8str_mixed       ccap_username;
784	           /* equal to seq_num of rpc_gss_cred_vers_3_t */
785	           unsigned int        ccap_seq_num;
786	   };

788	2.4.1.2.1.  Establishing a Security Context

790	   When the user principal wants to COPY a file between two servers, if
791	   it has not established copy_from_auth and copy_to_auth privileges on
792	   the servers, it establishes them:

794	   o  The user principal generates a secret it will share with the two
795	      servers.  This shared secret will be placed in the
796	      cfap_shared_secret and ctap_shared_secret fields of the
797	      appropriate privilege data types, copy_from_auth_priv and
798	      copy_to_auth_priv.

800	   o  An instance of copy_from_auth_priv is filled in with the shared
801	      secret, the destination server, and the NFSv4 user id of the user
802	      principal.  It will be sent with an RPCSEC_GSS3_CREATE procedure,
803	      and so cfap_seq_num is set to the seq_num of the credential of the
804	      RPCSEC_GSS3_CREATE procedure.  Because cfap_shared_secret is a
805	      secret, after XDR encoding copy_from_auth_priv, GSS_Wrap() (with
806	      privacy) is invoked on copy_from_auth_priv.  The
807	      RPCSEC_GSS3_CREATE procedure's arguments are:

809	      struct {
810	         rpc_gss3_gss_binding    *compound_binding;
811	         rpc_gss3_chan_binding   *chan_binding_mic;
812	         rpc_gss3_assertion      assertions<>;
813	         rpc_gss3_extension      extensions<>;
814	      } rpc_gss3_create_args;

816	      The string "copy_from_auth" is placed in assertions[0].privs.  The
817	      output of GSS_Wrap() is placed in extensions[0].data.  The field
818	      extensions[0].critical is set to TRUE.  The source server calls
819	      GSS_Unwrap() on the privilege, and verifies that the seq_num
820	      matches the credential.  It then verifies that the NFSv4 user id
821	      being asserted matches the source server's mapping of the user
822	      principal.  If it does, the privilege is established on the source
823	      server as: <"copy_from_auth", user id, destination>.  The
824	      successful reply to RPCSEC_GSS3_CREATE has:

826	      struct {
827	         opaque                  handle<>;
828	         rpc_gss3_chan_binding   *chan_binding_mic;
829	         rpc_gss3_assertion      granted_assertions<>;
830	         rpc_gss3_assertion      server_assertions<>;
831	         rpc_gss3_extension      extensions<>;
832	      } rpc_gss3_create_res;

834	      The field "handle" is the RPCSEC_GSSv3 handle that the client will
835	      use on COPY_NOTIFY requests involving the source and destination
836	      servers.  The field granted_assertions[0].privs will be equal to
837	      "copy_from_auth".  The server will return a GSS_Wrap() of
838	      copy_from_auth_priv.

840	   o  An instance of copy_to_auth_priv is filled in with the shared
841	      secret, the source server, and the NFSv4 user id.  It will be sent
842	      with an RPCSEC_GSS3_CREATE procedure, and so ctap_seq_num is set
843	      to the seq_num of the credential of the RPCSEC_GSS3_CREATE
844	      procedure.  Because ctap_shared_secret is a secret, after XDR
845	      encoding copy_to_auth_priv, GSS_Wrap() is invoked on
846	      copy_to_auth_priv.  The RPCSEC_GSS3_CREATE procedure's arguments
847	      are:

849	      struct {
850	         rpc_gss3_gss_binding    *compound_binding;
851	         rpc_gss3_chan_binding   *chan_binding_mic;
852	         rpc_gss3_assertion      assertions<>;
853	         rpc_gss3_extension      extensions<>;
854	      } rpc_gss3_create_args;

856	      The string "copy_to_auth" is placed in assertions[0].privs.  The
857	      output of GSS_Wrap() is placed in extensions[0].data.  The field
858	      extensions[0].critical is set to TRUE.  After unwrapping the
859	      privilege and verifying the seq_num and the mapping of the user
860	      principal to the NFSv4 user ID, the destination establishes a
861	      privilege of <"copy_to_auth", user id, source>.  The successful
862	      reply to RPCSEC_GSS3_CREATE has:

864	      struct {
865	         opaque                  handle<>;
866	         rpc_gss3_chan_binding   *chan_binding_mic;
867	         rpc_gss3_assertion      granted_assertions<>;
868	         rpc_gss3_assertion      server_assertions<>;
869	         rpc_gss3_extension      extensions<>;
871	      } rpc_gss3_create_res;

873	      The field "handle" is the RPCSEC_GSSv3 handle that the client will
874	      use on COPY requests involving the source and destination servers.
875	      The field granted_assertions[0].privs will be equal to
876	      "copy_to_auth".  The server will return a GSS_Wrap() of
877	      copy_to_auth_priv.
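   In the second step above, the source server checks the privilege
   after GSS_Unwrap() and XDR decoding of copy_from_auth_priv.  Those
   checks might look like the following C sketch.  The decoded
   structure, the privilege record, and establish_copy_from_auth() are
   hypothetical, and the sketch deliberately omits the GSS-API calls.
   The destination's handling of copy_to_auth_priv would be symmetric,
   with the source in place of the destination.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical copy_from_auth_priv after GSS_Unwrap() and XDR decoding.
 * cfap_destination is shown as a name; a real server would also retain
 * cfap_shared_secret to verify a later copy_confirm_auth MIC. */
struct cfap_decoded {
    const char *destination;    /* cfap_destination */
    const char *username;       /* cfap_username */
    uint32_t    seq_num;        /* cfap_seq_num */
};

/* Hypothetical privilege record: <"copy_from_auth", user id, destination>. */
struct copy_priv {
    char name[32];
    char user[64];
    char peer[256];
};

/* cred_seq_num is the seq_num of the RPCSEC_GSS3_CREATE credential and
 * mapped_user is this server's own mapping of the user principal.
 * Returns 0 and fills *out if the privilege can be established, else -1. */
static int establish_copy_from_auth(const struct cfap_decoded *p,
                                    uint32_t cred_seq_num,
                                    const char *mapped_user,
                                    struct copy_priv *out)
{
    if (p->seq_num != cred_seq_num)
        return -1;                  /* privilege not bound to this call */
    if (strcmp(p->username, mapped_user) != 0)
        return -1;                  /* asserted user id does not match */
    if (snprintf(out->user, sizeof out->user, "%s", p->username) >=
            (int)sizeof out->user ||
        snprintf(out->peer, sizeof out->peer, "%s", p->destination) >=
            (int)sizeof out->peer)
        return -1;                  /* too long for this sketch's buffers */
    memcpy(out->name, "copy_from_auth", sizeof "copy_from_auth");
    return 0;
}
```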

879	2.4.1.2.2.  Starting a Secure Inter-Server Copy

881	   When the client sends a COPY_NOTIFY request to the source server, it
882	   uses the privileged "copy_from_auth" RPCSEC_GSSv3 handle.  The
883	   cna_destination_server in COPY_NOTIFY MUST be the same as the name of
884	   the destination server specified in copy_from_auth_priv.  Otherwise,
885	   COPY_NOTIFY will fail with NFS4ERR_ACCESS.  The source server
886	   verifies that the privilege <"copy_from_auth", user id, destination>
887	   exists, and annotates it with the source filehandle, if the user
888	   principal has read access to the source file, and if administrative
889	   policies give the user principal and the NFS client read access to
890	   the source file (i.e., if the ACCESS operation would grant read
891	   access).  Otherwise, COPY_NOTIFY will fail with NFS4ERR_ACCESS.
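   A C sketch of the source server's COPY_NOTIFY checks follows.  The
   context structure, the status enumeration, and check_copy_notify()
   are hypothetical.  The enumeration stands in for NFS4_OK and
   NFS4ERR_ACCESS without their real values.  For brevity, the sketch
   compares server names as plain strings, whereas a real server would
   need to compare netloc4 values more carefully.

```c
#include <string.h>

/* Stand-ins for NFS4_OK and NFS4ERR_ACCESS (real values not shown). */
enum notify_status { NOTIFY_OK, NOTIFY_ERR_ACCESS };

/* Hypothetical summary of what the source server has determined. */
struct notify_ctx {
    const char *priv_destination; /* destination in copy_from_auth_priv */
    int priv_exists;      /* <"copy_from_auth", user id, destination> found */
    int user_can_read;    /* user principal has read access to the file */
    int policy_allows;    /* admin policy grants read to user and client */
};

static enum notify_status check_copy_notify(const struct notify_ctx *c,
                                            const char *cna_destination_server)
{
    if (!c->priv_exists)
        return NOTIFY_ERR_ACCESS;
    if (strcmp(cna_destination_server, c->priv_destination) != 0)
        return NOTIFY_ERR_ACCESS;   /* must name the same destination */
    if (!c->user_can_read || !c->policy_allows)
        return NOTIFY_ERR_ACCESS;   /* ACCESS would not grant read */
    /* Success: the server would now annotate the privilege with the
     * source filehandle. */
    return NOTIFY_OK;
}
```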

893	   When the client sends a COPY request to the destination server, it
894	   uses the privileged "copy_to_auth" RPCSEC_GSSv3 handle.  The
895	   ca_source_server in COPY MUST be the same as the name of the source
896	   server specified in copy_to_auth_priv.  Otherwise, COPY will fail
897	   with NFS4ERR_ACCESS.  The destination server verifies that the
898	   privilege <"copy_to_auth", user id, source> exists, and annotates it
899	   with the source and destination filehandles.  If the client has
900	   failed to establish the "copy_to_auth" privilege, the destination
901	   server will reject the request with NFS4ERR_PARTNER_NO_AUTH.
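   Similarly, the destination server's checks on COPY can be sketched as
   below.  Again the names are hypothetical, and the enumeration only
   stands in for NFS4_OK, NFS4ERR_ACCESS, and NFS4ERR_PARTNER_NO_AUTH.
   The sketch checks for a missing privilege first, because there is
   then no copy_to_auth_priv to compare against.  That ordering is an
   assumption.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for NFS4_OK, NFS4ERR_ACCESS, and NFS4ERR_PARTNER_NO_AUTH. */
enum copy_check { CHECK_OK, CHECK_ERR_ACCESS, CHECK_ERR_PARTNER_NO_AUTH };

/* priv_source is the source named in the client's copy_to_auth privilege,
 * or NULL if the client never established one. */
static enum copy_check check_copy(const char *priv_source,
                                  const char *ca_source_server)
{
    if (priv_source == NULL)
        return CHECK_ERR_PARTNER_NO_AUTH;
    if (strcmp(ca_source_server, priv_source) != 0)
        return CHECK_ERR_ACCESS;    /* COPY names a different source */
    /* Success: the server would now annotate the privilege with the
     * source and destination filehandles. */
    return CHECK_OK;
}
```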

903	   If the client sends a COPY_REVOKE to the source server to rescind the
904	   destination server's copy privilege, it uses the privileged
905	   "copy_from_auth" RPCSEC_GSSv3 handle and the cra_destination_server
906	   in COPY_REVOKE MUST be the same as the name of the destination server
907	   specified in copy_from_auth_priv.  The source server will then delete
908	   the <"copy_from_auth", user id, destination> privilege and fail any
909	   subsequent copy requests sent under the auspices of this privilege
910	   from the destination server.

912	2.4.1.2.3.  Securing ONC RPC Server-to-Server Copy Protocols

914	   After a destination server has a "copy_to_auth" privilege established
915	   on it, and it receives a COPY request, if it knows it will use an ONC
916	   RPC protocol to copy data, it will establish a "copy_confirm_auth"
917	   privilege on the source server, using nfs@<destination> as the
918	   initiator principal, and nfs@<source> as the target principal.

920	   The value of the field ccap_shared_secret_mic is a GSS_GetMIC() of
921	   the shared secret passed in the copy_to_auth privilege.  The field
922	   ccap_username is the mapping of the user principal to an NFSv4 user
923	   name ("user"@"domain" form), and MUST be the same as ctap_username
924	   and cfap_username.  The field ccap_seq_num is the seq_num of the
925	   RPCSEC_GSSv3 credential used for the RPCSEC_GSS3_CREATE procedure the
926	   destination will send to the source server to establish the
927	   privilege.

929	   The source server verifies the privilege, and establishes a
930	   <"copy_confirm_auth", user id, destination> privilege.  If the source
931	   server fails to verify the privilege, the COPY operation will be
932	   rejected with NFS4ERR_PARTNER_NO_AUTH.  All subsequent ONC RPC
933	   requests sent from the destination to copy data from the source to
934	   the destination will use the RPCSEC_GSSv3 handle returned by the
935	   source's RPCSEC_GSS3_CREATE response.

937	   Note that the use of the "copy_confirm_auth" privilege accomplishes
938	   the following:

940	   o  If a protocol like NFS that has export policies is being used, the
941	      export policies can be overridden in case the destination server,
942	      acting as an NFS client, is not authorized.

944	   o  Manual configuration to allow a copy relationship between the
945	      source and destination is not needed.

947	   If the attempt to establish a "copy_confirm_auth" privilege fails,
948	   then when the user principal sends a COPY request to the destination,
949	   the destination server will reject it with NFS4ERR_PARTNER_NO_AUTH.

951	2.4.1.2.4.  Securing Non-ONC RPC Server-to-Server Copy Protocols

953	   If the destination will not be using ONC RPC to copy the data, then
954	   the source and destination are using an unspecified copy protocol.
955	   The destination could use the shared secret and the NFSv4 user id
956	   to prove to the source server that the user principal has
957	   authorized the copy.

959	   For protocols that authenticate user names with passwords (e.g., HTTP
960	   [13] and FTP [14]), the NFSv4 user id could be used as the user name,
961	   and an ASCII hexadecimal representation of the RPCSEC_GSSv3 shared
962	   secret could be used as the user password or as input into non-
963	   password authentication methods like CHAP [15].

965	2.4.1.3.  Inter-Server Copy via ONC RPC but without RPCSEC_GSSv3

967	   ONC RPC security flavors other than RPCSEC_GSSv3 MAY be used with the
968	   server-side copy offload operations described in this chapter.  In
969	   particular, host-based ONC RPC security flavors such as AUTH_NONE and
970	   AUTH_SYS MAY be used.  If a host-based security flavor is used, a
971	   minimal level of protection for the server-to-server copy protocol is
972	   possible.

974	   In the absence of strong security mechanisms such as RPCSEC_GSSv3,
975	   the challenge is how the source server and destination server
976	   identify themselves to each other, especially in the presence of
977	   multi-homed source and destination servers.  In a multi-homed
978	   environment, the destination server might not contact the source
979	   server from the same network address specified by the client in the
980	   COPY_NOTIFY.  This can be overcome using the procedure described
981	   below.

983	   When the client sends the source server the COPY_NOTIFY operation,
984	   the source server may reply to the client with a list of target
985	   addresses, names, and/or URLs and assign them to the unique
986	   quadruple: <random number, source fh, user ID, destination address
987	   Y>.  If the destination uses one of these target netlocs to contact
988	   the source server, the source server will be able to uniquely
989	   identify the destination server, even if the destination server does
990	   not connect from the address specified by the client in COPY_NOTIFY.
991	   The level of assurance in this identification depends on the
992	   unpredictability, strength and secrecy of the random number.

994	   For example, suppose the network topology is as shown in Figure 3.
995	   If the source filehandle is 0x12345, the source server may respond to
996	   a COPY_NOTIFY for destination 10.11.78.56 with the URLs:

998	      nfs://10.11.78.18//_COPY/FvhH1OKbu8VrxvV1erdjvR7N/10.11.78.56/_FH/
999	      0x12345

1001	      nfs://192.168.33.18//_COPY/FvhH1OKbu8VrxvV1erdjvR7N/10.11.78.56/
1002	      _FH/0x12345

1004	   The name component after _COPY is 24 characters of base64, more than
1005	   enough to encode a 128-bit random number.
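   The sketch below shows one way to produce the random component.
   Eighteen random bytes (144 bits, exactly what 24 base64 characters
   carry) encode to 24 characters with no padding.  The sketch assumes
   the URL- and filename-safe alphabet of RFC 4648, Section 5, because
   the standard alphabet contains '/', which cannot appear inside a
   single path component.  The example token above happens to use only
   characters common to both alphabets.  The random bytes themselves
   must come from a cryptographically secure source, for example
   getrandom() on Linux, which is not shown.  The name b64url_encode()
   is hypothetical.

```c
#include <stddef.h>

/* base64url alphabet (RFC 4648, Section 5): no '/' or '+', so the output
 * is safe to use as a single path component. */
static const char b64url[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Encode len bytes into out, which must hold 4 * len / 3 + 1 bytes.
 * len must be a multiple of 3 so that no padding is needed.
 * Returns 0 on success, -1 if len is not a multiple of 3. */
static int b64url_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i, o = 0;

    if (len % 3 != 0)
        return -1;
    for (i = 0; i < len; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = b64url[(v >> 18) & 0x3f];
        out[o++] = b64url[(v >> 12) & 0x3f];
        out[o++] = b64url[(v >> 6) & 0x3f];
        out[o++] = b64url[v & 0x3f];
    }
    out[o] = '\0';
    return 0;
}
```

   For 18 input bytes the caller passes a 25-byte output buffer and
   receives a 24-character token such as the one in the URLs above.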

1007	   The client will then send these URLs to the destination server in the
1008	   COPY operation.  Suppose that the 192.168.33.0/24 network is a high
1009	   speed network and the destination server decides to transfer the file
1010	   over this network.  If the destination contacts the source server
1011	   from 192.168.33.56 over this network using NFSv4.1, it does the
1012	   following:

1014	   COMPOUND  { PUTROOTFH ; LOOKUP "_COPY" ; LOOKUP
1015	      "FvhH1OKbu8VrxvV1erdjvR7N" ; LOOKUP "10.11.78.56" ; LOOKUP "_FH" ;
1016	      OPEN "0x12345" ; GETFH }

1018	   Provided that the random number is unpredictable and has been kept
1019	   secret by the parties involved, the source server will therefore know
1020	   that these NFSv4.x operations are being issued by the destination
1021	   server identified in the COPY_NOTIFY.  This random number technique
1022	   only provides initial authentication of the destination server, and
1023	   cannot defend against man-in-the-middle attacks after authentication
1024	   or an eavesdropper that observes the random number on the wire.
1025	   Other secure communication techniques (e.g., IPsec) are necessary to
1026	   block these attacks.
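   On the source server, the names used in the LOOKUP and OPEN
   operations above can be checked against the quadruples recorded at
   COPY_NOTIFY time.  The sketch below is a hypothetical illustration.
   The record layout and match_copy_path() are assumptions, not
   requirements of this section.  So is the use of a constant-time
   comparison for the random component, which keeps response timing
   from revealing how much of a guessed token matched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record kept from COPY_NOTIFY time:
 * <random number, source fh, user ID, destination address>. */
struct copy_grant {
    const char *token;          /* base64 random component */
    const char *fh_hex;         /* hexadecimal source filehandle */
    const char *user;           /* user ID the copy runs as */
    const char *dest_addr;      /* destination named in COPY_NOTIFY */
};

/* Compare len bytes without an early exit, so timing does not reveal how
 * many leading characters of a guessed token were correct. */
static int ct_equal(const char *a, const char *b, size_t len)
{
    unsigned char diff = 0;
    size_t i;

    for (i = 0; i < len; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}

/* comps[] holds the five names looked up or opened, in order:
 * "_COPY", token, destination, "_FH", filehandle.
 * Returns the matching grant, or NULL if none matches. */
static const struct copy_grant *
match_copy_path(const char *const comps[5],
                const struct copy_grant *grants, size_t ngrants)
{
    size_t i;

    if (strcmp(comps[0], "_COPY") != 0 || strcmp(comps[3], "_FH") != 0)
        return NULL;
    for (i = 0; i < ngrants; i++) {
        const struct copy_grant *g = &grants[i];
        size_t len = strlen(g->token);

        /* Tokens have a fixed length, so checking it leaks nothing. */
        if (strlen(comps[1]) == len && ct_equal(comps[1], g->token, len) &&
            strcmp(comps[2], g->dest_addr) == 0 &&
            strcmp(comps[4], g->fh_hex) == 0)
            return g;
    }
    return NULL;
}
```

   Because the destination address is part of the path, a match ties the
   request to the COPY_NOTIFY even when the destination connects from a
   different address, such as 192.168.33.56 in the example above.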

1028	2.4.1.4.  Inter-Server Copy without ONC RPC and RPCSEC_GSSv3

1030	   The same techniques as in Section 2.4.1.3, using unique URLs for each
1031	   destination server, can be used for other protocols (e.g., HTTP [13]
1032	   and FTP [14]) as well.

1034	3.  Support for Application IO Hints

1036	3.1.  Introduction

1038	   Applications currently have several options for communicating I/O
1039	   access patterns to the NFS client.  While this can help the NFS
1040	   client optimize I/O and caching for a file, it does not allow the NFS
1041	   server and its exported file system to do likewise.  Therefore, here
1042	   we put forth a proposal for the NFSv4.2 protocol to allow
1043	   applications to communicate their expected behavior to the server.

1045	   By communicating the expected access pattern, e.g., sequential or
1046	   random, and data re-use behavior, e.g., a data range will be read
1047	   multiple times and should be cached, the server will be able to
1048	   better understand what optimizations it should implement for access
1049	   to a file.  For example, if an application indicates it will never
1050	   read the data more than once, then the file system can avoid
1051	   polluting its data cache by not caching the data.

1053	   The first mechanism through which applications can issue client I/O
1054	   hints is the posix_fadvise operation.  For example, on Linux, when an
1055	   application uses posix_fadvise to specify that a file will be read
1056	   sequentially, Linux doubles the readahead buffer size.

1058	   Another instance where applications provide an indication of their
1059	   desired I/O behavior is the use of direct I/O. By specifying direct
1060	   I/O, clients will no longer cache data, but this information is not
1061	   passed to the server, which will continue caching data.

1063	   Application-specific NFS clients such as those used by hypervisors
1064	   and databases can also leverage application hints to communicate
1065	   their specialized requirements.

1067	   This section adds a new IO_ADVISE operation to communicate the client
1068	   file access patterns to the NFS server.  The NFS server upon
1069	   receiving an IO_ADVISE operation MAY choose to alter its I/O and
1070	   caching behavior, but is under no obligation to do so.

1072	3.2.  POSIX Requirements

1074	   The first key requirement of the IO_ADVISE operation is to support
1075	   the posix_fadvise function [6], which is supported in Linux and many
1076	   other operating systems.  Examples and guidance on how to use
1077	   posix_fadvise to improve performance can be found in [16].
1078	   posix_fadvise is defined as follows:

1080	      int posix_fadvise(int fd, off_t offset, off_t len, int advice);

1082	   The posix_fadvise() function shall advise the implementation on the
1083	   expected behavior of the application with respect to the data in the
1084	   file associated with the open file descriptor, fd, starting at offset
1085	   and continuing for len bytes.  The specified range need not currently
1086	   exist in the file.  If len is zero, all data following offset is
1087	   specified.  The implementation may use this information to optimize
1088	   handling of the specified data.  The posix_fadvise() function shall
1089	   have no effect on the semantics of other operations on the specified
1090	   data, although it may affect the performance of other operations.

1092	   The advice to be applied to the data is specified by the advice
1093	   parameter and may be one of the following values:

1095	   POSIX_FADV_NORMAL -  Specifies that the application has no advice to
1096	      give on its behavior with respect to the specified data.  It is
1097	      the default characteristic if no advice is given for an open file.

1099	   POSIX_FADV_SEQUENTIAL -  Specifies that the application expects to
1100	      access the specified data sequentially from lower offsets to
1101	      higher offsets.

1103	   POSIX_FADV_RANDOM -  Specifies that the application expects to access
1104	      the specified data in a random order.

1106	   POSIX_FADV_WILLNEED -  Specifies that the application expects to
1107	      access the specified data in the near future.

1109	   POSIX_FADV_DONTNEED -  Specifies that the application expects that it
1110	      will not access the specified data in the near future.

1112	   POSIX_FADV_NOREUSE -  Specifies that the application expects to
1113	      access the specified data once and then not reuse it thereafter.

1115	   Upon successful completion, posix_fadvise() shall return zero;
1116	   otherwise, an error number shall be returned to indicate the error.
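   As a concrete illustration of the POSIX interface above, the sketch
   below shows how an application might hint that a range will be read
   once, sequentially.  The helper name advise_read_once() is
   hypothetical.  As stated above, posix_fadvise() reports failure by
   returning an error number; it does not set errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: advise that [offset, offset + len) will be read
 * sequentially and only once.  A len of 0 means "from offset to the end
 * of the file".  Returns 0 on success, or the error number returned by
 * posix_fadvise() (e.g., EBADF for a bad descriptor).
 */
int advise_read_once(int fd, off_t offset, off_t len)
{
    int err = posix_fadvise(fd, offset, len, POSIX_FADV_SEQUENTIAL);
    if (err != 0)
        return err;
    return posix_fadvise(fd, offset, len, POSIX_FADV_NOREUSE);
}
```

   The hints are advisory only: a zero return means the advice was
   accepted, not that the implementation changed its caching behavior.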

1118	3.3.  Additional Requirements

1120	   Many use cases exist for sending application I/O hints to the server
1121	   that cannot utilize the POSIX supported interface.  This is because
1122	   some applications may benefit from additional hints not specified by
1123	   posix_fadvise, and some applications may not use POSIX at all.

1125	   One use case is "Opportunistic Prefetch", which allows a stateid
1126	   holder to tell the server that it is possible that it will access the
1127	   specified data in the near future.  This is similar to
1128	   POSIX_FADV_WILLNEED, but the client is unsure it will in fact read
1129	   the specified data, so the server should only prefetch the data if it
1130	   can be done at a marginal cost.  For example, when a server receives
1131	   this hint, it could prefetch only the indirect blocks for a file
1132	   instead of all the data.  This would still improve performance if the
1133	   client does read the data, but with less pressure on server memory.

1135	   An example use case for this hint is a database that reads in a
1136	   single record that points to additional records in either other areas
1137	   of the same file or different files located on the same or different
1138	   server.  While it is likely that the application may access the
1139	   additional records, it is far from guaranteed.  Therefore, the
1140	   database may issue an opportunistic prefetch (instead of
1141	   POSIX_FADV_WILLNEED) for the data in the other files pointed to by
1142	   the record.

1144	   Another use case is "Direct I/O", which allows a stateid holder to
1145	   inform the server that it does not wish to cache data.  Today, for
1146	   applications that only intend to read data once, the use of direct
1147	   I/O disables client caching, but does not affect server caching.  By
1148	   caching data that will not be re-read, the server is polluting its
1149	   cache and possibly causing useful cached data to be evicted.  By
1150	   informing the server of its expected I/O access, this situation can
1151	   be avoided.  Direct I/O can be used in Linux and AIX via the open()
1152	   O_DIRECT parameter, in Solaris via the directio() function, and in
1153	   Windows via the CreateFile() FILE_FLAG_NO_BUFFERING flag.

1155	   Another use case is "Backward Sequential Read", which allows a stateid
1156	   holder to inform the server that it intends to read the specified
1157	   data backwards, i.e., from the end to the beginning.  This is
1158	   different from POSIX_FADV_SEQUENTIAL, whose implied intention was
1159	   that data will be read from beginning to end.  This hint allows
1160	   servers to prefetch data at the end of the range first, and then
1161	   prefetch data sequentially in a backwards manner to the start of the
1162	   data range.  One example of an application that can make use of this
1163	   hint is video editing.
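   A server acting on a Backward Sequential Read hint might split the
   hinted range into fixed-size prefetch chunks and schedule them from
   the highest offset down.  The sketch below computes only that order;
   the function name and the use of a fixed chunk size are assumptions
   for illustration, not part of the protocol.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical server-side helper: fill starts[] with the starting
 * offsets of chunk-sized prefetch requests covering
 * [offset, offset + length), ordered from the end of the range back
 * toward its start.  The highest chunk may be partial.  Returns the
 * number of chunks, or 0 if chunk or length is 0 or max is too small.
 */
size_t backward_prefetch_plan(uint64_t offset, uint64_t length,
                              uint64_t chunk, uint64_t *starts, size_t max)
{
    if (chunk == 0 || length == 0)
        return 0;
    /* Ceiling division without risking overflow of length + chunk. */
    uint64_t nchunks = length / chunk + (length % chunk != 0);
    if (nchunks > max)
        return 0;
    for (uint64_t i = 0; i < nchunks; i++)
        starts[i] = offset + (nchunks - 1 - i) * chunk; /* highest first */
    return (size_t)nchunks;
}
```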

1165	3.4.  Security Considerations

1167	   None.

1169	3.5.  IANA Considerations

1171	   The IO_ADVISE_type4 will be extended through an IANA registry.

1173	4.  Sparse Files

1175	4.1.  Introduction

1177	   A sparse file is a common way of representing a large file without
1178	   having to utilize all of the disk space for it.  Consequently, a
1179	   sparse file uses less physical space than its size indicates.  This
1180	   means the file contains 'holes', byte ranges within the file that
1181	   contain no data.  Most modern file systems support sparse files,
1182	   including most UNIX file systems and NTFS, but notably not Apple's
1183	   HFS+.  Common examples of sparse files include Virtual Machine (VM)
1184	   OS/disk images, database files, log files, and even checkpoint
1185	   recovery files most commonly used by the HPC community.

1187	   If an application reads a hole in a sparse file, the file system must
1188	   return all zeros to the application.  For local data access there is
1189	   little penalty, but with NFS these zeroes must be transferred back to
1190	   the client.  If an application uses the NFS client to read data into
1191	   memory, this wastes time and bandwidth as the application waits for
1192	   the zeroes to be transferred.

1194	   A sparse file is typically created by initializing the file to be all
1195	   zeros: nothing is written to the data in the file; instead, the hole
1196	   is recorded in the metadata for the file.  So an 8G disk image might
1197	   be represented initially by a couple hundred bits in the inode and
1198	   nothing on the disk.  If the VM then writes 100M to the middle of
1199	   the image, there would now be two holes represented in the metadata
1200	   and 100M in the data.
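   The behavior described above can be reproduced locally on a POSIX
   system.  The sketch below, scaled down from the 8G/100M example,
   extends a file with ftruncate(), which writes no data, and then
   writes a single region; every byte outside that region still reads
   back as zero, and those are exactly the bytes a plain READ over NFS
   would have to transfer.  Whether the unwritten ranges become
   unallocated holes on disk depends on the underlying file system.
   The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a temporary file (path_template is
 * modified by mkstemp()) of `size` bytes in which only
 * [data_off, data_off + data_len) is written.  ftruncate() extends the
 * file without writing data, so on file systems that support sparse
 * files the rest of the file stays a hole.  Returns an open descriptor,
 * or -1 on error (the file is removed on error).
 */
int make_sparse_file(char *path_template, off_t size,
                     off_t data_off, size_t data_len)
{
    unsigned char buf[4096];
    int fd = mkstemp(path_template);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) != 0)
        goto fail;
    memset(buf, 0xAB, sizeof buf);
    for (size_t done = 0; done < data_len; ) {
        size_t n = data_len - done < sizeof buf ? data_len - done : sizeof buf;
        ssize_t w = pwrite(fd, buf, n, data_off + (off_t)done);
        if (w <= 0)
            goto fail;
        done += (size_t)w;
    }
    return fd;
fail:
    close(fd);
    unlink(path_template);
    return -1;
}

/* Returns 1 if every byte in [off, off + len) reads back as zero. */
int range_reads_as_zero(int fd, off_t off, size_t len)
{
    unsigned char buf[4096];
    while (len > 0) {
        size_t n = len < sizeof buf ? len : sizeof buf;
        ssize_t r = pread(fd, buf, n, off);
        if (r <= 0)
            return 0;
        for (ssize_t i = 0; i < r; i++)
            if (buf[i] != 0)
                return 0;
        off += r;
        len -= (size_t)r;
    }
    return 1;
}
```

   For example, an 8 MiB file with 1 MiB of data written at offset
   4 MiB still reads as zeros over its first 4 MiB and its last 3 MiB.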

1202	   Two new operations, INITIALIZE (Section 13.7) and READ_PLUS
1203	   (Section 13.10), are introduced.  INITIALIZE allows for the creation
1204	   of a sparse file and for hole punching, e.g., when an application
1205	   wants to zero out a range of the file.  READ_PLUS supports all the
1206	   features of READ but includes an extension to support sparse
1207	   pattern files (Section 6.1.2).  READ_PLUS is guaranteed to perform
1208	   no worse than READ, and can dramatically improve performance with
1209	   sparse files.  READ_PLUS does not depend on pNFS protocol features,
1210	   but can be used by pNFS to support sparse files.

1212	4.2.  Terminology

1214	   Regular file:  An object of file type NF4REG or NF4NAMEDATTR.

1216	   Sparse file:  A Regular file that contains one or more Holes.

1218	   Hole:  A byte range within a Sparse file that contains all zeroes.
1219	      For block-based file systems, this could also be an
1220	      unallocated region of the file.

1222	   Hole Threshold:  The minimum length of a Hole as determined by the
1223	      server.  If a server chooses to define a Hole Threshold, then it
1224	      would not return hole information about holes with a length
1225	      shorter than the Hole Threshold.
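   To make the Hole Threshold concrete, the sketch below scans a buffer
   and reports only runs of zero bytes that are at least threshold bytes
   long; shorter zero runs are treated as ordinary data, as a server
   with that Hole Threshold would treat them.  This is an illustration
   only: a real server would normally consult allocation metadata
   rather than scan file contents, and the names are hypothetical.

```c
#include <stddef.h>

struct hole_extent {
    size_t offset;   /* start of the zero run within the buffer */
    size_t length;   /* length of the zero run in bytes */
};

/*
 * Hypothetical helper: record up to max_out runs of zero bytes in buf
 * that are at least `threshold` bytes long (0 is treated as 1).
 * Returns the number of qualifying runs, which may exceed max_out;
 * only the first max_out runs are stored.
 */
size_t find_holes(const unsigned char *buf, size_t len, size_t threshold,
                  struct hole_extent *out, size_t max_out)
{
    size_t found = 0;
    if (threshold == 0)
        threshold = 1;
    for (size_t i = 0; i < len; ) {
        if (buf[i] != 0) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < len && buf[i] == 0)
            i++;
        if (i - start >= threshold) {   /* below threshold: not a Hole */
            if (found < max_out) {
                out[found].offset = start;
                out[found].length = i - start;
            }
            found++;
        }
    }
    return found;
}
```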

1227	5.  Space Reservation

1229	5.1.  Introduction

1231	   This section describes a set of operations that allow applications
1232	   such as hypervisors to reserve space for a file, report the amount of
1233	   actual disk space a file occupies and free up the backing space of a
1234	   file when it is not required.  In virtualized environments, virtual
1235	   disk files are often stored on NFS mounted volumes.  Since virtual
1236	   disk files represent the hard disks of virtual machines, hypervisors
1237	   often have to guarantee certain properties for the file.

1239	   One such example is space reservation.  When a hypervisor creates a
1240	   virtual disk file, it often tries to preallocate the space for the
1241	   file so that there are no future allocation related errors during the
1242	   operation of the virtual machine.  Such errors prevent a virtual
1243	   machine from continuing execution and result in downtime.

1245	   Currently, in order to achieve such a guarantee, applications zero
1246	   the entire file.  The initial zeroing allocates the backing blocks
1247	   and all subsequent writes are overwrites of already allocated blocks.
1248	   This approach is not only inefficient in terms of the amount of I/O
1249	   done, it is also not guaranteed to work on file systems that are log
1250	   structured or deduplicated.  An efficient way of guaranteeing space
1251	   reservation would be beneficial to such applications.

1253	   If the space_reserved attribute (see Section 11.2.3) is set on a
1254	   file, it is guaranteed that writes that do not grow the file will not
1255	   fail with NFS4ERR_NOSPC.
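   For comparison, a local POSIX application can request a similar
   guarantee with posix_fallocate(), which allocates backing storage for
   a range without the application writing zeros to it; POSIX specifies
   that subsequent writes to that range shall not fail due to lack of
   free space.  The sketch below shows only this local analogy, not how
   an NFSv4.2 client sets space_reserved, and the helper names are
   hypothetical.  The 512-byte unit for st_blocks is an assumption that
   holds on Linux and most systems; POSIX leaves it unspecified.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve backing storage for the first `size`
 * bytes of the file open on fd, extending the file if needed.  Returns
 * 0 or the error number from posix_fallocate(), which does not set errno.
 */
int reserve_space(int fd, off_t size)
{
    return posix_fallocate(fd, 0, size);
}

/* Approximate space_used: st_blocks assumed to be in 512-byte units. */
long long space_used_bytes(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```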

1257	   Another useful feature would be the ability to report the number of
1258	   blocks that would be freed when a file is deleted.  Currently, NFS
1259	   reports two size attributes:

1261	   size  The logical file size of the file.

1263	   space_used  The size in bytes that the file occupies on disk.

1265	   While these attributes are sufficient for space accounting in
1266	   traditional file systems, they prove to be inadequate in modern file
1267	   systems that support block sharing.  In such file systems, multiple
1268	   inodes can point to a single block with a block reference count to
1269	   guard against premature freeing.  Having a way to tell the number of
1270	   blocks that would be freed if the file was deleted would be useful to
1271	   applications that wish to migrate files when a volume is low on
1272	   space.

1274	   Since virtual disks represent a hard drive in a virtual machine, a
1275	   virtual disk can be viewed as a file system within a file.  Since not
1276	   all blocks within a file system are in use, there is an opportunity
1277	   to reclaim blocks that are no longer in use.  A call to deallocate
1278	   blocks could result in better space efficiency.  Less space MAY be
1279	   consumed for backups after block deallocation.

1281	   The following operations and attributes can be used to resolve these
1282	   issues:

1284	   space_reserved  This attribute specifies whether the blocks backing
1285	      the file have been preallocated.

1287	   space_freed  This attribute specifies the space freed when a file is
1288	      deleted, taking block sharing into consideration.

1290	   INITIALIZE  This operation zeroes and/or deallocates the blocks
1291	      backing a region of the file.

1293	   If space_used of a file is interpreted to mean the size in bytes of
1294	   all disk blocks pointed to by the inode of the file, then shared
1295	   blocks get double counted, over-reporting the space utilization.
1296	   This also has the adverse effect that the deletion of a file with
1297	   shared blocks frees up less than space_used bytes.

1299	   On the other hand, if space_used is interpreted to mean the size in
1300	   bytes of those disk blocks unique to the inode of the file, then
1301	   shared blocks are not counted in any file, resulting in under-
1302	   reporting of the space utilization.

1304	   For example, two files A and B have 10 blocks each.  Let 6 of these
1305	   blocks be shared between them.  Thus, the combined space utilized by
1306	   the two files is 14 * BLOCK_SIZE bytes.  In the former case, the
1307	   combined space utilization of the two files would be reported as 20 *
1308	   BLOCK_SIZE.  However, deleting either would only result in 4 *
1309	   BLOCK_SIZE being freed.  Conversely, the latter interpretation would
1310	   report that the space utilization is only 8 * BLOCK_SIZE.

1312	   Adding another size attribute, space_freed (see Section 11.2.4), is
1313	   helpful in solving this problem.  space_freed is the amount of space
1314	   allocated to the given file that would be freed on its deletion.  In
1315	   the example, both A and B would report space_freed as 4 * BLOCK_SIZE
1316	   and space_used as 10 * BLOCK_SIZE.  If A is deleted, B will report
1317	   space_freed as 10 * BLOCK_SIZE, as the deletion of B would then
1318	   result in the deallocation of all 10 blocks.
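   The arithmetic in this example can be checked with a small model in
   which every block carries a reference count: space_used counts every
   block a file points to, while space_freed counts only the blocks for
   which the file holds the last reference.  The refcount table and
   block lists are assumptions for illustration, not a description of
   how any particular server stores this information.

```c
#include <stddef.h>
#include <stdint.h>

#define NBLOCKS 64              /* size of the toy block pool */

/* refcnt[b] = number of files whose block list contains block b */
struct pool {
    unsigned refcnt[NBLOCKS];
};

struct file {
    const unsigned *blocks;     /* indices into the pool */
    size_t nblocks;
};

void pool_add_file(struct pool *p, const struct file *f)
{
    for (size_t i = 0; i < f->nblocks; i++)
        p->refcnt[f->blocks[i]]++;
}

void pool_delete_file(struct pool *p, const struct file *f)
{
    for (size_t i = 0; i < f->nblocks; i++)
        p->refcnt[f->blocks[i]]--;
}

/* space_used: every block the file points to, shared or not */
uint64_t space_used(const struct file *f, uint64_t block_size)
{
    return (uint64_t)f->nblocks * block_size;
}

/* space_freed: only blocks for which this file holds the last reference */
uint64_t space_freed(const struct pool *p, const struct file *f,
                     uint64_t block_size)
{
    uint64_t n = 0;
    for (size_t i = 0; i < f->nblocks; i++)
        if (p->refcnt[f->blocks[i]] == 1)
            n++;
    return n * block_size;
}
```

   With A holding blocks 0-9 and B holding blocks 4-13, both files
   report space_freed as 4 * BLOCK_SIZE; after A is removed with
   pool_delete_file(), B reports 10 * BLOCK_SIZE, matching the example
   above.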

1320	   The addition of this attribute does not solve the problem of space
1321	   being over-reported.  However, over-reporting is better than under-
1322	   reporting.

1324	6.  Application Data Block Support

1326	   At the OS level, files are contained on disk blocks.  Applications
1327	   are also free to impose structure on the data contained in a file and
1328	   we can define an Application Data Block (ADB) to be such a structure.
1329	   From the application's viewpoint, it only wants to handle ADBs and
1330	   not raw bytes (see [17]).  An ADB is typically comprised of two
1331	   sections: a header and data.  The header describes the
1332	   characteristics of the block and can provide a means to detect
1333	   corruption in the data payload.  The data section is typically
1334	   initialized to all zeros.

1336	   The format of the header is application specific, but there are two
1337	   main components typically encountered:

1339	   1.  An ADB Number (ADBN), which allows the application to determine
1340	       which data block is being referenced.  The ADBN is a logical
1341	       block number and is useful when the client is not storing the
1342	       blocks in contiguous memory.

1344	   2.  Fields to describe the state of the ADB and a means to detect
1345	       block corruption.  For both pieces of data, a useful property is
1346	       that allowed values be unique in that if passed across the
1347	       network, corruption due to translation between big and little
1348	       endian architectures is detectable.  For example, 0xF0DEDEF0 has
1349	       the same bit pattern in both architectures.
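   One way to apply this property is to check, before adopting a set of
   values, how each behaves under a 32-bit byte swap: a value whose
   swapped form equals itself, such as 0xF0DEDEF0, reads the same on
   either architecture, while a value whose swapped form equals a
   different allowed value would let an endianness error go unnoticed.
   The helper names in the sketch below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap_bytes32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* 1 if v has the same byte sequence in big- and little-endian order. */
int endian_neutral(uint32_t v)
{
    return swap_bytes32(v) == v;
}

/*
 * 1 if byte-swapping some allowed value yields a *different* allowed
 * value, i.e., an endianness mix-up could go undetected.
 */
int swap_collides(const uint32_t *vals, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (i != j && swap_bytes32(vals[i]) == vals[j])
                return 1;
    return 0;
}
```

   Applied to the four guard values defined in Section 6.3,
   swap_collides() reports no collision, and none of the four is
   endian-neutral.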

1351	   Applications already impose structures on files [17] and detect
1352	   corruption in data blocks [18].  What they are not able to do is
1353	   efficiently transfer and store ADBs.  To initialize a file with ADBs,
1354	   the client must send the full ADB to the server, and the server must
1355	   store it.  When the application is initializing a file to have the
1356	   ADB structure, it could compress the ADBs to just the information
1357	   necessary to later reconstruct the header portion of the ADB when
1358	   the contents are read back.  Using sparse file techniques, the disk
1359	   blocks described by the ADBs would not be allocated.  Unlike sparse
1360	   file techniques, there would be a small cost to store the compressed
1361	   header data.

1363	   In this section, we are going to define a generic framework for an
1364	   ADB, present one approach to detecting corruption in a given ADB
1365	   implementation, and describe the model for how the client and server
1366	   can support efficient initialization of ADBs, reading of ADB holes,
1367	   punching holes in ADBs, and space reservation.

1369	6.1.  Generic Framework

1371	   We want the representation of the ADB to be flexible enough to
1372	   support many different applications.  The most basic approach is no
1373	   imposition of a block at all, which means we are working with the raw
1374	   bytes.  Such an approach would be useful for storing holes, punching
1375	   holes, etc.  In more complex deployments, a server might be
1376	   supporting multiple applications, each with their own definition of
1377	   the ADB.  One might store the ADBN at the start of the block and then
1378	   have a guard pattern to detect corruption [19].  The next might store
1379	   the ADBN at an offset of 100 bytes within the block and have no guard
1380	   pattern at all.  I.e., existing applications might already have well
1381	   defined formats for their data blocks.

1383	   The guard pattern can be used to represent the state of the block, to
1384	   protect against corruption, or both.  Again, it needs to be able to
1385	   be placed anywhere within the ADB.

1387	   We need to be able to represent the starting offset of the block and
1388	   the size of the block.  Note that nothing prevents the application
1389	   from defining different sized blocks in a file.

1391	6.1.1.  Data Block Representation

1393	   struct app_data_block4 {
1394	           offset4         adb_offset;
1395	           length4         adb_block_size;
1396	           length4         adb_block_count;
1397	           length4         adb_reloff_blocknum;
1398	           count4          adb_block_num;
1399	           length4         adb_reloff_pattern;
1400	           opaque          adb_pattern<>;
1401	   };

1403	   The app_data_block4 structure captures the abstraction presented for
1404	   the ADB.  The additional fields present are to allow the transmission
1405	   of adb_block_count ADBs at one time.  We also use adb_block_num to
1406	   convey the ADBN of the first block in the sequence.  Each ADB will
1407	   contain the same adb_pattern string.

1409	   As both adb_block_num and adb_pattern are optional, if either
1410	   adb_reloff_pattern or adb_reloff_blocknum is set to NFS4_UINT64_MAX,
1411	   then the corresponding field is not set in any of the ADBs.
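   To make the field semantics concrete, the sketch below expands one
   app_data_block4 into the bytes an application would read back for
   the described range.  Two details are assumptions not fixed by this
   section: the C mirror of the XDR type, with adb_pattern<> flattened
   into a pointer and a length, and an ADBN encoded as an 8-byte
   big-endian value, as in the example of Section 6.3.  The function
   name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NFS4_UINT64_MAX UINT64_MAX

/* Hypothetical C mirror of the XDR app_data_block4. */
struct app_data_block4 {
    uint64_t adb_offset;              /* offset4 */
    uint64_t adb_block_size;          /* length4 */
    uint64_t adb_block_count;         /* length4 */
    uint64_t adb_reloff_blocknum;     /* length4, NFS4_UINT64_MAX = absent */
    uint32_t adb_block_num;           /* count4: ADBN of the first block */
    uint64_t adb_reloff_pattern;      /* length4, NFS4_UINT64_MAX = absent */
    const unsigned char *adb_pattern; /* opaque adb_pattern<> */
    size_t adb_pattern_len;
};

/*
 * Expand adb into out, which receives the file bytes starting at
 * adb->adb_offset.  Bytes not covered by an ADBN or pattern are zero.
 * Returns 0, or -1 if out is too small or a field does not fit its block.
 */
int adb_expand(const struct app_data_block4 *adb,
               unsigned char *out, size_t out_len)
{
    const uint64_t bs = adb->adb_block_size;
    const uint64_t rb = adb->adb_reloff_blocknum;
    const uint64_t rp = adb->adb_reloff_pattern;

    if (bs == 0 || adb->adb_block_count > out_len / bs)
        return -1;
    if (rb != NFS4_UINT64_MAX && (rb > bs || bs - rb < 8))
        return -1;
    if (rp != NFS4_UINT64_MAX && (rp > bs || bs - rp < adb->adb_pattern_len))
        return -1;

    memset(out, 0, (size_t)(bs * adb->adb_block_count));
    for (uint64_t i = 0; i < adb->adb_block_count; i++) {
        unsigned char *blk = out + i * bs;
        if (rb != NFS4_UINT64_MAX) {
            /* Assumed encoding: 8-byte big-endian ADBN. */
            uint64_t adbn = (uint64_t)adb->adb_block_num + i;
            for (int b = 0; b < 8; b++)
                blk[rb + b] = (unsigned char)(adbn >> (56 - 8 * b));
        }
        if (rp != NFS4_UINT64_MAX && adb->adb_pattern_len > 0)
            memcpy(blk + rp, adb->adb_pattern, adb->adb_pattern_len);
    }
    return 0;
}
```

   Setting both relative offsets to NFS4_UINT64_MAX reduces the ADB to
   a plain zero-filled range, which is the hole-punching case of
   Section 6.5.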

1413	6.1.2.  Data Content

1415	   /*
1416	    * Use an enum such that we can extend new types.
1417	    */
1418	   enum data_content4 {
1419	           NFS4_CONTENT_DATA = 0,
1420	           NFS4_CONTENT_APP_BLOCK = 1,
1421	           NFS4_CONTENT_HOLE = 2
1422	   };

1424	   New operations might need to differentiate between wanting to access
1425	   data versus an ADB.  Also, future minor versions might want to
1426	   introduce new data formats.  This enumeration allows that to occur.

1428	6.2.  pNFS Considerations

1430	   While this document does not mandate how sparse ADBs are recorded on
1431	   the server, it does make the assumption that such information is not
1432	   in the file.  I.e., the information is metadata.  As such, the
1433	   INITIALIZE operation is defined to be not supported by the DS - it
1434	   must be issued to the MDS.  But since the client must not assume a
1435	   priori whether a read is sparse or not, the READ_PLUS operation MUST
1436	   be supported by both the DS and the MDS.  I.e., the client might
1437	   impose on the MDS to asynchronously read the data from the DS.

1439	   Furthermore, each DS MUST NOT report to a client a sparse ADB which
1440	   belongs to another DS.  One implication of this requirement is that
1441	   the app_data_block4's adb_block_size MUST either equal the stripe
1442	   width or divide it evenly.  The second implication is that the DS
1443	   must be able to use the Control Protocol to determine from the MDS
1444	   where the sparse ADBs occur.
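   The block-size requirement above reduces to a divisibility test.  The
   sketch below also checks that a given ADB falls on a single DS,
   assuming a simple layout in which consecutive stripe-width-sized
   units of the file are assigned round-robin to the data servers; that
   layout and the helper names are assumptions for illustration, not
   the pNFS file layout definition.

```c
#include <stdint.h>

/* adb_block_size equals the stripe width or divides it evenly. */
int adb_size_compatible(uint64_t adb_block_size, uint64_t stripe_width)
{
    return adb_block_size != 0 && stripe_width % adb_block_size == 0;
}

/*
 * Index of the DS holding byte `off`, assuming stripe_width-byte units
 * are assigned round-robin across ds_count data servers.
 */
uint64_t ds_for_offset(uint64_t off, uint64_t stripe_width, uint64_t ds_count)
{
    return (off / stripe_width) % ds_count;
}

/* 1 if ADB number adbn, counted from file offset base, lies on one DS. */
int adb_on_single_ds(uint64_t base, uint64_t adbn, uint64_t adb_block_size,
                     uint64_t stripe_width, uint64_t ds_count)
{
    uint64_t first = base + adbn * adb_block_size;
    uint64_t last = first + adb_block_size - 1;
    return ds_for_offset(first, stripe_width, ds_count) ==
           ds_for_offset(last, stripe_width, ds_count);
}
```

   If adb_size_compatible() holds and the first ADB starts at a multiple
   of adb_block_size, every stripe boundary is also an ADB boundary, so
   adb_on_single_ds() holds for every ADB.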

1446	6.3.  An Example of Detecting Corruption

1448	   In this section, we define an ADB format in which corruption can be
1449	   detected.  Note that this is just one possible format and means to
1450	   detect corruption.

1452	   Consider a very basic implementation of an operating system's disk
1453	   blocks.  A block is either data or it is an indirect block which
1454	   allows for files to be larger than one block.  It is desired to be
1455	   able to initialize a block.  Lastly, to quickly unlink a file, a
1456	   block can be marked invalid.  The contents remain intact - which
1457	   would enable this OS application to undelete a file.

1459	   The application defines 4k sized data blocks, with an 8 byte block
1460	   counter occurring at offset 0 in the block, and with the guard
1461	   pattern occurring at offset 8 inside the block.  Furthermore, the
1462	   guard pattern can take one of four states:

1464	   0xfeedface -   This is the FREE state and indicates that the ADB
1465	      format has been applied.

1467	   0xcafedead -   This is the DATA state and indicates that real data
1468	      has been written to this block.

1470	   0xe4e5c001 -   This is the INDIRECT state and indicates that the
1471	      block contains block counter numbers that are chained off of this
1472	      block.

1474	   0xba1ed4a3 -   This is the INVALID state and indicates that the block
1475	      contains data whose contents are garbage.

1477	   Finally, it also defines an 8 byte checksum [20] starting at byte 16
1478	   which applies to the remaining contents of the block.  If the state
1479	   is FREE, then that checksum is trivially zero.  As such, the
1480	   application has no need to transfer the checksum implicitly inside
1481	   the ADB - it need not make the transfer layer aware of the fact that
1482	   there is a checksum (see [18] for an example of checksums used to
1483	   detect corruption in application data blocks).

1485	   Corruption in each ADB can be detected thusly:

1487	   o  If the guard pattern is anything other than one of the allowed
1488	      values, including all zeros.

1490	   o  If the guard pattern is FREE and any other byte in the remainder
1491	      of the ADB is anything other than zero.

1493	   o  If the guard pattern is anything other than FREE, then if the
1494	      stored checksum does not match the computed checksum.

1496	   o  If the guard pattern is INDIRECT and one of the stored indirect
1497	      block numbers has a value greater than the number of ADBs in the
1498	      file.

1500	   o  If the guard pattern is INDIRECT and one of the stored indirect
1501	      block numbers is a duplicate of another stored indirect block
1502	      number.
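   The checks above can be sketched as one validation function.  The
   checksum algorithm of [20] is not specified in this section, so the
   sketch substitutes 64-bit FNV-1a purely as a stand-in.  It further
   assumes big-endian fields, checksum coverage from byte 24 to the end
   of the block, and an INDIRECT layout of 8-byte block numbers starting
   at byte 24 in which 0 marks an unused slot.  All of these, and the
   function names, are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define ADB_SIZE     4096u
#define OFF_GUARD    8u    /* 4-byte guard follows the 8-byte counter */
#define OFF_CSUM     16u   /* 8-byte checksum */
#define OFF_PAYLOAD  24u   /* remaining contents, covered by the checksum */

#define GUARD_FREE     0xfeedfaceu
#define GUARD_DATA     0xcafedeadu
#define GUARD_INDIRECT 0xe4e5c001u
#define GUARD_INVALID  0xba1ed4a3u

/* Read an n-byte big-endian unsigned integer. */
static uint64_t get_be(const unsigned char *p, unsigned n)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Stand-in checksum (64-bit FNV-1a); NOT the algorithm of [20]. */
static uint64_t adb_checksum(const unsigned char *p, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Returns 1 if blk passes every check listed above, 0 if corrupt. */
int adb_is_valid(const unsigned char *blk, uint64_t adbs_in_file)
{
    uint32_t guard = (uint32_t)get_be(blk + OFF_GUARD, 4);

    if (guard != GUARD_FREE && guard != GUARD_DATA &&
        guard != GUARD_INDIRECT && guard != GUARD_INVALID)
        return 0;                     /* unknown guard, including 0 */
    if (guard == GUARD_FREE) {
        for (size_t i = OFF_GUARD + 4; i < ADB_SIZE; i++)
            if (blk[i] != 0)
                return 0;             /* FREE: all zero after the guard */
        return 1;
    }
    if (get_be(blk + OFF_CSUM, 8) !=
        adb_checksum(blk + OFF_PAYLOAD, ADB_SIZE - OFF_PAYLOAD))
        return 0;                     /* checksum mismatch */
    if (guard == GUARD_INDIRECT) {
        const size_t n = (ADB_SIZE - OFF_PAYLOAD) / 8;
        for (size_t i = 0; i < n; i++) {
            uint64_t a = get_be(blk + OFF_PAYLOAD + 8 * i, 8);
            if (a == 0)
                continue;             /* assumed: unused slot */
            if (a > adbs_in_file)
                return 0;             /* points past the end of the file */
            for (size_t j = i + 1; j < n; j++)
                if (get_be(blk + OFF_PAYLOAD + 8 * j, 8) == a)
                    return 0;         /* duplicate block number */
        }
    }
    return 1;
}
```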

1504	   As can be seen, the application can detect errors based on the
1505	   combination of the guard pattern state and the checksum.  But also,
1506	   the application can detect corruption based on the state and the
1507	   contents of the ADB.  This last point is important in validating the
1508	   minimum amount of data we incorporated into our generic framework.
1509	   I.e., the guard pattern is sufficient in allowing applications to
1510	   design their own corruption detection.

1512	   Finally, it is important to note that none of these corruption checks
1513	   occur in the transport layer.  The server and client components are
1514	   totally unaware of the file format and might report everything as
1515	   being transferred correctly even in the case the application detects
1516	   corruption.

1518	6.4.  Example of READ_PLUS

1520	   The hypothetical application presented in Section 6.3 can be used to
1521	   illustrate how READ_PLUS would return an array of results.  A file is
1522	   created and initialized with 100 4k ADBs in the FREE state:

1524	      INITIALIZE {0, 4k, 100, 0, 0, 8, 0xfeedface}

1526	   Further, assume the application writes a single ADB at 16k, changing
1527	   the guard pattern to 0xcafedead.  We would then have in memory:

1529	      0 -> (16k - 1)   : 4k, 4, 0, 0, 8, 0xfeedface
1530	      16k -> (20k - 1) : 00 00 00 00 00 00 00 04 ca fe de ad XX ... XX
1531	      20k -> (400k - 1): 4k, 95, 0, 5, 8, 0xfeedface

1533	   And when the client did a READ_PLUS of 64k at the start of the file,
1534	   it would get back a result of an ADB, some data, and a final ADB:

1536	      ADB {0, 4k, 4, 0, 0, 8, 0xfeedface}
1537	      data 4k
1538	      ADB {20k, 4k, 11, 0, 5, 8, 0xfeedface}
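   A server could assemble such a reply by walking the requested blocks
   and merging neighbors of the same kind.  The sketch below assumes the
   server already knows, per block, whether the block is still FREE, and
   that ADBN k starts at file offset k * block_size as in this example;
   the segment structure and function name are hypothetical, and only
   the data_content4 values come from Section 6.1.2.

```c
#include <stddef.h>
#include <stdint.h>

enum data_content4 {
    NFS4_CONTENT_DATA = 0,
    NFS4_CONTENT_APP_BLOCK = 1,
    NFS4_CONTENT_HOLE = 2
};

/* One entry of a hypothetical READ_PLUS-style reply. */
struct rp_segment {
    enum data_content4 kind;
    uint64_t offset;      /* file offset of the first block in the run */
    uint64_t first_adbn;  /* ADBN of that block */
    uint64_t count;       /* number of consecutive blocks */
};

/*
 * is_free[i] is nonzero if block first_adbn + i is still FREE (and can
 * be described as an ADB) and zero if it holds data that must be sent.
 * Returns the number of segments written, or SIZE_MAX if max_out is
 * too small.
 */
size_t coalesce_blocks(const unsigned char *is_free, size_t nblocks,
                       uint64_t first_adbn, uint64_t block_size,
                       struct rp_segment *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < nblocks; ) {
        int free_run = is_free[i] != 0;
        size_t j = i + 1;
        while (j < nblocks && (is_free[j] != 0) == free_run)
            j++;                      /* extend the run of the same kind */
        if (n == max_out)
            return SIZE_MAX;
        out[n].kind = free_run ? NFS4_CONTENT_APP_BLOCK : NFS4_CONTENT_DATA;
        out[n].first_adbn = first_adbn + i;
        out[n].offset = out[n].first_adbn * block_size;
        out[n].count = j - i;
        n++;
        i = j;
    }
    return n;
}
```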

1540	6.5.  Zero Filled Holes

1542	   As applications are free to define the structure of an ADB, it is
1543	   trivial to define an ADB which supports zero filled holes.  Such a
1544	   case would encompass the traditional definitions of a sparse file and
1545	   hole punching.  For example, to punch a 64k hole, starting at 100M,
1546	   into an existing file which has no ADB structure:

1548	      INITIALIZE {100M, 64k, 1, NFS4_UINT64_MAX,
1549	                  0, NFS4_UINT64_MAX, 0x0}

1551	7.  Labeled NFS

1553	7.1.  Introduction

1555	   Access control models such as Unix permissions or Access Control
1556	   Lists are commonly referred to as Discretionary Access Control (DAC)
1557	   models.  These systems base their access decisions on user identity
1558	   and resource ownership.  In contrast, Mandatory Access Control (MAC)
1559	   models base their access control decisions on the label on the
1560	   subject (usually a process) and the object it wishes to access [7].
1561	   These labels may contain user identity information but usually
1562	   contain additional information.  In DAC systems, users are free to
1563	   specify the access rules for resources that they own.  MAC models
1564	   base their security decisions on a system wide policy established by
1565	   an administrator or organization which the users do not have the
1566	   ability to override.  In this section, we add a MAC model to NFSv4.2.

1568	   The first change necessary is to devise a method for transporting and
1569	   storing security label data on NFSv4 file objects.  Security labels
1570	   have several semantics that are met by NFSv4 recommended attributes
1571	   such as the ability to set the label value upon object creation.
1572	   Access control on these attributes is done through a combination of
1573	   two mechanisms.  As with other recommended attributes on file objects,
1574	   the usual DAC checks (ACLs and permission bits) will be performed to
1575	   ensure that proper file ownership is enforced.  In addition, a MAC
1576	   system MAY be employed on the client, server, or both to enforce
1577	   additional policy on what subjects may modify security label
1578	   information.

1580	   The second change is to provide a method for the server to notify the
1581	   client that the attribute changed on an open file on the server.  If
1582	   the file is closed, then during the open attempt, the client will
1583	   gather the new attribute value.  The server MUST NOT communicate the
1584	   new value of the attribute; the client MUST query it.  This
1585	   requirement stems from the need for the client to provide sufficient
1586	   access rights to the attribute.

1588	   The final change necessary is a modification to the RPC layer used in
1589	   NFSv4 in the form of a new version of the RPCSEC_GSS [8] framework.
1590	   In order for an NFSv4 server to apply MAC checks it must obtain
1591	   additional information from the client.  Several methods were
1592	   explored for performing this and it was decided that the best
1593	   approach was to incorporate the ability to make security attribute
1594	   assertions through the RPC mechanism.  RPCSECGSSv3 [5] outlines a
1595	   method to assert additional security information such as security
1596	   labels on gss context creation and have that data bound to all RPC
1597	   requests that make use of that context.

1599	7.2.  Definitions

1601	   Label Format Specifier (LFS):  is an identifier used by the client to
1602	      establish the syntactic format of the security label and the
1603	      semantic meaning of its components.  These specifiers exist in a
1604	      registry associated with documents describing the format and
1605	      semantics of the label.

1607	   Label Format Registry:  is the IANA registry containing all
1608	      registered LFS along with references to the documents that
1609	      describe the syntactic format and semantics of the security label.

1611	   Policy Identifier (PI):  is an optional part of the definition of a
1612	      Label Format Specifier which allows for clients and server to
1613	      identify specific security policies.

1615	   Object:  is a passive resource within the system that we wish to be
1616	      protected.  Objects can be entities such as files, directories,
1617	      pipes, sockets, and many other system resources relevant to the
1618	      protection of the system state.

1620	   Subject:  is an active entity usually a process which is requesting
1621	      access to an object.

1623	   MAC-Aware:  is a server which can transmit and store object labels.

1625	   MAC-Functional:  is a client or server which is Labeled NFS enabled.
1626	      Such a system can interpret labels and apply policies based on the
1627	      security system.

1629	   Multi-Level Security (MLS):  is a traditional model where objects are
1630	      given a sensitivity level (Unclassified, Secret, Top Secret, etc.)
1631	      and a category set [21].

1633	7.3.  MAC Security Attribute

1635	   MAC models base access decisions on security attributes bound to
1636	   subjects and objects.  This information can range from a user
1637	   identity for an identity based MAC model, sensitivity levels for
1638	   Multi-level security, or a type for Type Enforcement.  These models
1639	   base their decisions on different criteria but the semantics of the
1640	   security attribute remain the same.  The semantics required by the
1641	   security attributes are listed below:

1643	   o  MUST provide flexibility with respect to the MAC model.

1645	   o  MUST provide the ability to atomically set security information
1646	      upon object creation.

1648	   o  MUST provide the ability to enforce access control decisions both
1649	      on the client and the server.

1651	   o  MUST NOT expose an object to either the client or server name
1652	      space before its security information has been bound to it.

1654	   NFSv4 implements the security attribute as a recommended attribute.
1655	   These attributes have a fixed format and semantics, which conflicts
1656	   with the flexible nature of the security attribute.  To resolve this
1657	   the security attribute consists of two components.  The first
1658	   component is an LFS as defined in [22] to allow for interoperability
1659	   between MAC mechanisms.  The second component is an opaque field
1660	   which is the actual security attribute data.  To allow for various
1661	   MAC models, NFSv4 should be used solely as a transport mechanism for
1662	   the security attribute.  It is the responsibility of the endpoints to
1663	   consume the security attribute and make access decisions based on
1664	   their respective models.  In addition, creation of objects through
1665	   OPEN and CREATE allows for the security attribute to be specified
1666	   upon creation.  By providing an atomic create and set operation for
1667	   the security attribute it is possible to enforce the second and
1668	   fourth requirements.  The recommended attribute FATTR4_SEC_LABEL (see
1669	   Section 11.2.2) will be used to satisfy this requirement.

1671	7.3.1.  Delegations

1673	   In the event that a security attribute is changed on the server while
1674	   a client holds a delegation on the file, both the server and the
1675	   client MUST follow the NFSv4.1 protocol (see Section 10 of [2]) with
1676	   respect to attribute changes.  The client SHOULD flush all changes
1677	   back to the server and relinquish the delegation.

1679	7.3.2.  Permission Checking

1681	   It is not feasible to enumerate all possible MAC models, or even the
1682	   levels of protection within a subset of these models.  This means
1683	   that NFSv4 clients and servers cannot be expected to directly make
1684	   access control decisions based on the security attribute.  Instead,
1685	   NFSv4 should defer permission checking on this attribute to the host
1686	   system.  These checks are performed in addition to existing DAC and
1687	   ACL checks outlined in the NFSv4 protocol.  Section 7.6 gives a
1688	   specific example of how the security attribute is handled under a
1689	   particular MAC model.
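   As a minimal sketch of this deferral, assuming a hypothetical host
   hook (host_mac_check_fn) and a toy two-label model that is purely
   illustrative, the NFSv4 layer can treat the MAC decision as an extra
   gate applied after the usual DAC/ACL result:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host MAC hook.  NFSv4 does not interpret the label; it
 * hands subject and object labels to whatever MAC model the host runs. */
typedef bool (*host_mac_check_fn)(const char *subject_label,
                                  const char *object_label);

/* The MAC check is performed in addition to the existing DAC/ACL check,
 * never instead of it: both must allow the access. */
static bool nfs_access_allowed(bool dac_acl_allows,
                               const char *subject_label,
                               const char *object_label,
                               host_mac_check_fn host_mac_check)
{
    if (!dac_acl_allows)
        return false;                 /* DAC/ACL denial is final */
    if (host_mac_check == NULL)
        return true;                  /* host has no MAC model loaded */
    return host_mac_check(subject_label, object_label);
}

/* Toy MAC model, for illustration only: objects labeled "secret" may be
 * accessed only by subjects labeled "secret". */
static bool toy_mac_check(const char *subject_label, const char *object_label)
{
    if (strcmp(object_label, "secret") != 0)
        return true;
    return strcmp(subject_label, "secret") == 0;
}
```

   Keeping the MAC model behind a function pointer mirrors the text:
   the protocol transports labels, and the host decides what they mean.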

1691	7.3.3.  Object Creation

1693	   In NFSv4, files are created with the OPEN and CREATE operations.
1694	   One of the parameters to these operations is an fattr4 structure
1695	   containing the attributes the file is to be created with.  This
1696	   allows NFSv4 to atomically set the security attribute of files upon
1697	   creation.  When a client is MAC-Functional, it must always provide
1698	   the initial security attribute upon file creation.  In the event that
1699	   the server is MAC-Functional as well, it should determine by policy
1700	   whether it will accept the attribute from the client or instead make
1701	   the determination itself.  If the client is not MAC-Functional, then
1702	   the MAC-Functional server must decide on a default label.  A more
1703	   in-depth explanation can be found in Section 7.6.
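   The creation-time labeling rules above can be sketched as a single
   decision, assuming a MAC-Functional server with a hypothetical policy
   knob (create_label_policy).  Labels are shown as C strings for
   readability; on the wire they would be sec_label4 values carried in
   the fattr4 of OPEN or CREATE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical server policy knob for a MAC-Functional server. */
typedef enum {
    POLICY_ACCEPT_CLIENT_LABEL,   /* store the label the client supplied */
    POLICY_SERVER_DECIDES         /* compute the label on the server */
} create_label_policy;

/* Choose the label to bind atomically to an object created by OPEN or
 * CREATE.  client_label is NULL when the client is not MAC-Functional
 * and therefore sent no FATTR4_SEC_LABEL in the fattr4. */
static const char *choose_initial_label(const char *client_label,
                                        create_label_policy policy,
                                        const char *server_label,
                                        const char *default_label)
{
    if (client_label == NULL)
        return default_label;          /* non-MAC client: server default */
    if (policy == POLICY_ACCEPT_CLIENT_LABEL)
        return client_label;           /* policy trusts the client */
    return server_label;               /* policy overrides the client */
}
```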

1705	7.3.4.  Existing Objects

1707	   Note that under the MAC model, all objects must have labels.
1708	   Therefore, if an existing server is upgraded to include Labeled NFS
1709	   support, then it is the responsibility of the security system to
1710	   define the behavior for existing objects.

1712	7.3.5.  Label Changes

1714	   Per the requirements, when a file's security label is modified,
1715	   the server must notify all clients that have the file open of the
1716	   change in label.  It does so with CB_ATTR_CHANGED.  There are
1717	   preconditions to making an attribute change imposed by NFSv4, and the
1718	   security system might want to impose others.  In the process of
1719	   meeting these preconditions, the server may choose to either serve
1720	   the request in whole or return NFS4ERR_DELAY to the SETATTR operation.

1722	   If there are open delegations on the file belonging to clients other
1723	   than the one making the label change, then the process described in
1724	   Section 7.3.1 must be followed.

1726	   As the server is always presented with the subject label from the
1727	   client, it does not necessarily need to communicate the fact that the
1728	   label has changed to the client.  In cases where the change outright
1729	   denies the client access, the client will be able to quickly
1730	   determine that there is a new label in effect.  The CB_ATTR_CHANGED
1731	   callback is especially useful when the client shares the same object
1732	   among multiple subjects, or when the security system is not strictly
1733	   hierarchical.  It allows the server to inform the clients that the
1734	   cached security attribute is now stale.

1736	   Consider a system in which the clients enforce MAC checks and the
1737	   server has a very simple security system which just stores the
1738	   labels.  In this system, the MAC label check always allows access,
1739	   regardless of the subject label.

1741	   In this system, MAC labels are enforced by the clients.  So if
1742	   client A changes a security label on a file, then the server MUST
1743	   inform all clients that have the file open that the label has
1744	   changed via CB_ATTR_CHANGED.  Then the clients MUST retrieve the new
1745	   label and MUST enforce access via the new attribute values.
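   In such a system, the server's notification step might look like the
   following sketch.  The open_client bookkeeping and the send callback
   are hypothetical stand-ins for the server's open-state tracking and
   for issuing CB_ATTR_CHANGED on each client's backchannel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping: which clients currently have a file open. */
struct open_client {
    int client_id;
    int has_file_open;
};

/* Hypothetical sender standing in for a CB_ATTR_CHANGED callback. */
typedef void (*send_cb_attr_changed_fn)(int client_id);

/* After a SETATTR changes FATTR4_SEC_LABEL, notify every client that has
 * the file open (including the one that made the change).  Returns the
 * number of notifications sent. */
static size_t notify_label_change(const struct open_client *clients,
                                  size_t nclients,
                                  send_cb_attr_changed_fn send)
{
    size_t sent = 0;
    for (size_t i = 0; i < nclients; i++) {
        if (clients[i].has_file_open) {
            send(clients[i].client_id);
            sent++;
        }
    }
    return sent;
}

/* Example sender that just records the last client notified. */
static int last_notified = -1;
static void record_notification(int client_id)
{
    last_notified = client_id;
}
```

   On receipt, each client would refetch FATTR4_SEC_LABEL before making
   further access decisions on the file.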

1747	7.4.  pNFS Considerations

1749	   This section examines the issues in deploying Labeled NFS in a pNFS
1750	   community of servers.

1752	7.4.1.  MAC Label Checks

1754	   The new FATTR4_SEC_LABEL attribute is metadata, and as such the DS
1755	   is not aware of the value held by the MDS.
1756	   Fortunately, the NFSv4.1 protocol [2] already has provisions for
1757	   doing access level checks from the DS to the MDS.  In order for the
1758	   DS to validate the subject label presented by the client, it SHOULD
1759	   utilize this mechanism.

1761	   If a file's FATTR4_SEC_LABEL is changed, then the MDS should utilize
1762	   CB_ATTR_CHANGED to inform the client of that fact.  If the MDS is
1763	   maintaining [[Comment.2: Houston, we seem to have a problem! --TH]]

1765	7.5.  Discovery of Server Labeled NFS Support

1767	   The server can easily determine that a client supports Labeled NFS
1768	   when the client queries for the FATTR4_SEC_LABEL attribute of an
1769	   object.  Note that the server cannot assume that the presence of
1770	   RPCSEC_GSSv3 indicates Labeled NFS support.  The client might need
1771	   to discover which LFS the server supports.

1773	   A server which supports Labeled NFS MUST allow a client with any
1774	   subject label to retrieve the FATTR4_SEC_LABEL attribute for the root
1775	   filehandle, ROOTFH.  The following compound must always succeed as
1776	   far as a MAC label check is concerned:

1778	        PUTROOTFH, GETATTR {FATTR4_SEC_LABEL}

1780	   Note that the server might have imposed a security flavor on the root
1781	   that precludes such access.  For example, if the server requires
1782	   Kerberos access and the client presents a COMPOUND with AUTH_SYS,
1783	   then the server is allowed to return NFS4ERR_WRONGSEC.  But if
1784	   the client presents a correct security flavor, then the server MUST
1785	   return the FATTR4_SEC_LABEL attribute with the supported LFS filled
1786	   in.
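   A client could interpret the result of this probe roughly as follows.
   This sketch assumes the usual NFSv4 GETATTR behavior, in which a
   server that does not support an attribute simply omits it from the
   returned attribute mask; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFS4_OK          0
#define NFS4ERR_WRONGSEC 10016

typedef enum {
    LNFS_SUPPORTED,      /* sec_label returned; its LFS is now known */
    LNFS_NOT_SUPPORTED,  /* GETATTR succeeded but sec_label was absent */
    LNFS_RETRY_FLAVOR,   /* retry with another security flavor */
    LNFS_UNKNOWN         /* any other failure: no conclusion */
} lnfs_probe_result;

/* Interpret the reply to PUTROOTFH, GETATTR {FATTR4_SEC_LABEL}. */
static lnfs_probe_result interpret_probe(uint32_t status,
                                         bool sec_label_in_reply,
                                         uint32_t reply_lfs,
                                         uint32_t *lfs_out)
{
    if (status == NFS4ERR_WRONGSEC)
        return LNFS_RETRY_FLAVOR;  /* e.g. server wants Kerberos */
    if (status != NFS4_OK)
        return LNFS_UNKNOWN;
    if (!sec_label_in_reply)
        return LNFS_NOT_SUPPORTED;
    *lfs_out = reply_lfs;          /* LFS supported by the server */
    return LNFS_SUPPORTED;
}
```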

1788	7.6.  MAC Security NFS Modes of Operation

1790	   A system using Labeled NFS may operate in two modes.  The first mode
1791	   provides the most protection and is called "full mode".  In this mode
1792	   both the client and server implement a MAC model allowing each end to
1793	   make an access control decision.  The second mode is called the
1794	   "guest mode" and in this mode one end of the connection is not
1795	   implementing a MAC model and thus offers less protection than full
1796	   mode.

1798	7.6.1.  Full Mode

1800	   Full mode environments consist of MAC-Functional NFSv4 servers and
1801	   clients and may be composed of mixed MAC models and policies.  The
1802	   system requires that both the client and server have an opportunity
1803	   to perform an access control check based on all relevant information
1804	   within the network.  The file object security attribute is provided
1805	   using the mechanism described in Section 7.3.  The security attribute
1806	   of the subject making the request is transported at the RPC layer
1807	   using the mechanism described in RPCSEC_GSSv3 [5].

1809	7.6.1.1.  Initial Labeling and Translation

1811	   The ability to create a file is an action that a MAC model may wish
1812	   to mediate.  The client is given the responsibility to determine the
1813	   initial security attribute to be placed on a file.  This allows the
1814	   client to make a decision as to the acceptable security attributes to
1815	   create a file with before sending the request to the server.  Once
1816	   the server receives the creation request from the client it may
1817	   choose to evaluate whether the security attribute is acceptable.

1819	   Security attributes on the client and server may vary based on MAC
1820	   model and policy.  To handle this, the security attribute field has
1821	   an LFS component.  This component is a mechanism for the host to
1822	   identify the format and meaning of the opaque portion of the security
1823	   attribute.  A full mode environment may contain hosts operating in
1824	   several different LFSs.  In this case a mechanism for translating the
1825	   opaque portion of the security attribute is needed.  The actual
1826	   translation function will vary based on MAC model and policy and is
1827	   out of the scope of this document.  If a translation is unavailable
1828	   for a given LFS, then the request MUST be denied.  Another recourse is
1829	   to allow the host to provide a fallback mapping for unknown security
1830	   attributes.
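   A host might organize translation as a lookup keyed by LFS, as in the
   sketch below.  The translator table, the copy_translate example, and
   the optional fallback parameter are all hypothetical; a NULL result
   stands for "no translation available", in which case the request is
   denied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical translator from a foreign LFS into the local one.
 * *out_len is the output capacity on entry and the length on return.
 * Returns 0 on success, -1 on failure. */
typedef int (*lfs_translate_fn)(const void *in, size_t in_len,
                                void *out, size_t *out_len);

struct lfs_translator {
    uint32_t lfs;                 /* LFS this entry translates from */
    lfs_translate_fn translate;
};

/* Look up a translator for the given LFS; if none is registered, use
 * the host's fallback mapping (which may be NULL). */
static lfs_translate_fn find_translator(const struct lfs_translator *table,
                                        size_t n, uint32_t lfs,
                                        lfs_translate_fn fallback)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].lfs == lfs)
            return table[i].translate;
    return fallback;              /* NULL => deny the request */
}

/* Example translator: the two LFSs share an encoding, so copy bytes. */
static int copy_translate(const void *in, size_t in_len,
                          void *out, size_t *out_len)
{
    if (in_len > *out_len)
        return -1;
    memcpy(out, in, in_len);
    *out_len = in_len;
    return 0;
}
```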

1832	7.6.1.2.  Policy Enforcement

1834	   In full mode, access control decisions are made by both the clients
1835	   and servers.  When a client makes a request, it takes the security
1836	   attribute from the requesting process and makes an access control
1837	   decision based on that attribute and the security attribute of the
1838	   object it is trying to access.  If the client denies that access, an
1839	   RPC call to the server is never made.  If, however, the access is
1840	   allowed, the client will make a call to the NFS server.

1842	   When the server receives the request from the client it extracts the
1843	   security attribute conveyed in the RPC request.  The server then uses
1844	   this security attribute and the attribute of the object the client is
1845	   trying to access to make an access control decision.  If the server's
1846	   policy allows this access it will fulfill the client's request,
1847	   otherwise it will return NFS4ERR_ACCESS.
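   The two independent checks of full mode can be sketched as follows.
   The integer levels and policy_allows() are a toy stand-in for a real
   MAC model, assumed only to make the example concrete; the point is
   that the server re-checks using the subject label carried in the RPC
   rather than trusting the client's decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFS4_OK        0
#define NFS4ERR_ACCESS 13

/* Toy MAC policy: access is allowed if the subject's level dominates
 * the object's level.  Real models are far richer. */
static bool policy_allows(int subject_level, int object_level)
{
    return subject_level >= object_level;
}

/* Client side: if this returns false, no RPC is sent at all. */
static bool client_should_send(int process_level, int object_level)
{
    return policy_allows(process_level, object_level);
}

/* Server side: check the subject label extracted from the RPC against
 * the object's stored label. */
static uint32_t server_check(int rpc_subject_level, int stored_object_level)
{
    return policy_allows(rpc_subject_level, stored_object_level)
               ? NFS4_OK : NFS4ERR_ACCESS;
}
```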

1849	   Implementations MAY validate security attributes supplied over the
1850	   network to ensure that they are within a set of attributes permitted
1851	   from a specific peer, and if not, reject them.  Note that a system
1852	   may permit a different set of attributes to be accepted from each
1853	   peer.
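   One way to implement such per-peer validation is an allow list per
   peer, sketched below with hypothetical names.  Rejecting labels from
   peers that have no configured entry is a design choice of this
   sketch, not a requirement of the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-peer configuration: the labels a given peer is
 * permitted to supply.  Peers are identified by name for simplicity. */
struct peer_label_policy {
    const char *peer;
    const char *const *allowed;   /* permitted labels for this peer */
    size_t nallowed;
};

static bool label_permitted_from_peer(const struct peer_label_policy *p,
                                      size_t npolicies, const char *peer,
                                      const char *label)
{
    for (size_t i = 0; i < npolicies; i++) {
        if (strcmp(p[i].peer, peer) != 0)
            continue;
        for (size_t j = 0; j < p[i].nallowed; j++)
            if (strcmp(p[i].allowed[j], label) == 0)
                return true;
        return false;             /* known peer, label not in its set */
    }
    return false;                 /* unknown peer: reject */
}
```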

1855	7.6.1.3.  Limited Server

1857	   A Limited Server mode (see Section 3.5.2 of [7]) consists of a server
1858	   which is label aware, but does not enforce policies.  Such a server
1859	   will store and retrieve all object labels presented by clients,
1860	   notify the clients of any label changes via CB_ATTR_CHANGED, but will
1861	   not restrict access via the subject label.  Instead, it will expect
1862	   the clients to enforce all such access locally.

1864	7.6.2.  Guest Mode

1866	   Guest mode implies that either the client or the server does not
1867	   handle labels.  If the client is not Labeled NFS aware, then it will
1868	   not offer subject labels to the server.  The server is the only
1869	   entity enforcing policy, and may selectively provide standard NFS
1870	   services to clients based on their authentication credentials and/or
1871	   associated network attributes (e.g., IP address, network interface).
1872	   The level of trust and access extended to a client in this mode is
1873	   configuration-specific.  If the server is not Labeled NFS aware, then
1874	   it will not return object labels to the client.  Clients in this
1875	   environment may consist of groups implementing different MAC
1876	   model policies.  The system requires that all clients in the
1877	   environment be responsible for access control checks.

1879	7.7.  Security Considerations

1881	   This entire chapter deals with security issues.

1883	   Depending on the level of protection the MAC system offers, there may
1884	   be a requirement to tightly bind the security attribute to the data.

1886	   When only one of the client or server enforces labels, it is
1887	   important to realize that the other side is not enforcing MAC
1888	   protections.  Alternate methods might be in use to handle the lack of
1889	   MAC support and care should be taken to identify and mitigate threats
1890	   from possible tampering outside of these methods.

1892	   An example of this is that a server that modifies READDIR or LOOKUP
1893	   results based on the client's subject label might want to always
1894	   construct the same subject label for a client which does not present
1895	   one.  This will prevent a non-Labeled NFS client from mixing entries
1896	   in the directory cache.

1898	8.  Sharing change attribute implementation details with NFSv4 clients

1900	8.1.  Introduction

1902	   Although both the NFSv4 [10] and NFSv4.1 [2] protocols define the
1903	   change attribute as being mandatory to implement, there is little in
1904	   the way of guidance.  The only mandated feature is that the value
1905	   must change whenever the file data or metadata change.

1907	   While this allows for a wide range of implementations, it also leaves
1908	   the client with a conundrum: how does it determine which is the most
1909	   recent value for the change attribute in a case where several RPC
1910	   calls have been issued in parallel?  In other words, if two COMPOUNDs,
1911	   both containing WRITE and GETATTR requests for the same file, have
1912	   been issued in parallel, how does the client determine which of the
1913	   two change attribute values returned in the replies to the GETATTR
1914	   requests correspond to the most recent state of the file?  In some
1915	   cases, the only recourse may be to send another COMPOUND containing a
1916	   third GETATTR that is fully serialized with the first two.

1918	   NFSv4.2 avoids this kind of inefficiency by allowing the server to
1919	   share details about how the change attribute is expected to evolve,
1920	   so that the client may immediately determine which, out of the
1921	   several change attribute values returned by the server, is the most
1922	   recent. change_attr_type is defined as a new recommended attribute
1923	   (see Section 11.2.1), and is per file system.

1925	9.  Security Considerations

1927	10.  Error Values

1929	   NFS error numbers are assigned to failed operations within a Compound
1930	   (COMPOUND or CB_COMPOUND) request.  A Compound request contains a
1931	   number of NFS operations that have their results encoded in sequence
1932	   in a Compound reply.  The results of successful operations will
1933	   consist of an NFS4_OK status followed by the encoded results of the
1934	   operation.  If an NFS operation fails, an error status will be
1935	   entered in the reply and the Compound request will be terminated.
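   The sequential evaluation described above can be sketched as a loop
   over hypothetical per-operation handlers.  Evaluation stops at the
   first failure, and the status of the last operation evaluated is also
   returned as the overall COMPOUND status.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NFS4_OK        0
#define NFS4ERR_ACCESS 13

/* Hypothetical per-operation handler returning an nfsstat4 value. */
typedef uint32_t (*nfs_op_fn)(void *state);

static uint32_t op_ok(void *state)     { (void)state; return NFS4_OK; }
static uint32_t op_denied(void *state) { (void)state; return NFS4ERR_ACCESS; }

/* Evaluate operations in order, encoding each status in results[].
 * The first failure terminates the COMPOUND. */
static uint32_t run_compound(const nfs_op_fn *ops, size_t nops, void *state,
                             uint32_t *results, size_t *nresults)
{
    *nresults = 0;
    for (size_t i = 0; i < nops; i++) {
        uint32_t status = ops[i](state);
        results[(*nresults)++] = status;
        if (status != NFS4_OK)
            return status;        /* later operations are not evaluated */
    }
    return NFS4_OK;
}
```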

1937	10.1.  Error Definitions

1939	                        Protocol Error Definitions

1941	         +--------------------------+--------+------------------+
1942	         | Error                    | Number | Description      |
1943	         +--------------------------+--------+------------------+
1944	         | NFS4ERR_BADLABEL         | 10093  | Section 10.1.3.1 |
1945	         | NFS4ERR_METADATA_NOTSUPP | 10090  | Section 10.1.2.1 |
1946	         | NFS4ERR_OFFLOAD_DENIED   | 10091  | Section 10.1.2.2 |
1947	         | NFS4ERR_PARTNER_NO_AUTH  | 10089  | Section 10.1.2.3 |
1948	         | NFS4ERR_PARTNER_NOTSUPP  | 10088  | Section 10.1.2.4 |
1949	         | NFS4ERR_UNION_NOTSUPP    | 10094  | Section 10.1.1.1 |
1950	         | NFS4ERR_WRONG_LFS        | 10092  | Section 10.1.3.2 |
1951	         +--------------------------+--------+------------------+

1953	                                  Table 1

1955	10.1.1.  General Errors

1957	   This section deals with errors that are applicable to a broad set of
1958	   different purposes.

1960	10.1.1.1.  NFS4ERR_UNION_NOTSUPP (Error Code 10094)

1962	   One of the arguments to the operation is a discriminated union and
1963	   while the server supports the given operation, it does not support
1964	   the selected arm of the discriminated union.  For example, see
1965	   READ_PLUS (Section 13.10).

1967	10.1.2.  Server to Server Copy Errors

1969	   These errors deal with the interaction between servers during
1970	   server-to-server copies.

1972	10.1.2.1.  NFS4ERR_METADATA_NOTSUPP (Error Code 10090)

1974	   The destination file cannot support the same metadata as the source
1975	   file.

1977	10.1.2.2.  NFS4ERR_OFFLOAD_DENIED (Error Code 10091)

1979	   The copy offload operation is supported by both the source and the
1980	   destination, but the destination is not allowing it for this file.
1981	   If the client sees this error, it should fall back to the normal copy
1982	   semantics.

1984	10.1.2.3.  NFS4ERR_PARTNER_NO_AUTH (Error Code 10089)

1986	   The source server does not authorize a server-to-server copy offload
1987	   operation.  This may be due to the client's failure to send the
1988	   COPY_NOTIFY operation to the source server, the source server
1989	   receiving a server-to-server copy offload request after the copy
1990	   lease time expired, or for some other permission problem.

1992	10.1.2.4.  NFS4ERR_PARTNER_NOTSUPP (Error Code 10088)

1994	   The remote server does not support the server-to-server copy offload
1995	   protocol.

1997	10.1.3.  Labeled NFS Errors

1999	   These errors are used in Labeled NFS.

2001	10.1.3.1.  NFS4ERR_BADLABEL (Error Code 10093)

2003	   The label specified is invalid in some manner.

2005	10.1.3.2.  NFS4ERR_WRONG_LFS (Error Code 10092)

2007	   The LFS specified in the subject label is not compatible with the LFS
2008	   in the object label.

2010	11.  New File Attributes

2012	11.1.  New RECOMMENDED Attributes - List and Definition References

2014	   The list of new RECOMMENDED attributes appears in Table 2.  The
2015	   meanings of the columns of the table are:

2017	   Name:  The name of the attribute.

2019	   Id:  The number assigned to the attribute.  In the event of conflicts
2020	      between the assigned number and [3], the latter is likely
2021	      authoritative, but should be resolved with Errata to this document
2022	      and/or [3].  See [23] for the Errata process.

2024	   Data Type:  The XDR data type of the attribute.

2026	   Acc:  Access allowed to the attribute.

2028	      R  means read-only (GETATTR may retrieve, SETATTR may not set).

2030	      W  means write-only (SETATTR may set, GETATTR may not retrieve).

2032	      R W   means read/write (GETATTR may retrieve, SETATTR may set).

2034	   Defined in:  The section of this specification that describes the
2035	      attribute.

2037	   +------------------+----+-------------------+-----+----------------+
2038	   | Name             | Id | Data Type         | Acc | Defined in     |
2039	   +------------------+----+-------------------+-----+----------------+
2040	   | change_attr_type | 79 | change_attr_type4 | R   | Section 11.2.1 |
2041	   | sec_label        | 80 | sec_label4        | R W | Section 11.2.2 |
2042	   | space_reserved   | 77 | boolean           | R W | Section 11.2.3 |
2043	   | space_freed      | 78 | length4           | R   | Section 11.2.4 |
2044	   +------------------+----+-------------------+-----+----------------+

2046	                                  Table 2

2048	11.2.  Attribute Definitions
2049	11.2.1.  Attribute 79: change_attr_type

2051	   enum change_attr_type4 {
2052	              NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR         = 0,
2053	              NFS4_CHANGE_TYPE_IS_VERSION_COUNTER        = 1,
2054	              NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2,
2055	              NFS4_CHANGE_TYPE_IS_TIME_METADATA          = 3,
2056	              NFS4_CHANGE_TYPE_IS_UNDEFINED              = 4
2057	   };

2059	   change_attr_type is a per file system attribute which enables the
2060	   NFSv4.2 server to provide additional information about how it expects
2061	   the change attribute value to evolve after the file data or metadata
2062	   has changed.  While Section 5.4 of [2] discusses per file system
2063	   attributes, it is expected that the value of change_attr_type does
2064	   not depend on the value of "homogeneous" and changes only in the
2065	   event of a migration.

2067	   NFS4_CHANGE_TYPE_IS_UNDEFINED:  The change attribute does not take
2068	      values that fit into any of these categories.

2070	   NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR:  The change attribute value MUST
2071	      monotonically increase for every atomic change to the file
2072	      attributes, data, or directory contents.

2074	   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER:  The change attribute value MUST
2075	      be incremented by one unit for every atomic change to the file
2076	      attributes, data, or directory contents.  This property is
2077	      preserved when writing to pNFS data servers.

2079	   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS:  The change attribute
2080	      value MUST be incremented by one unit for every atomic change to
2081	      the file attributes, data, or directory contents.  In the case
2082	      where the client is writing to pNFS data servers, the number of
2083	      increments is not guaranteed to exactly match the number of
2084	      writes.

2086	   NFS4_CHANGE_TYPE_IS_TIME_METADATA:  The change attribute is
2087	      implemented as suggested in the NFSv4 spec [10] in terms of the
2088	      time_metadata attribute.

2090	   If any of NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR,
2091	   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, or
2092	   NFS4_CHANGE_TYPE_IS_TIME_METADATA is set, then the client knows at
2093	   the very least that the change attribute is monotonically increasing,
2094	   which is sufficient to resolve the question of which value is the
2095	   most recent.
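   A client holding several change values from parallel GETATTR replies
   could select the most recent one as sketched below.  The helper names
   are hypothetical; only the three types listed above are treated as
   ordered, and any other type leaves the client to fall back to a
   serialized GETATTR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the change_attr_type4 enumeration above. */
enum change_attr_type4 {
    NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR         = 0,
    NFS4_CHANGE_TYPE_IS_VERSION_COUNTER        = 1,
    NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2,
    NFS4_CHANGE_TYPE_IS_TIME_METADATA          = 3,
    NFS4_CHANGE_TYPE_IS_UNDEFINED              = 4
};

typedef uint64_t changeid4;

/* The three types named in the text as monotonically increasing. */
static bool change_is_monotonic(enum change_attr_type4 t)
{
    return t == NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR ||
           t == NFS4_CHANGE_TYPE_IS_VERSION_COUNTER ||
           t == NFS4_CHANGE_TYPE_IS_TIME_METADATA;
}

/* Pick the most recent of two change values.  Returns false when the
 * type gives no ordering guarantee. */
static bool most_recent_change(enum change_attr_type4 t,
                               changeid4 a, changeid4 b, changeid4 *out)
{
    if (!change_is_monotonic(t))
        return false;
    *out = (a > b) ? a : b;
    return true;
}
```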

2097	   If the client sees the value NFS4_CHANGE_TYPE_IS_TIME_METADATA, then
2098	   by inspecting the value of the 'time_delta' attribute, it additionally
2099	   has the option of detecting rogue server implementations that use
2100	   time_metadata in violation of the spec.

2102	   If the client sees NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, it has the
2103	   ability to predict what the resulting change attribute value should
2104	   be after a COMPOUND containing a SETATTR, WRITE, or CREATE.  This
2105	   again allows it to detect changes made in parallel by another client.
2106	   The value NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS permits the
2107	   same, but only if the client is not doing pNFS WRITEs.
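   For NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, the prediction is simple
   arithmetic.  This sketch assumes, for illustration, that each
   modifying operation in the COMPOUND (SETATTR, WRITE, or CREATE)
   counts as exactly one atomic change; the function names are
   hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t changeid4;

/* Each atomic change adds exactly one to a version counter. */
static changeid4 predict_change(changeid4 before, unsigned modifying_ops)
{
    return before + modifying_ops;
}

/* A value larger than predicted means another client changed the file
 * in parallel. */
static bool parallel_change_detected(changeid4 before,
                                     unsigned modifying_ops,
                                     changeid4 observed)
{
    return observed > predict_change(before, modifying_ops);
}
```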

2109	   Finally, if the server does not support change_attr_type or if
2110	   NFS4_CHANGE_TYPE_IS_UNDEFINED is set, then the server SHOULD make an
2111	   effort to implement the change attribute in terms of the
2112	   time_metadata attribute.

2114	11.2.2.  Attribute 80: sec_label

2116	   typedef uint32_t  policy4;

2118	   struct labelformat_spec4 {
2119	           policy4 lfs_lfs;
2120	           policy4 lfs_pi;
2121	   };

2123	   struct sec_label4 {
2124	           labelformat_spec4       slai_lfs;
2125	           opaque                  slai_data<>;
2126	   };

2128	   The FATTR4_SEC_LABEL attribute is a structure of two components, the
2129	   first component being an LFS.  It serves to provide the receiving end
2130	   with the information necessary to translate the security attribute
2131	   into a form that is usable by the endpoint.  Label Formats assigned
2132	   an LFS may optionally choose to include a Policy Identifier field to
2133	   allow for complex policy deployments.  The LFS and Label Format
2134	   Registry are described in detail in [22].  The translation used to
2135	   interpret the security attribute is not specified as part of the
2136	   protocol as it may depend on various factors.  The second component
2137	   is an opaque section which contains the data of the attribute.  This
2138	   component is dependent on the MAC model to interpret and enforce.

2140	   In particular, it is the responsibility of the LFS specification to
2141	   define a maximum size for the opaque section, slai_data<>.  When
2142	   creating or modifying a label for an object, the client needs to be
2143	   guaranteed that the server will accept a label that is sized
2144	   correctly.  Because both client and server are part of a specific MAC
2145	   model, the client will know the size.
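   The XDR above maps naturally onto a C structure.  The sketch below
   uses a simplified C view of slai_data<> (an explicit length and
   pointer, not rpcgen's generated layout) and a hypothetical table of
   per-LFS maximum sizes, standing in for the limits each LFS
   specification defines, to check a label before it is sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t policy4;

struct labelformat_spec4 {
    policy4 lfs_lfs;              /* Label Format Selector */
    policy4 lfs_pi;               /* optional Policy Identifier */
};

/* Simplified C view of sec_label4 with its variable-length opaque. */
struct sec_label4 {
    struct labelformat_spec4 slai_lfs;
    uint32_t slai_data_len;
    const uint8_t *slai_data;
};

/* Hypothetical per-LFS size limit, as defined by each LFS spec. */
struct lfs_limit {
    policy4 lfs;
    uint32_t max_data_len;
};

static bool label_size_ok(const struct sec_label4 *label,
                          const struct lfs_limit *limits, size_t nlimits)
{
    for (size_t i = 0; i < nlimits; i++)
        if (limits[i].lfs == label->slai_lfs.lfs_lfs)
            return label->slai_data_len <= limits[i].max_data_len;
    return false;                 /* unknown LFS: size cannot be checked */
}
```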

2147	11.2.3.  Attribute 77: space_reserved

2149	   The space_reserved attribute is a read/write attribute of type
2150	   boolean.  It is a per file attribute.  When the space_reserved
2151	   attribute is set via SETATTR, the server must ensure that there is
2152	   disk space to accommodate every byte in the file before it can return
2153	   success.  If the server cannot guarantee this, it must return
2154	   NFS4ERR_NOSPC.

2156	   If the client tries to grow a file which has the space_reserved
2157	   attribute set, the server must guarantee that there is disk space to
2158	   accommodate every byte in the file with the new size before it can
2159	   return success.  If the server cannot guarantee this, it must return
2160	   NFS4ERR_NOSPC.

2162	   It is not required that the server allocate the space to the file
2163	   before returning success.  The allocation can be deferred; however,
2164	   it must be guaranteed that it will not fail for lack of space.

2166	   The value of space_reserved can be obtained at any time through
2167	   GETATTR.

2169	   In order to avoid ambiguity, the space_reserved bit cannot be set
2170	   along with the size bit in SETATTR.  Increasing the size of a file
2171	   with space_reserved set will fail if space reservation cannot be
2172	   guaranteed for the new size.  If the file size is decreased, space
2173	   reservation is only guaranteed for the new size and the extra blocks
2174	   backing the file can be released.
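   The server-side rules for space_reserved and size can be sketched as
   follows.  The file_state structure and the free_bytes counter are
   hypothetical simplifications of real file system accounting, and the
   use of NFS4ERR_INVAL for a SETATTR that sets both attributes is an
   assumption, since the text does not name an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFS4_OK       0
#define NFS4ERR_INVAL 22
#define NFS4ERR_NOSPC 28

/* Invariant: reserved_bytes == size when space_reserved, else 0. */
struct file_state {
    bool space_reserved;
    uint64_t size;
    uint64_t reserved_bytes;      /* bytes currently guaranteed */
};

static uint32_t setattr_space(struct file_state *f, uint64_t *free_bytes,
                              bool set_reserve, bool reserve_value,
                              bool set_size, uint64_t new_size)
{
    if (set_reserve && set_size)
        return NFS4ERR_INVAL;     /* ambiguous; error code assumed */

    if (set_reserve) {
        if (reserve_value) {
            uint64_t need = f->size - f->reserved_bytes;
            if (need > *free_bytes)
                return NFS4ERR_NOSPC; /* cannot guarantee every byte */
            *free_bytes -= need;
            f->reserved_bytes = f->size;
        } else {
            *free_bytes += f->reserved_bytes; /* drop the guarantee */
            f->reserved_bytes = 0;
        }
        f->space_reserved = reserve_value;
        return NFS4_OK;
    }

    if (set_size) {
        if (f->space_reserved) {
            if (new_size > f->reserved_bytes) {
                uint64_t need = new_size - f->reserved_bytes;
                if (need > *free_bytes)
                    return NFS4ERR_NOSPC; /* growth must stay guaranteed */
                *free_bytes -= need;
            } else {
                /* Shrinking: the extra blocks may be released. */
                *free_bytes += f->reserved_bytes - new_size;
            }
            f->reserved_bytes = new_size;
        }
        f->size = new_size;
    }
    return NFS4_OK;
}
```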

2176	11.2.4.  Attribute 78: space_freed

2178	   space_freed gives the number of bytes freed if the file is deleted.
2179	   This attribute is read-only and is of type length4.  It is a per file
2180	   attribute.

2182	12.  Operations: REQUIRED, RECOMMENDED, or OPTIONAL

2184	   The following tables summarize the operations of the NFSv4.2 protocol
2185	   and the corresponding designation of REQUIRED, RECOMMENDED, and
2186	   OPTIONAL to implement, or either OBSOLETE if implemented or MUST NOT
2187	   implement.  The designation of OBSOLETE if implemented is reserved
2188	   for those operations which are defined in either NFSv4.0 or NFSv4.1,
2189	   can be implemented in NFSv4.2, and are intended to be designated
2190	   MUST NOT implement in NFSv4.3.  The designation of MUST NOT implement
2191	   is reserved for those operations that were defined in either NFSv4.0
2192	   or NFSv4.1 and MUST NOT be implemented in NFSv4.2.

2194	   For the most part, the REQUIRED, RECOMMENDED, or OPTIONAL designation
2195	   for operations sent by the client is for the server implementation.
2196	   The client is generally required to implement the operations needed
2197	   for the operating environment it serves.  For example, a
2198	   read-only NFSv4.2 client would have no need to implement the WRITE
2199	   operation and is not required to do so.

2201	   The REQUIRED or OPTIONAL designation for callback operations sent by
2202	   the server is for both the client and server.  Generally, the client
2203	   has the option of creating the backchannel and sending the operations
2204	   on the fore channel that will be a catalyst for the server sending
2205	   callback operations.  A partial exception is CB_RECALL_SLOT; the only
2206	   way the client can avoid supporting this operation is by not creating
2207	   a backchannel.

2209	   Since this is a summary of the operations and their designation,
2210	   there are subtleties that are not presented here.  Therefore, if
2211	   there is a question of the requirements of implementation, the
2212	   operation descriptions themselves must be consulted along with other
2213	   relevant explanatory text within either this specification or that of
2214	   NFSv4.1 [2].

2216	   The abbreviations used in the second and third columns of the table
2217	   are defined as follows.

2219	   REQ  REQUIRED to implement

2221	   REC  RECOMMENDED to implement

2223	   OPT  OPTIONAL to implement

2225	   OBS  OBSOLETE if implemented

2227	   MNI  MUST NOT implement

2229	   For the NFSv4.2 features that are OPTIONAL, the operations that
2230	   support those features are OPTIONAL, and the server would return
2231	   NFS4ERR_NOTSUPP in response to the client's use of those operations.
2232	   If an OPTIONAL feature is supported, it is possible that a set of
2233	   operations related to the feature become REQUIRED to implement.  The
2234	   third column of the table designates the feature(s) and if the
2235	   operation is REQUIRED or OPTIONAL in the presence of support for the
2236	   feature.

2238	   The OPTIONAL features identified and their abbreviations are as
2239	   follows:

2241	   pNFS  Parallel NFS

2243	   FDELG  File Delegations

2245	   DDELG  Directory Delegations

2247	   COPY  Server Side Copy

2249	   ADB  Application Data Blocks

2251	                                Operations

2253	   +----------------------+--------------------+-----------------------+
2254	   | Operation            | REQ, REC, OPT,     | Feature (REQ, REC, or |
2255	   |                      | OBS, or MNI        | OPT)                  |
2256	   +----------------------+--------------------+-----------------------+
2257	   | ACCESS               | REQ                |                       |
2258	   | BACKCHANNEL_CTL      | REQ                |                       |
2259	   | BIND_CONN_TO_SESSION | REQ                |                       |
2260	   | CLOSE                | REQ                |                       |
2261	   | COMMIT               | REQ                |                       |
2262	   | COPY                 | OPT                | COPY (REQ)            |
2263	   | COPY_ABORT           | OPT                | COPY (REQ)            |
2264	   | COPY_NOTIFY          | OPT                | COPY (REQ)            |
2265	   | COPY_REVOKE          | OPT                | COPY (REQ)            |
2266	   | COPY_STATUS          | OPT                | COPY (REQ)            |
2267	   | CREATE               | REQ                |                       |
2268	   | CREATE_SESSION       | REQ                |                       |
2269	   | DELEGPURGE           | OPT                | FDELG (REQ)           |
2270	   | DELEGRETURN          | OPT                | FDELG, DDELG, pNFS    |
2271	   |                      |                    | (REQ)                 |
2272	   | DESTROY_CLIENTID     | REQ                |                       |
2273	   | DESTROY_SESSION      | REQ                |                       |
2274	   | EXCHANGE_ID          | REQ                |                       |
2275	   | FREE_STATEID         | REQ                |                       |
2276	   | GETATTR              | REQ                |                       |
2277	   | GETDEVICEINFO        | OPT                | pNFS (REQ)            |
2278	   | GETDEVICELIST        | OPT                | pNFS (OPT)            |
2279	   | GETFH                | REQ                |                       |
2280	   | INITIALIZE           | OPT                | ADB (REQ)             |
2281	   | GET_DIR_DELEGATION   | OPT                | DDELG (REQ)           |
2282	   | LAYOUTCOMMIT         | OPT                | pNFS (REQ)            |
2283	   | LAYOUTGET            | OPT                | pNFS (REQ)            |
2284	   | LAYOUTRETURN         | OPT                | pNFS (REQ)            |
2285	   | LINK                 | OPT                |                       |
2286	   | LOCK                 | REQ                |                       |
2287	   | LOCKT                | REQ                |                       |
2288	   | LOCKU                | REQ                |                       |
2289	   | LOOKUP               | REQ                |                       |
2290	   | LOOKUPP              | REQ                |                       |
2291	   | NVERIFY              | REQ                |                       |
2292	   | OPEN                 | REQ                |                       |
2293	   | OPENATTR             | OPT                |                       |
2294	   | OPEN_CONFIRM         | MNI                |                       |
2295	   | OPEN_DOWNGRADE       | REQ                |                       |
2296	   | PUTFH                | REQ                |                       |
2297	   | PUTPUBFH             | REQ                |                       |
2298	   | PUTROOTFH            | REQ                |                       |
2299	   | READ                 | REQ                |                       |
2300	   | READDIR              | REQ                |                       |
2301	   | READLINK             | OPT                |                       |
2302	   | READ_PLUS            | OPT                | ADB (REQ)             |
2303	   | RECLAIM_COMPLETE     | REQ                |                       |
2304	   | RELEASE_LOCKOWNER    | MNI                |                       |
2305	   | REMOVE               | REQ                |                       |
2306	   | RENAME               | REQ                |                       |
2307	   | RENEW                | MNI                |                       |
2308	   | RESTOREFH            | REQ                |                       |
2309	   | SAVEFH               | REQ                |                       |
2310	   | SECINFO              | REQ                |                       |
2311	   | SECINFO_NO_NAME      | REC                | pNFS file layout      |
2312	   |                      |                    | (REQ)                 |
2313	   | SEQUENCE             | REQ                |                       |
2314	   | SETATTR              | REQ                |                       |
2315	   | SETCLIENTID          | MNI                |                       |
2316	   | SETCLIENTID_CONFIRM  | MNI                |                       |
2317	   | SET_SSV              | REQ                |                       |
2318	   | TEST_STATEID         | REQ                |                       |
2319	   | VERIFY               | REQ                |                       |
2320	   | WANT_DELEGATION      | OPT                | FDELG (OPT)           |
2321	   | WRITE                | REQ                |                       |
2322	   +----------------------+--------------------+-----------------------+
2323	                            Callback Operations

2325	   +-------------------------+-------------------+---------------------+
2326	   | Operation               | REQ, REC, OPT, or | Feature (REQ, REC,  |
2327	   |                         | MNI               | or OPT)             |
2328	   +-------------------------+-------------------+---------------------+
2329	   | CB_COPY                 | OPT               | COPY (REQ)          |
2330	   | CB_GETATTR              | OPT               | FDELG (REQ)         |
2331	   | CB_LAYOUTRECALL         | OPT               | pNFS (REQ)          |
2332	   | CB_NOTIFY               | OPT               | DDELG (REQ)         |
2333	   | CB_NOTIFY_DEVICEID      | OPT               | pNFS (OPT)          |
2334	   | CB_NOTIFY_LOCK          | OPT               |                     |
2335	   | CB_PUSH_DELEG           | OPT               | FDELG (OPT)         |
2336	   | CB_RECALL               | OPT               | FDELG, DDELG, pNFS  |
2337	   |                         |                   | (REQ)               |
2338	   | CB_RECALL_ANY           | OPT               | FDELG, DDELG, pNFS  |
2339	   |                         |                   | (REQ)               |
2340	   | CB_RECALL_SLOT          | REQ               |                     |
2341	   | CB_RECALLABLE_OBJ_AVAIL | OPT               | DDELG, pNFS (REQ)   |
2342	   | CB_SEQUENCE             | OPT               | FDELG, DDELG, pNFS  |
2343	   |                         |                   | (REQ)               |
2344	   | CB_WANTS_CANCELLED      | OPT               | FDELG, DDELG, pNFS  |
2345	   |                         |                   | (REQ)               |
2346	   +-------------------------+-------------------+---------------------+

2348	13.  NFSv4.2 Operations

2350	13.1.  Operation 59: COPY - Initiate a server-side copy

2352	13.1.1.  ARGUMENT

2354	   const COPY4_GUARDED     = 0x00000001;
2355	   const COPY4_METADATA    = 0x00000002;

2357	   struct COPY4args {
2358	           /* SAVED_FH: source file */
2359	           /* CURRENT_FH: destination file or */
2360	           /*             directory           */
2361	           offset4         ca_src_offset;
2362	           offset4         ca_dst_offset;
2363	           length4         ca_count;
2364	           uint32_t        ca_flags;
2365	           component4      ca_destination;
2366	           netloc4         ca_source_server<>;
2367	   };

2369	13.1.2.  RESULT

2371	   union COPY4res switch (nfsstat4 cr_status) {
2372	           case NFS4_OK:
2373	                   stateid4        cr_callback_id<1>;
2374	           default:
2375	                   length4         cr_bytes_copied;
2376	   };

2378	13.1.3.  DESCRIPTION

2380	   The COPY operation is used for both intra-server and inter-server
2381	   copies.  In both cases, the COPY is always sent from the client to
2382	   the destination server of the file copy.  The COPY operation requests
2383	   that a file be copied from the location specified by the SAVED_FH
2384	   value to the location specified by the combination of CURRENT_FH and
2385	   ca_destination.

2387	   The SAVED_FH must be a regular file.  If SAVED_FH is not a regular
2388	   file, the operation MUST fail and return NFS4ERR_WRONG_TYPE.

2390	   In order to set SAVED_FH to the source file handle, the compound
2391	   procedure requesting the COPY will include a sub-sequence of
2392	   operations such as

2394	      PUTFH source-fh
2395	      SAVEFH
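   For illustration only, the following Python sketch lists the
   operations of a COMPOUND that performs an intra-server COPY into a
   named file within a destination directory.  The tuple and dictionary
   encoding and the function name build_copy_compound are hypothetical;
   a real client would XDR-encode COPY4args.  The leading SEQUENCE
   reflects the NFSv4.1 rule that COMPOUNDs on a session begin with
   SEQUENCE.

```python
def build_copy_compound(src_fh: bytes, dst_dir_fh: bytes, name: str,
                        src_offset: int = 0, dst_offset: int = 0,
                        count: int = 0, flags: int = 0) -> list:
    """Return the ordered operations of a COPY COMPOUND.

    Hypothetical tuple encoding for illustration; this is not an XDR encoder.
    """
    return [
        ("SEQUENCE",),               # session-based COMPOUNDs start with SEQUENCE
        ("PUTFH", src_fh),           # CURRENT_FH := source file
        ("SAVEFH",),                 # SAVED_FH := source file
        ("PUTFH", dst_dir_fh),       # CURRENT_FH := destination directory
        ("COPY", {
            "ca_src_offset": src_offset,
            "ca_dst_offset": dst_offset,
            "ca_count": count,           # 0 = through the source's EOF
            "ca_flags": flags,
            "ca_destination": name,      # non-empty: CURRENT_FH is a directory
            "ca_source_server": [],      # empty list: intra-server copy
        }),
    ]
```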

2397	   If the request is for a server-to-server copy, the source-fh is a
2398	   filehandle from the source server and the compound procedure is being
2399	   executed on the destination server.  In this case, the source-fh is a
2400	   foreign filehandle on the server receiving the COPY request.  If
2401	   either PUTFH or SAVEFH checked the validity of the filehandle, the
2402	   operation would likely fail and return NFS4ERR_STALE.

2404	   If a server supports the server-to-server COPY feature, a PUTFH
2405	   followed by a SAVEFH MUST NOT return NFS4ERR_STALE for either
2406	   operation.  This restriction does not pose substantial difficulties
2407	   for servers: the CURRENT_FH and SAVED_FH may instead be validated
2408	   in the context of the operation referencing them, and NFS4ERR_STALE
2409	   returned for an invalid filehandle at that point.

2411	   The CURRENT_FH and ca_destination together specify the destination of
2412	   the copy operation.  If ca_destination is of 0 (zero) length, then
2413	   CURRENT_FH specifies the target file.  In this case, CURRENT_FH MUST
2414	   be a regular file and not a directory.  If ca_destination is not of 0
2415	   (zero) length, the ca_destination argument specifies the file name to
2416	   which the data will be copied within the directory identified by
2417	   CURRENT_FH.  In this case, CURRENT_FH MUST be a directory and not a
2418	   regular file.
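   The type checks on SAVED_FH and CURRENT_FH can be summarized in a
   short server-side sketch.  It is hypothetical.  In particular, the
   choice among NFS4ERR_ISDIR, NFS4ERR_NOTDIR, and NFS4ERR_WRONG_TYPE
   for a wrongly typed CURRENT_FH is an assumption based on the COPY
   error list.  It is not a rule stated in this section.

```python
from enum import Enum


class FType(Enum):
    REG = "regular file"
    DIR = "directory"
    OTHER = "other"


def check_copy_targets(saved_type: FType, current_type: FType,
                       ca_destination: str) -> str:
    """Return the status a server might use for COPY's filehandle type checks.

    The error chosen for a wrongly typed CURRENT_FH is an assumption.
    """
    if saved_type is not FType.REG:
        return "NFS4ERR_WRONG_TYPE"          # SAVED_FH must be a regular file
    if ca_destination == "":
        # Zero-length name: CURRENT_FH is the target file itself.
        if current_type is FType.DIR:
            return "NFS4ERR_ISDIR"
        if current_type is not FType.REG:
            return "NFS4ERR_WRONG_TYPE"
    elif current_type is not FType.DIR:
        # Non-empty name: CURRENT_FH must be the containing directory.
        return "NFS4ERR_NOTDIR"
    return "NFS4_OK"
```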

2420	   If the file named by ca_destination does not exist and the operation
2421	   completes successfully, the file will be visible in the file system
2422	   namespace.  If the file does not exist and the operation fails, the
2423	   file MAY be visible in the file system namespace depending on when
2424	   the failure occurs and on the implementation of the NFS server
2425	   receiving the COPY operation.  If the ca_destination name cannot be
2426	   created in the destination file system (due to file name
2427	   restrictions, such as case or length), the operation MUST fail.

2429	   The ca_src_offset is the offset within the source file from which the
2430	   data will be read, the ca_dst_offset is the offset within the
2431	   destination file to which the data will be written, and the ca_count
2432	   is the number of bytes that will be copied.  An offset of 0 (zero)
2433	   specifies the start of the file.  A count of 0 (zero) requests that
2434	   all bytes from ca_src_offset through EOF be copied to the
2435	   destination.  If concurrent modifications to the source file overlap
2436	   with the source file region being copied, the data copied may include
2437	   all, some, or none of the modifications.  The client can use standard
2438	   NFS operations (e.g., OPEN with OPEN4_SHARE_DENY_WRITE or mandatory
2439	   byte range locks) to protect against concurrent modifications if the
2440	   client is concerned about this.  If the source file's end of file is
2441	   being modified in parallel with a copy that specifies a count of 0
2442	   (zero) bytes, the amount of data copied is implementation dependent
2443	   (clients may guard against this case by specifying a non-zero count
2444	   value or preventing modification of the source file as mentioned
2445	   above).
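   The byte range read from the source can be computed as shown below.
   This hypothetical helper samples the source size once, when the copy
   starts.  As noted above, if EOF moves during a zero-count copy, the
   amount copied is implementation dependent.  The result is therefore
   only the range implied at that instant.

```python
def source_range(ca_src_offset: int, ca_count: int, src_size: int) -> tuple:
    """Return the half-open range [start, end) that a COPY reads from the source.

    src_size is the source size sampled when the copy starts (an assumption).
    """
    if ca_count == 0:
        # Zero count: copy everything from ca_src_offset through EOF.
        return (ca_src_offset, max(ca_src_offset, src_size))
    return (ca_src_offset, ca_src_offset + ca_count)
```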

2447	   If the source offset is greater than or equal to the source file's
2448	   size, or the source offset plus count is greater than that size,
2449	   the operation will fail with NFS4ERR_INVAL.  The destination offset
2450	   or destination offset plus count may be greater than the size of
2451	   the destination file, which lets the client issue parallel copies
2452	   for operations such as "cat file1 file2 file3 file4 > dest".
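   As a sketch of the "cat" case, a client might plan one COPY per
   source file.  Each copy is placed at the running total of the
   preceding source sizes.  This Python fragment is illustrative only,
   and the function name plan_concat and its dictionary layout are
   hypothetical.  It assumes the source files are not modified while
   the copies run (for example, because they were opened with
   OPEN4_SHARE_DENY_WRITE), since a ca_count of 0 copies through
   whatever EOF exists when each copy runs.

```python
from typing import Dict, List


def plan_concat(sources: Dict[str, int]) -> List[dict]:
    """Plan parallel COPYs that together produce cat(sources...) in the destination.

    `sources` maps a hypothetical source name to its size in bytes, in the
    desired output order (dicts preserve insertion order).
    """
    plans = []
    dst_offset = 0
    for name, size in sources.items():
        if size == 0:
            # Nothing to copy, and a zero offset into an empty source is rejected.
            continue
        plans.append({
            "source": name,
            "ca_src_offset": 0,
            "ca_dst_offset": dst_offset,   # may lie beyond the destination's EOF
            "ca_count": 0,                 # 0 = through the source's EOF
        })
        dst_offset += size
    return plans
```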

2454	   If the destination file is created as a result of this command, the
2455	   destination file's size will be equal to the number of bytes
2456	   successfully copied.  If the destination file already existed, the
2457	   destination file's size may increase as a result of this operation
2458	   (e.g., if ca_dst_offset plus ca_count is greater than the
2459	   destination's initial size).
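   The size rules above can be expressed as a small helper.  This is a
   hypothetical sketch.  The created-file branch follows the text
   literally, which implicitly assumes that ca_dst_offset is 0 when the
   COPY creates the destination.

```python
from typing import Optional


def dest_size_after_copy(initial_size: Optional[int], ca_dst_offset: int,
                         bytes_copied: int) -> int:
    """Estimate the destination file size after a COPY.

    initial_size is None when the COPY created the destination file.
    """
    if initial_size is None:
        # Newly created destination: its size equals the bytes copied.
        return bytes_copied
    # Existing destination: it grows only if the copy writes past its end.
    return max(initial_size, ca_dst_offset + bytes_copied)
```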

2461	   If the ca_source_server list is specified, then this is an inter-
2462	   server copy operation and the source file is on a remote server.  The
2463	   client is expected to have previously issued a successful COPY_NOTIFY
2464	   request to the remote source server.  The ca_source_server list MUST
2465	   be the same as the COPY_NOTIFY response's cnr_source_server list.  If
2466	   the client includes the entries from the COPY_NOTIFY response's
2467	   cnr_source_server list in the ca_source_server list, the source
2468	   server can indicate a specific copy protocol for the destination
2469	   server to use by returning a URL, which specifies both a protocol
2470	   service and server name.  Server-to-server copy protocol
2471	   considerations are described in Section 2.2.4 and Section 2.4.1.

2473	   The ca_flags argument allows the copy operation to be customized in
2474	   the following ways using the guarded flag (COPY4_GUARDED) and the
2475	   metadata flag (COPY4_METADATA).

2477	   If the guarded flag is set and the destination exists on the server,
2478	   this operation will fail with NFS4ERR_EXIST.

2480	   If the guarded flag is not set and the destination exists on the
2481	   server, the behavior is implementation dependent.
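   A server might apply the guarded flag as follows.  The flag values
   come from the ARGUMENT section.  The function name and the string
   results are illustrative.  The unguarded case is reported rather
   than resolved, because its behavior is implementation dependent.

```python
COPY4_GUARDED = 0x00000001
COPY4_METADATA = 0x00000002


def guarded_check(ca_flags: int, destination_exists: bool) -> str:
    """Apply the COPY4_GUARDED rule to an existing or missing destination."""
    if destination_exists:
        if ca_flags & COPY4_GUARDED:
            return "NFS4ERR_EXIST"
        # Unguarded copy onto an existing file: behavior is up to the server.
        return "IMPLEMENTATION_DEPENDENT"
    return "NFS4_OK"
```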

2483	   If the metadata flag is set and the client is requesting a whole file
2484	   copy (i.e., ca_count is 0 (zero)), a subset of the destination file's
2485	   attributes MUST be the same as the source file's corresponding
2486	   attributes and a subset of the destination file's attributes SHOULD
2487	   be the same as the source file's corresponding attributes.  The
2488	   attributes in the MUST and SHOULD copy subsets will be defined for
2489	   each NFS version.

2491	   For NFSv4.2, Table 3 and Table 4 list the REQUIRED and RECOMMENDED
2492	   attributes respectively.  In the "Copy to destination file?" column,
2493	   a "MUST" indicates that the attribute is part of the MUST copy set.
2494	   A "SHOULD" indicates that the attribute is part of the SHOULD copy
2495	   set.  A "no" indicates that the attribute MUST NOT be copied.

2497	                            REQUIRED attributes

2499	          +--------------------+----+---------------------------+
2500	          | Name               | Id | Copy to destination file? |
2501	          +--------------------+----+---------------------------+
2502	          | supported_attrs    | 0  | no                        |
2503	          | type               | 1  | MUST                      |
2504	          | fh_expire_type     | 2  | no                        |
2505	          | change             | 3  | SHOULD                    |
2506	          | size               | 4  | MUST                      |
2507	          | link_support       | 5  | no                        |
2508	          | symlink_support    | 6  | no                        |
2509	          | named_attr         | 7  | no                        |
2510	          | fsid               | 8  | no                        |
2511	          | unique_handles     | 9  | no                        |
2512	          | lease_time         | 10 | no                        |
2513	          | rdattr_error       | 11 | no                        |
2514	          | filehandle         | 19 | no                        |
2515	          | suppattr_exclcreat | 75 | no                        |
2516	          +--------------------+----+---------------------------+

2518	                                  Table 3

2520	                          RECOMMENDED attributes

2522	          +--------------------+----+---------------------------+
2523	          | Name               | Id | Copy to destination file? |
2524	          +--------------------+----+---------------------------+
2525	          | acl                | 12 | MUST                      |
2526	          | aclsupport         | 13 | no                        |
2527	          | archive            | 14 | no                        |
2528	          | cansettime         | 15 | no                        |
2529	          | case_insensitive   | 16 | no                        |
2530	          | case_preserving    | 17 | no                        |
2531	          | change_attr_type   | 79 | no                        |
2532	          | change_policy      | 60 | no                        |
2533	          | chown_restricted   | 18 | MUST                      |
2534	          | dacl               | 58 | MUST                      |
2535	          | dir_notif_delay    | 56 | no                        |
2536	          | dirent_notif_delay | 57 | no                        |
2537	          | fileid             | 20 | no                        |
2538	          | files_avail        | 21 | no                        |
2539	          | files_free         | 22 | no                        |
2540	          | files_total        | 23 | no                        |
2541	          | fs_charset_cap     | 76 | no                        |
2542	          | fs_layout_type     | 62 | no                        |
2543	          | fs_locations       | 24 | no                        |
2544	          | fs_locations_info  | 67 | no                        |
2545	          | fs_status          | 61 | no                        |
2546	          | hidden             | 25 | MUST                      |
2547	          | homogeneous        | 26 | no                        |
2548	          | layout_alignment   | 66 | no                        |
2549	          | layout_blksize     | 65 | no                        |
2550	          | layout_hint        | 63 | no                        |
2551	          | layout_type        | 64 | no                        |
2552	          | maxfilesize        | 27 | no                        |
2553	          | maxlink            | 28 | no                        |
2554	          | maxname            | 29 | no                        |
2555	          | maxread            | 30 | no                        |
2556	          | maxwrite           | 31 | no                        |
2557	          | mdsthreshold       | 68 | no                        |
2558	          | mimetype           | 32 | MUST                      |
2559	          | mode               | 33 | MUST                      |
2560	          | mode_set_masked    | 74 | no                        |
2561	          | mounted_on_fileid  | 55 | no                        |
2562	          | no_trunc           | 34 | no                        |
2563	          | numlinks           | 35 | no                        |
2564	          | owner              | 36 | MUST                      |
2565	          | owner_group        | 37 | MUST                      |
2566	          | quota_avail_hard   | 38 | no                        |
2567	          | quota_avail_soft   | 39 | no                        |
2568	          | quota_used         | 40 | no                        |
2569	          | rawdev             | 41 | no                        |
2570	          | retentevt_get      | 71 | MUST                      |
2571	          | retentevt_set      | 72 | no                        |
2572	          | retention_get      | 69 | MUST                      |
2573	          | retention_hold     | 73 | MUST                      |
2574	          | retention_set      | 70 | no                        |
2575	          | sacl               | 59 | MUST                      |
2576	          | sec_label          | 80 | MUST                      |
2577	          | space_avail        | 42 | no                        |
2578	          | space_free         | 43 | no                        |
2579	          | space_freed        | 78 | no                        |
2580	          | space_reserved     | 77 | MUST                      |
2581	          | space_total        | 44 | no                        |
2582	          | space_used         | 45 | no                        |
2583	          | system             | 46 | MUST                      |
2584	          | time_access        | 47 | MUST                      |
2585	          | time_access_set    | 48 | no                        |
2586	          | time_backup        | 49 | no                        |
2587	          | time_create        | 50 | MUST                      |
2588	          | time_delta         | 51 | no                        |
2589	          | time_metadata      | 52 | SHOULD                    |
2590	          | time_modify        | 53 | MUST                      |
2591	          | time_modify_set    | 54 | no                        |
2592	          +--------------------+----+---------------------------+
2593	                                  Table 4

2595	   [NOTE: The source file's attribute values will take precedence over
2596	   any attribute values inherited by the destination file.]

2598	   In the case of an inter-server copy or an intra-server copy between
2599	   file systems, the attributes supported for the source file and
2600	   destination file could be different.  By definition, the REQUIRED
2601	   attributes will be supported in all cases.  If the metadata flag is
2602	   set and the source file has a RECOMMENDED attribute that is not
2603	   supported for the destination file, the copy MUST fail with
2604	   NFS4ERR_ATTRNOTSUPP.
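   The support check above could look like the following hypothetical
   sketch.  It reads "has a RECOMMENDED attribute" as "holds a value
   for that attribute" and uses the attribute names from Table 4.
   REQUIRED attributes are not checked, because every file system
   supports them.

```python
from typing import Set


def check_metadata_support(source_recommended_set: Set[str],
                           dest_supported: Set[str]) -> str:
    """Fail a metadata-preserving COPY whose source holds a RECOMMENDED
    attribute that the destination file system does not support."""
    missing = source_recommended_set - dest_supported
    return "NFS4ERR_ATTRNOTSUPP" if missing else "NFS4_OK"
```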

2606	   Any attribute supported by the destination server that is not set on
2607	   the source file SHOULD be left unset.

2609	   Metadata attributes not exposed via the NFS protocol SHOULD be copied
2610	   to the destination file where appropriate.

2612	   The destination file's named attributes are not duplicated from the
2613	   source file.  After the copy process completes, the client MAY
2614	   attempt to duplicate named attributes using standard NFSv4
2615	   operations.  However, the destination file's named attribute
2616	   capabilities MAY be different from the source file's named attribute
2617	   capabilities.

2619	   If the metadata flag is not set and the client is requesting a whole
2620	   file copy (i.e., ca_count is 0 (zero)), the destination file's
2621	   metadata is implementation dependent.

2623	   If the client is requesting a partial file copy (i.e., ca_count is
2624	   not 0 (zero)), the client SHOULD NOT set the metadata flag and the
2625	   server MUST ignore the metadata flag.
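   Combining the two preceding rules, a server could decide whether the
   MUST and SHOULD attribute copy sets apply with a test like the one
   below.  This is a sketch, and the function name is hypothetical.

```python
COPY4_METADATA = 0x00000002


def metadata_copy_applies(ca_flags: int, ca_count: int) -> bool:
    """True only for a whole-file copy (ca_count == 0) with COPY4_METADATA set."""
    if ca_count != 0:
        return False  # partial copy: the server MUST ignore the metadata flag
    return bool(ca_flags & COPY4_METADATA)
```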

2627	   If the operation does not result in an immediate failure, the server
2628	   will return NFS4_OK, and the CURRENT_FH will remain the destination's
2629	   filehandle.

2631	   If an immediate failure does occur, cr_bytes_copied will be set to
2632	   the number of bytes copied to the destination file before the error
2633	   occurred.  The cr_bytes_copied value indicates the number of bytes
2634	   copied but not which specific bytes have been copied.

2636	   A return of NFS4_OK indicates that either the operation is complete
2637	   or the operation was initiated and a callback will be used to deliver
2638	   the final status of the operation.

2640	   If the cr_callback_id is returned, this indicates that the operation
2641	   was initiated and a CB_COPY callback will deliver the final results
2642	   of the operation.  The cr_callback_id stateid is termed a copy
2643	   stateid in this context.  The server is given the option of returning
2644	   the results in a callback because the data may require a relatively
2645	   long period of time to copy.

2647	   If no cr_callback_id is returned, the operation completed
2648	   synchronously and no callback will be issued by the server.  The
2649	   completion status of the operation is indicated by cr_status.
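   A client could classify a COPY reply as in the following sketch.  The
   Copy4Res dataclass is a hypothetical stand-in for the XDR union
   COPY4res, with nfsstat4 values represented as strings and the
   optional cr_callback_id modeled as a list of zero or one stateids.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Copy4Res:
    """Hypothetical Python model of the COPY4res union."""
    cr_status: str
    cr_callback_id: List[bytes] = field(default_factory=list)  # 0 or 1 stateids
    cr_bytes_copied: Optional[int] = None                       # failure arm only


def classify_copy_reply(res: Copy4Res) -> str:
    if res.cr_status != "NFS4_OK":
        # Immediate failure: the count says how many bytes, not which ones.
        return f"failed after {res.cr_bytes_copied} bytes"
    if res.cr_callback_id:
        # A copy stateid was returned: CB_COPY will deliver the final result.
        return "async: wait for CB_COPY (or poll COPY_STATUS)"
    return "sync: copy complete"
```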

2651	   If the copy completes successfully, either synchronously or
2652	   asynchronously, the data copied from the source file to the
2653	   destination file MUST appear identical to the NFS client.  However,
2654	   the NFS server's on-disk representation of the data in the source
2655	   file and destination file MAY differ.  For example, the NFS server
2656	   might encrypt, compress, deduplicate, or otherwise represent the
2657	   on-disk data in the source and destination file differently.

2659	   In the event of a failure, the state of the destination file is
2660	   implementation dependent.  The COPY operation may fail for the
2661	   following reasons (this is a partial list):

2663	   o  NFS4ERR_MOVED

2665	   o  NFS4ERR_NOTSUPP

2667	   o  NFS4ERR_PARTNER_NOTSUPP

2669	   o  NFS4ERR_OFFLOAD_DENIED

2671	   o  NFS4ERR_PARTNER_NO_AUTH

2673	   o  NFS4ERR_FBIG

2675	   o  NFS4ERR_NOTDIR

2677	   o  NFS4ERR_WRONG_TYPE

2679	   o  NFS4ERR_ISDIR

2681	   o  NFS4ERR_INVAL

2683	   o  NFS4ERR_DELAY

2685	   o  NFS4ERR_METADATA_NOTSUPP

2687	   o  NFS4ERR_WRONGSEC

2689	13.2.  Operation 60: COPY_ABORT - Cancel a server-side copy

2691	13.2.1.  ARGUMENT

2693	   struct COPY_ABORT4args {
2694	           /* CURRENT_FH: destination file */
2695	           stateid4        caa_stateid;
2696	   };

2698	13.2.2.  RESULT

2700	   struct COPY_ABORT4res {
2701	           nfsstat4        car_status;
2702	   };

2704	13.2.3.  DESCRIPTION

2706	   COPY_ABORT is used for both intra- and inter-server asynchronous
2707	   copies.  The COPY_ABORT operation allows the client to cancel a
2708	   server-side copy operation that it initiated.  This operation is sent
2709	   in a COMPOUND request from the client to the destination server.
2710	   This operation may be used to cancel a copy when the application that
2711	   requested the copy exits before the operation is completed or for
2712	   some other reason.

2714	   The request contains the filehandle and copy stateid that act as
2715	   the context for the previously initiated copy operation.

2717	   The result's car_status field indicates whether the cancel was
2718	   successful or not.  A value of NFS4_OK indicates that the copy
2719	   operation was canceled and no callback will be issued by the server.
2720	   A copy operation that is successfully canceled may leave none, some,
2721	   or all of the data and/or metadata copied.

2723	   If the server supports asynchronous copies, the server is REQUIRED to
2724	   support the COPY_ABORT operation.

2726	   The COPY_ABORT operation may fail for the following reasons (this is
2727	   a partial list):

2729	   o  NFS4ERR_NOTSUPP

2731	   o  NFS4ERR_RETRY

2733	   o  NFS4ERR_COMPLETE_ALREADY

2734	   o  NFS4ERR_SERVERFAULT

2736	13.3.  Operation 61: COPY_NOTIFY - Notify a source server of a future
2737	       copy

2739	13.3.1.  ARGUMENT

2741	   struct COPY_NOTIFY4args {
2742	           /* CURRENT_FH: source file */
2743	           netloc4         cna_destination_server;
2744	   };

2746	13.3.2.  RESULT

2748	   struct COPY_NOTIFY4resok {
2749	           nfstime4        cnr_lease_time;
2750	           netloc4         cnr_source_server<>;
2751	   };

2753	   union COPY_NOTIFY4res switch (nfsstat4 cnr_status) {
2754	           case NFS4_OK:
2755	                   COPY_NOTIFY4resok       resok4;
2756	           default:
2757	                   void;
2758	   };

2760	13.3.3.  DESCRIPTION

2762	   This operation is used for an inter-server copy.  A client sends this
2763	   operation in a COMPOUND request to the source server to authorize a
2764	   destination server identified by cna_destination_server to read the
2765	   file specified by CURRENT_FH on behalf of the given user.

2767	   The cna_destination_server MUST be specified using the netloc4
2768	   network location format.  The server is not required to resolve the
2769	   cna_destination_server address before completing this operation.

2771	   If this operation succeeds, the source server will allow the
2772	   cna_destination_server to copy the specified file on behalf of the
2773	   given user as long as both of the following conditions are met:

2775	   o  The destination server begins reading the source file before the
2776	      cnr_lease_time expires.  If the cnr_lease_time expires while the
2777	      destination server is still reading the source file, the
2778	      destination server is allowed to finish reading the file.

2780	   o  The client has not issued a COPY_REVOKE for the same combination
2781	      of user, filehandle, and destination server.

2783	   The cnr_lease_time is chosen by the source server.  A cnr_lease_time
2784	   of 0 (zero) indicates an infinite lease.  To avoid the need for
2785	   synchronized clocks, copy lease times are granted by the server as a
2786	   time delta.  To renew the copy lease time, the client should resend
2787	   the same copy notification request to the source server.
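   The two conditions and the lease rules above can be modeled on the
   source server roughly as follows.  This is a hypothetical sketch, and
   the class name is invented.  The lease is measured against a local
   monotonic clock, which is consistent with a lease granted as a time
   delta.  The model only answers whether the destination may begin
   reading.  A read already in progress may finish after the lease
   expires.

```python
import time
from typing import Optional


class CopyAuthorization:
    """Source-server view of one COPY_NOTIFY grant (hypothetical model).

    lease_seconds == 0 means an infinite lease.
    """

    def __init__(self, lease_seconds: float, now: Optional[float] = None):
        self.lease_seconds = lease_seconds
        self.granted_at = time.monotonic() if now is None else now
        self.revoked = False  # set when the client sends COPY_REVOKE

    def renew(self, now: Optional[float] = None) -> None:
        # Resending the same COPY_NOTIFY restarts the lease.
        self.granted_at = time.monotonic() if now is None else now

    def may_begin_read(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.revoked:
            return False
        if self.lease_seconds == 0:
            return True  # infinite lease
        return now - self.granted_at < self.lease_seconds
```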

2789	   A successful response will also contain cnr_source_server, a list of
2790	   netloc4 network locations on which the source server is willing to
2791	   accept connections from the destination server.  These might not
2792	   be reachable from the client and might be located on networks to
2793	   which the client has no connection.

2795	   If the client wishes to perform an inter-server copy, the client MUST
2796	   send a COPY_NOTIFY to the source server.  Therefore, the source
2797	   server MUST support COPY_NOTIFY.

2799	   For a copy only involving one server (the source and destination are
2800	   on the same server), this operation is unnecessary.

2802	   The COPY_NOTIFY operation may fail for the following reasons (this is
2803	   a partial list):

2805	   o  NFS4ERR_MOVED

2807	   o  NFS4ERR_NOTSUPP

2809	   o  NFS4ERR_WRONGSEC

2811	13.4.  Operation 62: COPY_REVOKE - Revoke a destination server's copy
2812	       privileges

2814	13.4.1.  ARGUMENT

2816	   struct COPY_REVOKE4args {
2817	           /* CURRENT_FH: source file */
2818	           netloc4         cra_destination_server;
2819	   };

2821	13.4.2.  RESULT

2823	   struct COPY_REVOKE4res {
2824	           nfsstat4        crr_status;
2825	   };

2827	13.4.3.  DESCRIPTION

2829	   This operation is used for an inter-server copy.  A client sends this
2830	   operation in a COMPOUND request to the source server to revoke the
2831	   authorization of a destination server identified by
2832	   cra_destination_server from reading the file specified by CURRENT_FH
2833	   on behalf of the given user.  If the cra_destination_server has
2834	   already begun copying the file, a successful return from this
2835	   operation indicates that further access will be prevented.

2837	   The cra_destination_server MUST be specified using the netloc4
2838	   network location format.  The server is not required to resolve the
2839	   cra_destination_server address before completing this operation.

2841	   The client uses COPY_ABORT to inform the destination to stop the
2842	   active transfer and COPY_REVOKE to inform the source to not allow any
2843	   more copy requests from the destination.  The COPY_REVOKE operation
2844	   is also useful in situations in which the source server granted a
2845	   very long or infinite lease on the destination server's ability to
2846	   read the source file and all copy operations on the source file have
2847	   been completed.

2849	   For a copy only involving one server (the source and destination are
2850	   on the same server), this operation is unnecessary.

2852	   If the server supports COPY_NOTIFY, the server is REQUIRED to support
2853	   the COPY_REVOKE operation.

2855	   The COPY_REVOKE operation may fail for the following reasons (this is
2856	   a partial list):

2858	   o  NFS4ERR_MOVED

2860	   o  NFS4ERR_NOTSUPP

2862	13.5.  Operation 63: COPY_STATUS - Poll for status of a server-side copy

2864	13.5.1.  ARGUMENT

2866	   struct COPY_STATUS4args {
2867	           /* CURRENT_FH: destination file */
2868	           stateid4        csa_stateid;
2869	   };

2871	13.5.2.  RESULT

2873	   struct COPY_STATUS4resok {
2874	           length4         csr_bytes_copied;
2875	           nfsstat4        csr_complete<1>;
2876	   };

2878	   union COPY_STATUS4res switch (nfsstat4 csr_status) {
2879	           case NFS4_OK:
2880	                   COPY_STATUS4resok       resok4;
2881	           default:
2882	                   void;
2883	   };

2885	13.5.3.  DESCRIPTION

2887	   COPY_STATUS is used for both intra- and inter-server asynchronous
2888	   copies.  The COPY_STATUS operation allows the client to poll the
2889	   destination server to determine the status of an asynchronous copy
2890	   operation.

2892	   If this operation is successful, the number of bytes copied is
2893	   returned to the client in the csr_bytes_copied field.  The
2894	   csr_bytes_copied value indicates the number of bytes copied but not
2895	   which specific bytes have been copied.

2897	   If the optional csr_complete field is present, the copy has
2898	   completed.  In this case, the status value in csr_complete indicates
2899	   the result of the asynchronous copy operation.  In all cases, the
2900	   server will also deliver the final results of the asynchronous copy
2901	   in a CB_COPY operation.
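   A client that prefers polling to waiting for CB_COPY might loop as in
   this sketch.  The copy_status callable is a hypothetical stand-in for
   sending COPY_STATUS with the copy stateid.  Its replies are modeled
   as (csr_status, csr_bytes_copied, csr_complete) tuples, where
   csr_complete stays an empty list until the copy finishes.

```python
import time
from typing import Callable, List, Optional, Tuple

# (csr_status, csr_bytes_copied, csr_complete); csr_complete holds 0 or 1 status.
StatusReply = Tuple[str, int, List[str]]


def poll_copy(copy_status: Callable[[], StatusReply],
              interval: float = 1.0,
              max_polls: int = 100) -> Optional[str]:
    """Poll until csr_complete is present and return the copy's final status.

    Returns None if COPY_STATUS itself fails or polling gives up; the
    CB_COPY callback still carries the final result in either case.
    """
    for _ in range(max_polls):
        status, _progress, complete = copy_status()
        if status != "NFS4_OK":
            return None  # a failed COPY_STATUS says nothing about the copy
        if complete:
            return complete[0]
        # _progress counts bytes copied but does not identify which bytes.
        time.sleep(interval)
    return None
```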

2903	   The failure of this operation does not indicate the result of the
2904	   asynchronous copy in any way.

2906	   If the server supports asynchronous copies, the server is REQUIRED to
2907	   support the COPY_STATUS operation.

2909	   The COPY_STATUS operation may fail for the following reasons (this is
2910	   a partial list):

2912	   o  NFS4ERR_NOTSUPP

2914	   o  NFS4ERR_BAD_STATEID

2916	   o  NFS4ERR_EXPIRED

2918	13.6.  Modification to Operation 42: EXCHANGE_ID - Instantiate Client ID

2920	13.6.1.  ARGUMENT

2922	      /* new */
2923	      const EXCHGID4_FLAG_SUPP_FENCE_OPS      = 0x00000004;

2925	13.6.2.  RESULT

2927	      Unchanged

2929	13.6.3.  MOTIVATION

2931	   Enterprise applications require guarantees that an operation has
2932	   either aborted or completed.  NFSv4.1 provides this guarantee as long
2933	   as the session is alive: simply send a SEQUENCE operation on the same
2934	   slot with a new sequence number, and the successful return of
2935	   SEQUENCE indicates the previous operation has completed.  However, if
2936	   the session is lost, there is no way to know when any in-progress
2937	   operations have aborted or completed.  In hindsight, the NFSv4.1
2938	   specification should have mandated that DESTROY_SESSION either abort
2939	   or complete all outstanding operations.

2941	13.6.4.  DESCRIPTION

2943	   A client SHOULD request the EXCHGID4_FLAG_SUPP_FENCE_OPS capability
2944	   when it sends an EXCHANGE_ID operation.  The server SHOULD set this
2945	   capability in the EXCHANGE_ID reply whether the client requests it or
2946	   not.  It is the server's return that determines whether this
2947	   capability is in effect.  When it is in effect, the following will
2948	   occur:

2950	   o  The server will not reply to any DESTROY_SESSION invoked with the
2951	      client ID until all operations in progress are completed or
2952	      aborted.

2954	   o  The server will not reply to subsequent EXCHANGE_ID invoked on the
2955	      same client owner with a new verifier until all operations in
2956	      progress on the client ID's session are completed or aborted.

2958	   o  The NFS server SHOULD support client ID trunking, and if it does
2959	      and the EXCHGID4_FLAG_SUPP_FENCE_OPS capability is enabled, then a
2960	      session ID created on one node of the storage cluster MUST be
2961	      destroyable via DESTROY_SESSION sent to any node.  In addition,
2962	      DESTROY_CLIENTID and an EXCHANGE_ID with a new verifier affect
2963	      all sessions, regardless of what node they were created on.
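   A minimal C sketch of the flag handling described above, assuming the
   32-bit eia_flags and eir_flags fields of the NFSv4.1 EXCHANGE_ID
   arguments and results.  The helper names are hypothetical.  The
   client ORs the bit into its request, but only the bit in the server's
   reply decides whether the fencing behavior is in effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the ARGUMENT section above. */
#define EXCHGID4_FLAG_SUPP_FENCE_OPS 0x00000004u

/* Client side: ask for the capability in the EXCHANGE_ID eia_flags. */
static uint32_t request_flags(uint32_t base_flags)
{
    return base_flags | EXCHGID4_FLAG_SUPP_FENCE_OPS;
}

/*
 * Whatever the client asked for, only the flags the server returns
 * (eir_flags) determine whether the capability is in effect.
 */
static bool fence_ops_in_effect(uint32_t server_reply_flags)
{
    return (server_reply_flags & EXCHGID4_FLAG_SUPP_FENCE_OPS) != 0;
}
```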

2965	13.7.  Operation 64: INITIALIZE

2967	   This operation can be used to initialize the structure imposed by an
2968	   application onto a file, i.e., ADBs, and to punch a hole into a file.

2970	13.7.1.  ARGUMENT

2972	   /*
2973	    * We use data_content4 in case we wish to
2974	    * extend new types later. Note that we
2975	    * are explicitly disallowing data.
2976	    */
2977	   union initialize_arg4 switch (data_content4 content) {
2978	   case NFS4_CONTENT_APP_BLOCK:
2979	           app_data_block4 ia_adb;
2980	   case NFS4_CONTENT_HOLE:
2981	           data_info4      ia_hole;
2982	   default:
2983	           void;
2984	   };

2986	   struct INITIALIZE4args {
2987	           /* CURRENT_FH: file */
2988	           stateid4        ia_stateid;
2989	           stable_how4     ia_stable;
2990	           initialize_arg4 ia_data<>;
2991	   };

2993	13.7.2.  RESULT

2995	   struct INITIALIZE4resok {
2996	           count4          ir_count;
2997	           stable_how4     ir_committed;
2998	           verifier4       ir_writeverf;
2999	           data_content4   ir_sparse;
3000	   };

3002	   union INITIALIZE4res switch (nfsstat4 status) {
3003	   case NFS4_OK:
3004	           INITIALIZE4resok        resok4;
3005	   default:
3006	           void;
3007	   };

3009	13.7.3.  DESCRIPTION

3011	   Using the data_content4 (Section 6.1.2), INITIALIZE can be used
3012	   either to punch holes or to impose ADB structure on a file.

3014	13.7.3.1.  Hole punching

3016	   Whenever a client wishes to zero the blocks backing a particular
3017	   region in the file, it sends the INITIALIZE operation with the
3018	   current filehandle set to the filehandle of the file in question,
3019	   and with the start offset and length in bytes of the region set in
3020	   ia_hole.di_offset and ia_hole.di_length, respectively.  If
3021	   ia_hole.di_allocated is set to TRUE, then the blocks will be zeroed;
3022	   if it is set to FALSE, then they will be deallocated.  All
3023	   further reads to this region MUST return zeros until overwritten.
3024	   The filehandle specified must be that of a regular file.

3026	   Situations may arise where di_offset and/or di_offset + di_length
3027	   will not be aligned to the boundaries at which the server performs
3028	   allocations and deallocations.  For most file systems, this is the
3029	   block size of the file system.  In such a case, the server can
3030	   deallocate as many bytes as it can in the region.  The blocks that
3031	   cannot be deallocated MUST be zeroed.  Except for the block
3032	   deallocation and maximum hole punching capability, an INITIALIZE
3033	   operation is to be treated similarly to a write of zeroes.
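   The alignment rule above can be sketched as follows.  This is an
   illustration only, assuming a single fixed allocation block size on
   the server; the structure and function names are hypothetical and not
   part of the protocol.  Whole blocks inside the region are candidates
   for deallocation, and the partial blocks at either end are zeroed.

```c
#include <assert.h>
#include <stdint.h>

/* A half-open byte range [start, end) of the file. */
struct byte_range {
    uint64_t start;
    uint64_t end;
};

/* How one hole-punch request might be split up by the server. */
struct punch_plan {
    struct byte_range zero_head; /* partial block before the aligned part */
    struct byte_range dealloc;   /* whole blocks that may be deallocated */
    struct byte_range zero_tail; /* partial block after the aligned part */
};

/*
 * block_size is the server's (hypothetical) allocation granularity.
 * Overflow near UINT64_MAX is ignored to keep the sketch short.
 */
static struct punch_plan plan_punch(uint64_t di_offset, uint64_t di_length,
                                    uint64_t block_size)
{
    struct punch_plan p;
    uint64_t end, a_start, a_end;

    assert(block_size > 0);
    end = di_offset + di_length;
    /* Round the start up and the end down to block boundaries. */
    a_start = (di_offset + block_size - 1) / block_size * block_size;
    a_end = end / block_size * block_size;

    if (a_start >= a_end) {
        /* No whole block inside the region: zero all of it. */
        p.zero_head = (struct byte_range){di_offset, end};
        p.dealloc = (struct byte_range){end, end};
        p.zero_tail = (struct byte_range){end, end};
    } else {
        p.zero_head = (struct byte_range){di_offset, a_start};
        p.dealloc = (struct byte_range){a_start, a_end};
        p.zero_tail = (struct byte_range){a_end, end};
    }
    return p;
}
```

   For example, with a 4096-byte block size, a request for offset 1000
   and length 10000 would deallocate bytes 4096 to 8191 and zero bytes
   1000 to 4095 and 8192 to 10999.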

3035	   The server is not required to complete deallocating the blocks
3036	   specified in the operation before returning.  It is acceptable to
3037	   have the deallocation be deferred.  In fact, INITIALIZE is merely a
3038	   hint; it is valid for a server to return success without ever doing
3039	   anything towards deallocating the blocks backing the region
3040	   specified.  However, any future reads to the region MUST return
3041	   zeroes.

3043	   If used to hole punch, INITIALIZE will result in the space_used
3044	   attribute being decreased by the number of bytes that were
3045	   deallocated.  The space_freed attribute may or may not decrease,
3046	   depending on server support and on whether the blocks backing the
3047	   specified range were shared.  The size attribute remains unchanged.

3049	   The INITIALIZE operation MUST NOT change the space reservation
3050	   guarantee of the file.  While the server can deallocate the blocks
3051	   specified by di_offset and di_length, future writes to this region
3052	   MUST NOT fail with NFS4ERR_NOSPC.

3054	   The INITIALIZE operation may fail for the following reasons (this is
3055	   a partial list):

3057	   NFS4ERR_NOTSUPP  Hole punch operations are not supported by the
3058	      NFS server receiving this request.

3060	   NFS4ERR_DIR  The current filehandle is of type NF4DIR.

3062	   NFS4ERR_SYMLINK  The current filehandle is of type NF4LNK.

3064	   NFS4ERR_WRONG_TYPE  The current filehandle does not designate an
3065	      ordinary file.

3067	13.7.3.2.  ADBs

3069	   If the server supports ADBs, then it MUST support the
3070	   NFS4_CONTENT_APP_BLOCK arm of the INITIALIZE operation.  The server
3071	   has no concept of the structure imposed by the application.  Order
3072	   is imposed only when the application writes to a section of the
3073	   file.  In order to detect corruption even before the
3074	   application utilizes the file, the application will want to
3075	   initialize a range of ADBs using INITIALIZE.

3077	   For ADBs, when the client invokes the INITIALIZE operation, it has
3078	   two desired results:

3080	   1.  The structure described by the app_data_block4 be imposed on the
3081	       file.

3083	   2.  The contents described by the app_data_block4 be sparse.

3085	   If the server supports the INITIALIZE operation, it still might not
3086	   support sparse files.  So if it receives an INITIALIZE operation
3087	   but does not support sparse files, then it MUST populate the
3088	   contents of the file with the initialized ADBs.

3090	   If the data was already initialized, there are two interesting
3091	   scenarios:

3093	   1.  The data blocks are allocated.

3095	   2.  Initializing in the middle of an existing ADB.

3097	   If the data blocks were already allocated, then the INITIALIZE is a
3098	   hole punch operation.  If the server supports sparse files, then the
3099	   data blocks are to be deallocated.  If not, then the data blocks are
3100	   to be rewritten in the indicated ADB format.

3102	   Since the server has no knowledge of ADBs, it should not report
3103	   misaligned creation of ADBs.  Even though it can detect them, it
3104	   cannot disallow them, as the application might be in the process of
3105	   changing the size of the ADBs.  Thus the server must be prepared to
3106	   handle an INITIALIZE into an existing ADB.

3108	   This document does not mandate the manner in which the server stores
3109	   ADBs sparsely for a file.  It does assume that if ADBs are stored
3110	   sparsely, then the server can detect when an INITIALIZE arrives that
3111	   will force a new ADB to start inside an existing ADB.  For example,
3112	   assume that ADBi has an adb_block_size of 4k and that an INITIALIZE
3113	   starts 1k inside ADBi.  The server should [[Comment.3: Need to flesh
3114	   this out. --TH]]

3116	13.8.  Operation 67: IO_ADVISE - Application I/O access pattern hints

3118	   This section introduces a new operation, named IO_ADVISE, which
3119	   allows NFS clients to communicate application I/O access pattern
3120	   hints to the NFS server.  This new operation will allow hints to be
3121	   sent to the server when applications use posix_fadvise, direct I/O,
3122	   or at any other point the client finds useful.

3124	13.8.1.  ARGUMENT

3126	   enum IO_ADVISE_type4 {
3127	           IO_ADVISE4_NORMAL                       = 0,
3128	           IO_ADVISE4_SEQUENTIAL                   = 1,
3129	           IO_ADVISE4_SEQUENTIAL_BACKWARDS         = 2,
3130	           IO_ADVISE4_RANDOM                       = 3,
3131	           IO_ADVISE4_WILLNEED                     = 4,
3132	           IO_ADVISE4_WILLNEED_OPPORTUNISTIC       = 5,
3133	           IO_ADVISE4_DONTNEED                     = 6,
3134	           IO_ADVISE4_NOREUSE                      = 7,
3135	           IO_ADVISE4_READ                         = 8,
3136	           IO_ADVISE4_WRITE                        = 9,
3137	           IO_ADVISE4_INIT_PROXIMITY               = 10
3138	   };

3140	   struct IO_ADVISE4args {
3141	           /* CURRENT_FH: file */
3142	           stateid4        iar_stateid;
3143	           offset4         iar_offset;
3144	           length4         iar_count;
3145	           bitmap4         iar_hints;
3146	   };

3148	13.8.2.  RESULT

3150	   struct IO_ADVISE4resok {
3151	           bitmap4 ior_hints;
3152	   };

3154	   union IO_ADVISE4res switch (nfsstat4 status) {
3155	   case NFS4_OK:
3156	           IO_ADVISE4resok resok4;
3157	   default:
3158	           void;
3159	   };

3161	13.8.3.  DESCRIPTION

3163	   The IO_ADVISE operation sends an I/O access pattern hint to the
3164	   server for the owner of the stateid, for a byte range specified by
3165	   iar_offset and iar_count.  The byte range specified by iar_offset and
3166	   iar_count need not currently exist in the file, but the iar_hints
3167	   will apply to the byte range when it does exist.  If iar_count is 0,
3168	   all data following iar_offset is specified.  The server MAY ignore
3169	   the advice.
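   The following C sketch shows one way a client might build iar_hints,
   and how the byte-range rule above (an iar_count of 0 covers all data
   from iar_offset onward) can be evaluated.  It assumes that bitmap4
   hints are encoded like NFSv4 attribute bitmaps, with hint n stored as
   bit (n % 32) of word (n / 32); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hint numbers from the IO_ADVISE_type4 enum above. */
enum IO_ADVISE_type4 {
    IO_ADVISE4_NORMAL = 0,
    IO_ADVISE4_SEQUENTIAL = 1,
    IO_ADVISE4_SEQUENTIAL_BACKWARDS = 2,
    IO_ADVISE4_RANDOM = 3,
    IO_ADVISE4_WILLNEED = 4,
    IO_ADVISE4_WILLNEED_OPPORTUNISTIC = 5,
    IO_ADVISE4_DONTNEED = 6,
    IO_ADVISE4_NOREUSE = 7,
    IO_ADVISE4_READ = 8,
    IO_ADVISE4_WRITE = 9,
    IO_ADVISE4_INIT_PROXIMITY = 10
};

/* Set hint n in a bitmap4-style word array (assumed encoding). */
static void hint_set(uint32_t *words, unsigned n)
{
    words[n / 32] |= UINT32_C(1) << (n % 32);
}

/* Test hint n; bits beyond the array are treated as clear. */
static bool hint_isset(const uint32_t *words, size_t nwords, unsigned n)
{
    return n / 32 < nwords && ((words[n / 32] >> (n % 32)) & 1u);
}

/* Does advice for (iar_offset, iar_count) cover byte x of the file? */
static bool advice_covers(uint64_t iar_offset, uint64_t iar_count,
                          uint64_t x)
{
    if (x < iar_offset)
        return false;
    if (iar_count == 0)      /* 0 means "everything after iar_offset" */
        return true;
    return x - iar_offset < iar_count;
}
```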

3171	   The following are the possible hints:

3173	   IO_ADVISE4_NORMAL  Specifies that the application has no advice to
3174	      give on its behavior with respect to the specified data.  It is
3175	      the default characteristic if no advice is given.

3177	   IO_ADVISE4_SEQUENTIAL  Specifies that the stateid holder expects to
3178	      access the specified data sequentially from lower offsets to
3179	      higher offsets.

3181	   IO_ADVISE4_SEQUENTIAL_BACKWARDS  Specifies that the stateid holder
3182	      expects to access the specified data sequentially from higher
3183	      offsets to lower offsets.

3185	   IO_ADVISE4_RANDOM  Specifies that the stateid holder expects to
3186	      access the specified data in a random order.

3188	   IO_ADVISE4_WILLNEED  Specifies that the stateid holder expects to
3189	      access the specified data in the near future.

3191	   IO_ADVISE4_WILLNEED_OPPORTUNISTIC  Specifies that the stateid holder
3192	      expects to possibly access the data in the near future.  This is a
3193	      speculative hint, and therefore the server should prefetch data or
3194	      indirect blocks only if it can be done at a marginal cost.

3196	   IO_ADVISE4_DONTNEED  Specifies that the stateid holder expects that
3197	      it will not access the specified data in the near future.

3199	   IO_ADVISE4_NOREUSE  Specifies that the stateid holder expects to
3200	      access the specified data once and then not reuse it thereafter.

3202	   IO_ADVISE4_READ  Specifies that the stateid holder expects to read
3203	      the specified data in the near future.

3205	   IO_ADVISE4_WRITE  Specifies that the stateid holder expects to
3206	      write the specified data in the near future.

3208	   IO_ADVISE4_INIT_PROXIMITY  The client has recently accessed the byte
3209	      range in its own cache.  This informs the server that the data in
3210	      the byte range remains important to the client.  When the server
3211	      reaches resource exhaustion, knowing which data is more important
3212	      allows the server to make better choices about which data to, for
3213	      example, purge from a cache or move to secondary storage.  It also
3214	      informs the server which delegations are more important, since if
3215	      delegations are working correctly, once delegated to a client, a
3216	      server might never receive another I/O request for the file.

3218	   The server will return success if the operation is properly formed,
3219	   otherwise the server will return an error.  The server MUST NOT
3220	   return an error if it does not recognize or does not support the
3221	   requested advice.  This is also true even if the client sends
3222	   contradictory hints to the server, e.g., IO_ADVISE4_SEQUENTIAL and
3223	   IO_ADVISE4_RANDOM in a single IO_ADVISE operation.  In this case, the
3224	   server MUST return success and an ior_hints value that indicates the
3225	   hint it intends to optimize.  For contradictory hints, this may mean
3226	   simply returning IO_ADVISE4_NORMAL for example.
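   As an illustration of the contradictory-hint rule, here is a
   hypothetical server policy in C.  It is one possible choice, not a
   required algorithm: conflicting access-pattern hints (including
   IO_ADVISE4_NORMAL combined with another pattern) collapse to
   IO_ADVISE4_NORMAL, IO_ADVISE4_WILLNEED together with
   IO_ADVISE4_DONTNEED cancel out, and other bits such as the READ and
   WRITE modifiers pass through into ior_hints.  A single-word bitmap4
   with hint n at bit n is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Subset of IO_ADVISE_type4 used by this policy. */
enum {
    IO_ADVISE4_NORMAL = 0,
    IO_ADVISE4_SEQUENTIAL = 1,
    IO_ADVISE4_SEQUENTIAL_BACKWARDS = 2,
    IO_ADVISE4_RANDOM = 3,
    IO_ADVISE4_WILLNEED = 4,
    IO_ADVISE4_DONTNEED = 6
};

/* Hints that describe the overall access pattern. */
#define PATTERN_MASK (BIT(IO_ADVISE4_NORMAL) | BIT(IO_ADVISE4_SEQUENTIAL) | \
                      BIT(IO_ADVISE4_SEQUENTIAL_BACKWARDS) |                \
                      BIT(IO_ADVISE4_RANDOM))

/* Portable population count. */
static unsigned popcount32(uint32_t v)
{
    unsigned c = 0;
    while (v) {
        v &= v - 1;  /* clear the lowest set bit */
        c++;
    }
    return c;
}

/* Hypothetical policy: compute ior_hints from the requested iar_hints. */
static uint32_t choose_ior_hints(uint32_t iar_hints)
{
    uint32_t out = iar_hints;

    if (popcount32(out & PATTERN_MASK) > 1) {
        /* e.g. SEQUENTIAL + RANDOM: fall back to NORMAL */
        out &= ~PATTERN_MASK;
        out |= BIT(IO_ADVISE4_NORMAL);
    }
    if ((out & BIT(IO_ADVISE4_WILLNEED)) && (out & BIT(IO_ADVISE4_DONTNEED)))
        out &= ~(BIT(IO_ADVISE4_WILLNEED) | BIT(IO_ADVISE4_DONTNEED));
    if (out == 0)
        out = BIT(IO_ADVISE4_NORMAL);  /* nothing left: no advice */
    return out;
}
```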

3228	   The ior_hints returned by the server is primarily for debugging
3229	   purposes since the server is under no obligation to carry out the
3230	   hints that it describes in the ior_hints result.  In addition, while
3231	   the server may have intended to implement the hints returned in
3232	   ior_hints, as time progresses, the server may need to change its
3233	   handling of a given file due to several reasons including, but not
3234	   limited to, memory pressure, additional IO_ADVISE hints sent by other
3235	   clients, and heuristically detected file access patterns.

3237	   The server MAY return different advice than what the client
3238	   requested.  If it does, then this might be due to one of several
3239	   conditions, including, but not limited to, another client advising
3240	   of a different I/O access pattern; a different I/O access pattern
3241	   from another client that the server has heuristically detected; or
3242	   the server is not able to support the requested I/O access pattern,
3243	   perhaps due to a temporary resource limitation.

3245	   Each issuance of the IO_ADVISE operation overrides all previous
3246	   issuances of IO_ADVISE for a given byte range.  This effectively
3247	   follows a strategy of last hint wins for a given stateid and byte
3248	   range.

3250	   Clients should assume that hints included in an IO_ADVISE operation
3251	   will be forgotten once the file is closed.

3253	13.8.4.  IMPLEMENTATION

3255	   The NFS client may choose to issue an IO_ADVISE operation to the
3256	   server in several different instances.

3258	   The most obvious is in direct response to an application's execution
3259	   of posix_fadvise.  In this case, IO_ADVISE4_WRITE and IO_ADVISE4_READ
3260	   may be set based upon the type of file access specified when the file
3261	   was opened.

3263	   Another useful point would be when an application indicates it is
3264	   using direct I/O. Direct I/O may be specified at file open, in which
3265	   case an IO_ADVISE may be included in the same compound as the OPEN
3266	   operation with the IO_ADVISE4_NOREUSE flag set.  Direct I/O may also
3267	   be specified separately, in which case an IO_ADVISE operation can be
3268	   sent to the server separately.  As above, IO_ADVISE4_WRITE and
3269	   IO_ADVISE4_READ may be set based upon the type of file access
3270	   specified when the file was opened.
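   A client-side sketch of this guidance, assuming a POSIX client where
   the file access mode comes from the open(2) flags.  Because O_DIRECT
   is not portable, direct I/O is passed in as a separate flag; the
   function name and the single-word bitmap layout (hint n at bit n) are
   assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Subset of IO_ADVISE_type4 used here. */
enum { IO_ADVISE4_NOREUSE = 7, IO_ADVISE4_READ = 8, IO_ADVISE4_WRITE = 9 };

/* Build iar_hints for an IO_ADVISE sent alongside (or after) an OPEN. */
static uint32_t hints_for_open(int open_flags, bool direct_io)
{
    uint32_t hints = 0;

    /* READ/WRITE modifiers follow the access mode of the open. */
    switch (open_flags & O_ACCMODE) {
    case O_RDONLY:
        hints |= BIT(IO_ADVISE4_READ);
        break;
    case O_WRONLY:
        hints |= BIT(IO_ADVISE4_WRITE);
        break;
    case O_RDWR:
        hints |= BIT(IO_ADVISE4_READ) | BIT(IO_ADVISE4_WRITE);
        break;
    default:
        break;
    }
    /* Direct I/O bypasses the client cache; data is not expected to be reused. */
    if (direct_io)
        hints |= BIT(IO_ADVISE4_NOREUSE);
    return hints;
}
```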

3272	13.8.5.  pNFS File Layout Data Type Considerations

3274	   The IO_ADVISE considerations for pNFS are very similar to the COMMIT
3275	   considerations for pNFS.  That is, as with COMMIT, some NFS server
3276	   implementations prefer IO_ADVISE be done on the DS, and some prefer
3277	   it be done on the MDS.

3279	   So for the file layout type, it is proposed that NFSv4.2 include an
3280	   additional flag NFL42_UFLG_IO_ADVISE_THRU_MDS, which is valid only on
3281	   NFSv4.2 or higher.  Any file's layout obtained with NFSv4.1 MUST NOT
3282	   have NFL42_UFLG_IO_ADVISE_THRU_MDS set.  Any file's layout obtained
3283	   with NFSv4.2 MAY have NFL42_UFLG_IO_ADVISE_THRU_MDS set.  If the
3284	   client does not implement IO_ADVISE, then it MUST ignore
3285	   NFL42_UFLG_IO_ADVISE_THRU_MDS.

3287	   If NFL42_UFLG_IO_ADVISE_THRU_MDS is set and the client implements
3288	   IO_ADVISE, then if it wants the DS to honor IO_ADVISE, the client
3289	   MUST send the operation to the MDS, and the server will
3290	   communicate the advice to each DS.  If the client sends IO_ADVISE
3291	   to the DS, then the server MAY return NFS4ERR_NOTSUPP.

3293	   If NFL42_UFLG_IO_ADVISE_THRU_MDS is not set, then this indicates to
3294	   the client that if it wants to inform the server via IO_ADVISE of
3295	   the client's intended use of the file, then the client SHOULD send an
3296	   IO_ADVISE to each DS.  While the client MAY always send IO_ADVISE to
3297	   the MDS, if the server has not set NFL42_UFLG_IO_ADVISE_THRU_MDS, the
3298	   client should expect that such an IO_ADVISE is futile.  Note that a
3299	   client SHOULD use the same set of arguments on each IO_ADVISE sent to
3300	   a DS for the same open file reference.

3302	   The server is not required to support different advice for different
3303	   DSes with the same open file reference.
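   The routing rule for NFL42_UFLG_IO_ADVISE_THRU_MDS can be summarized
   in a few lines of C.  The flag's numeric value is not given in this
   section, so the value below is a hypothetical placeholder, and the
   flag is assumed to live in the nfl_util field of the file layout
   alongside NFL4_UFLG_DENSE.  The function and enum names are also
   hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL value; not assigned in this section. */
#define NFL42_UFLG_IO_ADVISE_THRU_MDS 0x00000004u

enum advise_target { ADVISE_SEND_TO_MDS, ADVISE_SEND_TO_EACH_DS };

/*
 * Decide where IO_ADVISE should go for a file layout.  A layout obtained
 * with NFSv4.1 never carries the flag, so it is ignored there.
 */
static enum advise_target choose_advise_target(uint32_t nfl_util,
                                               bool layout_from_v42)
{
    if (layout_from_v42 && (nfl_util & NFL42_UFLG_IO_ADVISE_THRU_MDS))
        return ADVISE_SEND_TO_MDS;      /* MDS relays advice to each DS */
    return ADVISE_SEND_TO_EACH_DS;      /* same arguments to every DS */
}
```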

3305	13.8.5.1.  Dense and Sparse Packing Considerations

3307	   The IO_ADVISE operation MUST use the iar_offset and byte range as
3308	   dictated by the presence or absence of NFL4_UFLG_DENSE.

3310	   E.g., if NFL4_UFLG_DENSE is present, and a READ or WRITE to the DS
3311	   for offset 0 really means offset 10000 in the logical file, then an
3312	   IO_ADVISE for iar_offset 0 means offset 10000.

3314	   E.g., if NFL4_UFLG_DENSE is absent, then a READ or WRITE to the DS
3315	   for offset 0 really means offset 0 in the logical file, and an
3316	   IO_ADVISE for iar_offset 0 means offset 0 in the logical file.

3318	   E.g., suppose NFL4_UFLG_DENSE is present, the stripe unit is 1000
3319	   bytes, the stripe count is 10, and the dense DS file is serving
3320	   logical offset 0.  Then a READ or WRITE to the DS for offsets 0,
3321	   1000, 2000, and 3000 really means logical offsets 0, 10000, 20000,
3322	   and 30000 (the stripe width being 10 x 1000 = 10000 bytes), and
3323	   an IO_ADVISE sent to the same DS with an iar_offset of 500 and an
3324	   iar_count of 3000 means that the IO_ADVISE applies to these byte
3325	   ranges of the dense DS file:

3327	     - 500 to 999
3328	     - 1000 to 1999
3329	     - 2000 to 2999
3330	     - 3000 to 3499

3332	   I.e., the contiguous range 500 to 3499 as specified in IO_ADVISE.

3334	   It also applies to these byte ranges of the logical file:

3336	     - 500 to 999 (500 bytes)
3337	     - 10000 to 10999 (1000 bytes)
3338	     - 20000 to 20999 (1000 bytes)
3339	     - 30000 to 30499 (500 bytes)
3340	     (total            3000 bytes)
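   The dense mapping can be checked with a short C sketch.  It assumes
   the NFSv4.1 dense packing rule with nfl_pattern_offset equal to zero
   and a DS that holds stripe unit index k of every stripe; the function
   and type names are hypothetical.  A dense DS offset is mapped back to
   a logical offset, and a dense range is split at stripe-unit
   boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A logical byte range: start offset and length in bytes. */
struct range {
    uint64_t start;
    uint64_t len;
};

/*
 * Dense DS offset -> logical offset for the DS holding stripe unit
 * index k (0 <= k < count) of every stripe.  Pattern offset assumed 0.
 */
static uint64_t dense_to_logical(uint64_t dense, uint64_t unit,
                                 uint64_t count, uint64_t k)
{
    uint64_t stripe = dense / unit;  /* stripe (row) number */
    uint64_t within = dense % unit;  /* offset inside the stripe unit */
    return stripe * unit * count + k * unit + within;
}

/*
 * Split the dense range [off, off + len) into logical ranges, one per
 * stripe unit touched.  Writes at most max entries; returns the count.
 */
static size_t dense_range_to_logical(uint64_t off, uint64_t len,
                                     uint64_t unit, uint64_t count,
                                     uint64_t k, struct range *out,
                                     size_t max)
{
    size_t n = 0;

    assert(unit > 0 && k < count);
    while (len > 0 && n < max) {
        uint64_t room = unit - off % unit;  /* bytes left in this unit */
        uint64_t take = len < room ? len : room;

        out[n].start = dense_to_logical(off, unit, count, k);
        out[n].len = take;
        n++;
        off += take;
        len -= take;
    }
    return n;
}
```

   Calling dense_range_to_logical(500, 3000, 1000, 10, 0, ...) splits
   the advised range into one logical piece per stripe unit touched.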

3342	   E.g., if NFL4_UFLG_DENSE is absent, the stripe unit is 250 bytes, the
3343	   stripe count is 4, and the sparse DS file is serving offset 0.
3344	   Then a READ or WRITE to the DS for offsets 0, 1000, 2000, and
3345	   3000 really means offsets 0, 1000, 2000, and 3000 in the logical
3346	   file, keeping in mind that on the DS file, byte ranges 250 to 999,
3347	   1250 to 1999, 2250 to 2999, and 3250 to 3999 are not accessible.
3348	   Then an IO_ADVISE sent to the same DS with an iar_offset of 500 and
3349	   an iar_count of 3000 means that the IO_ADVISE applies to these byte
3350	   ranges of the logical file and the sparse DS file:

3352	     - 500 to 999 (500 bytes)   - no effect
3353	     - 1000 to 1249 (250 bytes) - effective
3354	     - 1250 to 1999 (750 bytes) - no effect
3355	     - 2000 to 2249 (250 bytes) - effective
3356	     - 2250 to 2999 (750 bytes) - no effect
3357	     - 3000 to 3249 (250 bytes) - effective
3358	     - 3250 to 3499 (250 bytes) - no effect
3359	     (subtotal      2250 bytes) - no effect
3360	     (subtotal       750 bytes) - effective
3361	     (grand total   3000 bytes) - no effect + effective
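   The sparse example can be checked the same way.  In a sparse DS file
   the DS serves only the logical bytes x for which
   (x / stripe_unit) % stripe_count equals its stripe index k, so the
   effective part of an advised range is the sum of its overlaps with
   those stripe units.  This is an illustrative C sketch with
   hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of the logical range [off, off + len) that fall in stripe units
 * served by the DS with stripe index k in a sparse (non-dense) layout.
 */
static uint64_t sparse_effective_bytes(uint64_t off, uint64_t len,
                                       uint64_t unit, uint64_t count,
                                       uint64_t k)
{
    uint64_t end = off + len;
    uint64_t total = 0;

    assert(unit > 0 && k < count);
    while (off < end) {
        uint64_t unit_end = (off / unit + 1) * unit; /* end of this unit */
        uint64_t stop = unit_end < end ? unit_end : end;

        if ((off / unit) % count == k)
            total += stop - off;  /* this DS serves the unit */
        off = stop;
    }
    return total;
}
```

   With k = 0, sparse_effective_bytes(500, 3000, 250, 4, 0) returns 750,
   matching the effective subtotal above; the remaining 2250 bytes have
   no effect on this DS.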

3363	   If neither of the flags NFL42_UFLG_IO_ADVISE_THRU_MDS and
3364	   NFL4_UFLG_DENSE is set in the layout, then any IO_ADVISE request
3365	   sent to the data server with a byte range that overlaps a stripe unit
3366	   that the data server does not serve MUST NOT result in the status
3367	   NFS4ERR_PNFS_IO_HOLE.  Instead, the response SHOULD be successful and
3368	   if the server applies IO_ADVISE hints on any stripe units that
3369	   overlap with the specified range, those hints SHOULD be indicated in
3370	   the response.

3372	13.8.6.  Number of Supported File Segments

3374	   In theory IO_ADVISE allows a client and server to support multiple
3375	   file segments, meaning that different, possibly overlapping, byte
3376	   ranges of the same open file reference will support different hints.
3377	   This is not practical, and in general the server will support just
3378	   one set of hints, and these will apply to the entire file.  However,
3379	   there are some hints that are very ephemeral, and essentially amount
3380	   to one-time instructions to the NFS server, which will be forgotten
3381	   momentarily after IO_ADVISE is executed.

3383	   The following hints will always apply to the entire file, regardless
3384	   of the specified byte range:

3386	   o  IO_ADVISE4_NORMAL

3388	   o  IO_ADVISE4_SEQUENTIAL

3390	   o  IO_ADVISE4_SEQUENTIAL_BACKWARDS

3392	   o  IO_ADVISE4_RANDOM

3394	   The following hints will always apply to the specified byte range,
3395	   and will be treated as one-time instructions:

3397	   o  IO_ADVISE4_WILLNEED

3399	   o  IO_ADVISE4_WILLNEED_OPPORTUNISTIC

3401	   o  IO_ADVISE4_DONTNEED

3403	   o  IO_ADVISE4_NOREUSE

3405	   The following hints are modifiers to all other hints, and will apply
3406	   to the entire file and/or to a one-time instruction on the specified
3407	   byte range:

3409	   o  IO_ADVISE4_READ

3411	   o  IO_ADVISE4_WRITE
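   The grouping above can be expressed as a simple classification.  This
   C sketch only restates the three lists; IO_ADVISE4_INIT_PROXIMITY is
   not named in any of them, so it is reported as unlisted.  The enum
   mirrors IO_ADVISE_type4, while the scope names are hypothetical.

```c
#include <assert.h>

enum IO_ADVISE_type4 {
    IO_ADVISE4_NORMAL = 0,
    IO_ADVISE4_SEQUENTIAL = 1,
    IO_ADVISE4_SEQUENTIAL_BACKWARDS = 2,
    IO_ADVISE4_RANDOM = 3,
    IO_ADVISE4_WILLNEED = 4,
    IO_ADVISE4_WILLNEED_OPPORTUNISTIC = 5,
    IO_ADVISE4_DONTNEED = 6,
    IO_ADVISE4_NOREUSE = 7,
    IO_ADVISE4_READ = 8,
    IO_ADVISE4_WRITE = 9,
    IO_ADVISE4_INIT_PROXIMITY = 10
};

enum hint_scope {
    HINT_WHOLE_FILE,  /* applies to the entire file */
    HINT_ONE_TIME,    /* one-time instruction for the byte range */
    HINT_MODIFIER,    /* modifies the other hints */
    HINT_UNLISTED     /* not named in any of the three lists */
};

static enum hint_scope classify_hint(enum IO_ADVISE_type4 h)
{
    switch (h) {
    case IO_ADVISE4_NORMAL:
    case IO_ADVISE4_SEQUENTIAL:
    case IO_ADVISE4_SEQUENTIAL_BACKWARDS:
    case IO_ADVISE4_RANDOM:
        return HINT_WHOLE_FILE;
    case IO_ADVISE4_WILLNEED:
    case IO_ADVISE4_WILLNEED_OPPORTUNISTIC:
    case IO_ADVISE4_DONTNEED:
    case IO_ADVISE4_NOREUSE:
        return HINT_ONE_TIME;
    case IO_ADVISE4_READ:
    case IO_ADVISE4_WRITE:
        return HINT_MODIFIER;
    default:
        return HINT_UNLISTED;
    }
}
```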

3413	13.9.  Changes to Operation 51: LAYOUTRETURN

3415	13.9.1.  Introduction

3417	   In the pNFS description provided in [2], the client is not capable of
3418	   relaying an error code from the DS to the MDS.  In the specification of
3419	   the Objects-Based Layout protocol [9], use is made of the opaque
3420	   lrf_body field of the LAYOUTRETURN argument to do such a relaying of
3421	   error codes.  In this section, we define a new data structure to
3422	   enable the passing of error codes back to the MDS and provide some
3423	   guidelines on what both the client and MDS should expect in such
3424	   circumstances.

3426	   There are two broad classes of errors, transient and persistent.  The
3427	   client SHOULD strive to only use this new mechanism to report
3428	   persistent errors.  It MUST be able to deal with transient issues by
3429	   itself.  Also, while the client might consider an issue to be
3430	   persistent, it MUST be prepared for the MDS to consider such issues
3431	   to be transient.  A prime example of this is if the MDS fences off a
3432	   client from either a stateid or a filehandle.  The client will get an
3433	   error from the DS and might relay either NFS4ERR_ACCESS or
3434	   NFS4ERR_BAD_STATEID back to the MDS, with the belief that this is a
3435	   hard error.  If the MDS is informed by the client that there is an
3436	   error, it can safely ignore that.  For it, the mission is
3437	   accomplished in that the client has returned a layout that the MDS
3438	   had most likely recalled.

3440	   The client might also need to inform the MDS that it cannot reach one
3441	   or more of the DSes.  While the MDS can detect the connectivity of
3442	   both of these paths:

3444	   o  MDS to DS

3446	   o  MDS to client

3448	   it cannot determine if the client and DS path is working.  As with
3449	   the case of the DS passing errors to the client, the client must be
3450	   prepared for the MDS to consider such outages as being transitory.

3452	   The existing LAYOUTRETURN operation is extended by introducing a new
3453	   data structure to report errors, layoutreturn_device_error4.  Also,
3454	   layoutreturn_error_report4 is introduced to enable an array of errors
3455	   to be reported.

3457	13.9.2.  ARGUMENT

3459	   The ARGUMENT specification of the LAYOUTRETURN operation in section
3460	   18.44.1 of [2] is augmented by the following XDR code [24]:

3462	   struct layoutreturn_device_error4 {
3463	           deviceid4       lrde_deviceid;
3464	           nfsstat4        lrde_status;
3465	           nfs_opnum4      lrde_opnum;
3466	   };

3468	   struct layoutreturn_error_report4 {
3469	           layoutreturn_device_error4      lrer_errors<>;
3470	   };

3472	13.9.3.  RESULT

3474	   The RESULT of the LAYOUTRETURN operation is unchanged; see section
3475	   18.44.2 of [2].

3477	13.9.4.  DESCRIPTION

3479	   The following text is added to the end of the LAYOUTRETURN operation
3480	   DESCRIPTION in section 18.44.3 of [2].

3482	   When a client uses LAYOUTRETURN with a type of LAYOUTRETURN4_FILE,
3483	   then if the lrf_body field is NULL, it indicates to the MDS that the
3484	   client experienced no errors.  If lrf_body is non-NULL, then the
3485	   field references error information which is layout type specific.
3486	   I.e., the Objects-Based Layout protocol can continue to utilize
3487	   lrf_body as specified in [9].  For both Files-Based and Block-Based
3488	   Layouts, the field references a layoutreturn_error_report4, which
3489	   contains an array of layoutreturn_device_error4.

3491	   Each individual layoutreturn_device_error4 describes a single error
3492	   associated with a DS, which is identified via lrde_deviceid.  The
3493	   operation which returned the error is identified via lrde_opnum.
3494	   Finally the NFS error value (nfsstat4) encountered is provided via
3495	   lrde_status and may consist of the following error codes:

3497	   NFS4ERR_NXIO:  The client was unable to establish any communication
3498	      with the DS.

3500	   NFS4ERR_*:  The client was able to establish communication with the
3501	      DS and is returning one of the allowed error codes for the
3502	      operation denoted by lrde_opnum.
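   For illustration, a client might accumulate entries for the error
   report as sketched below.  The C structures hand-mirror the XDR above
   (a real client would use rpcgen or an equivalent XDR library), the
   fixed capacity is a hypothetical stand-in for the variable-length
   lrer_errors<> array, and only a few nfsstat4 and nfs_opnum4 values
   from NFSv4.1 [2] are defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NFS4_DEVICEID4_SIZE 16
typedef unsigned char deviceid4[NFS4_DEVICEID4_SIZE];

/* Small subset of nfsstat4 and nfs_opnum4 values. */
enum { NFS4_OK = 0, NFS4ERR_IO = 5, NFS4ERR_NXIO = 6 };
enum { OP_READ = 25, OP_WRITE = 38 };

struct layoutreturn_device_error4 {
    deviceid4 lrde_deviceid;
    int32_t lrde_status;  /* nfsstat4 */
    int32_t lrde_opnum;   /* nfs_opnum4 */
};

#define MAX_REPORTED 8  /* hypothetical capacity */

struct layoutreturn_error_report4 {
    unsigned lrer_errors_len;
    struct layoutreturn_device_error4 lrer_errors[MAX_REPORTED];
};

/* Append one DS error; returns 0 on success, -1 if the report is full. */
static int report_device_error(struct layoutreturn_error_report4 *r,
                               const deviceid4 dev, int32_t status,
                               int32_t opnum)
{
    struct layoutreturn_device_error4 *e;

    if (r->lrer_errors_len >= MAX_REPORTED)
        return -1;
    e = &r->lrer_errors[r->lrer_errors_len++];
    memcpy(e->lrde_deviceid, dev, NFS4_DEVICEID4_SIZE);
    e->lrde_status = status;  /* e.g. NFS4ERR_NXIO: DS unreachable */
    e->lrde_opnum = opnum;    /* operation that failed */
    return 0;
}
```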

3504	13.9.5.  IMPLEMENTATION

3506	   The following text is added to the end of the LAYOUTRETURN operation
3507	   IMPLEMENTATION in section 18.44.4 of [2].

3509	   Clients are expected to tolerate transient storage device errors, and
3510	   hence clients SHOULD NOT use the LAYOUTRETURN error handling for
3511	   device access problems that may be transient.  The methods by which a
3512	   client decides whether a device access problem is transient vs.
3513	   persistent are implementation-specific, but may include retrying I/Os
3514	   to a data server under appropriate conditions.
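   One possible way to decide when a device problem stops being
   transient is to count consecutive failures per DS, as in this
   hypothetical C sketch.  The threshold is an arbitrary assumption, and
   real clients may also weigh timeouts, specific error codes, or
   whether the MDS has recalled the layout.

```c
#include <assert.h>
#include <stdbool.h>

#define PERSISTENT_AFTER 3  /* hypothetical threshold */

/* Per-DS bookkeeping kept by the client. */
struct ds_health {
    unsigned consecutive_failures;
};

/*
 * Update the state after an I/O attempt to the DS.  Returns true once the
 * problem should be treated as persistent (and reported via LAYOUTRETURN).
 */
static bool note_io_result(struct ds_health *h, bool ok)
{
    if (ok) {
        h->consecutive_failures = 0;  /* success: the issue was transient */
        return false;
    }
    h->consecutive_failures++;
    return h->consecutive_failures >= PERSISTENT_AFTER;
}
```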

3516	   When an I/O fails to a storage device, the client SHOULD retry the
3517	   failed I/O via the MDS.  In this situation, before retrying the I/O,
3518	   the client SHOULD return the layout, or the affected portion thereof,
3519	   and SHOULD indicate which storage device or devices were problematic.
3520	   The client needs to do this when the DS is being unresponsive in
3521	   order to fence off any failed write attempts, and ensure that they do
3522	   not end up overwriting any later data being written through the MDS.
3523	   If the client does not do this, the MDS MAY issue a layout recall
3524	   callback in order to perform the retried I/O.

3526	   The client needs to be cognizant that since this error handling is
3527	   optional in the MDS, the MDS may silently ignore this functionality.
3528	   Also, as the MDS may consider some issues the client reports to be
3529	   expected (see Section 13.9.1), the client might find it difficult to
3530	   detect an MDS which has not implemented error handling via
3531	   LAYOUTRETURN.

3533	   If an MDS is aware that a storage device is proving problematic to a
3534	   client, the MDS SHOULD NOT include that storage device in any pNFS
3535	   layouts sent to that client.  If the MDS is aware that a storage
3536	   device is affecting many clients, then the MDS SHOULD NOT include
3537	   that storage device in any pNFS layouts sent out.  If a client asks
3538	   for a new layout for the file from the MDS, it MUST be prepared for
3539	   the MDS to return that storage device in the layout.  The MDS might
3540	   not have any choice in using the storage device, i.e., there might
3541	   only be one possible layout for the system.  Also, in the case of
3542	   existing files, the MDS might have no choice in which storage devices
3543	   to hand out to clients.

3545	   The MDS is not required to indefinitely retain per-client storage
3546	   device error information.  An MDS is also not required to
3547	   automatically reinstate use of a previously problematic storage
3548	   device; administrative intervention may be required instead.

3550	13.10.  Operation 65: READ_PLUS

3552	   READ_PLUS is a new variant of the NFSv4.1 READ operation [2].
3553	   Besides being able to support all of the data semantics of READ, it
3554	   can also be used by the server to return either holes or ADBs to the
3555	   client.  For holes, READ_PLUS extends the response to avoid returning
3556	   data for portions of the file that either are initialized and
3557	   contain no backing store, or would appear to be so.
3558	   I.e., if the result was a data block composed entirely of zeros, then
3559	   it is easier to return a hole.  Returning data blocks of uninitialized
3560	   data wastes computational and network resources, thus reducing
3561	   performance.  For ADBs, READ_PLUS is used to return the metadata
3562	   describing the portions of the file that are initialized and
3563	   contain no backing store.

3565	   If the client sends a READ operation, it is explicitly stating that
3566	   it is neither supporting sparse files nor ADBs.  So if a READ occurs
3567	   on a sparse ADB or file, then the server must expand such data to be
3568	   raw bytes.  If a READ occurs in the middle of a hole or ADB, the
3569	   server can only send back bytes starting from that offset.  In
3570	   contrast, if a READ_PLUS occurs in the middle of a hole or ADB, the
3571	   server can send back a range which starts before the offset and
3572	   extends past the requested range.

3574	   READ is inefficient for transfer of sparse sections of the file.  As
3575	   such, READ is marked as OBSOLETE in NFSv4.2.  Instead, a client
3576	   should issue READ_PLUS.  Note that as the client has no a priori
3577	   knowledge of whether either an ADB or a hole is present or not, it
3578	   should always use READ_PLUS.

3580	13.10.1.  ARGUMENT

3582	   struct READ_PLUS4args {
3583	           /* CURRENT_FH: file */
3584	           stateid4        rpa_stateid;
3585	           offset4         rpa_offset;
3586	           count4          rpa_count;
3587	   };

3589	13.10.2.  RESULT

3591	   union read_plus_content switch (data_content4 content) {
3592	   case NFS4_CONTENT_DATA:
3593	           opaque          rpc_data<>;
3594	   case NFS4_CONTENT_APP_BLOCK:
3595	           app_data_block4 rpc_block;
3596	   case NFS4_CONTENT_HOLE:
3597	           data_info4      rpc_hole;
3598	   default:
3599	           void;
3600	   };

3602	   /*
3603	    * Allow a return of an array of contents.
3604	    */
3605	   struct read_plus_res4 {
3606	           bool                    rpr_eof;
3607	           read_plus_content       rpr_contents<>;
3608	   };

3610	   union READ_PLUS4res switch (nfsstat4 status) {
3611	   case NFS4_OK:
3612	           read_plus_res4  resok4;
3613	   default:
3614	           void;
3615	   };

3617	13.10.3.  DESCRIPTION

3619	   The READ_PLUS operation is based upon the NFSv4.1 READ operation [2]
3620	   and similarly reads data from the regular file identified by the
3621	   current filehandle.

3623	   The client provides an rpa_offset of where the READ_PLUS is to start
3624	   and an rpa_count of how many bytes are to be read.  An rpa_offset of
3625	   zero means to read data starting at the beginning of the file.  If
3626	   rpa_offset is greater than or equal to the size of the file, the
3627	   status NFS4_OK is returned with di_length (the data length) set to
3628	   zero and rpr_eof set to TRUE.

3630	   The READ_PLUS result consists of an array of rpr_contents, each of
3631	   which describes a data_content4 type of data (Section 6.1.2).  For
3632	   NFSv4.2, the allowed values are data, ADB, and hole.  A server is
3633	   required to support the data type, but not the ADB or hole types.
3634	   Both an ADB and a hole must be returned in their entirety; clients
3635	   must be prepared to get more information than they requested.
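   The following C sketch illustrates how a client might fold one
   READ_PLUS reply into a caller's buffer while allowing for holes that
   start before, or extend past, the requested range.  The rp_segment
   type and apply_read_plus() function are hypothetical simplifications
   of the rpr_contents array, not XDR-generated code, and ADB elements
   are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of one rpr_contents element (no ADBs). */
enum rp_kind { RP_DATA, RP_HOLE };

struct rp_segment {
    enum rp_kind kind;
    uint64_t offset;            /* file offset where the element starts */
    uint64_t length;            /* number of bytes the element describes */
    const unsigned char *data;  /* payload for RP_DATA; NULL for RP_HOLE */
};

/*
 * Copy the part of each element that overlaps [req_off, req_off + req_len)
 * into buf, which must hold req_len bytes.  Hole bytes become zeros, and
 * anything the server described outside the requested window is ignored.
 * Returns how many leading bytes of the window were filled, assuming the
 * elements are sorted and contiguous; a smaller value is a short read.
 */
uint64_t apply_read_plus(const struct rp_segment *segs, size_t nsegs,
                         uint64_t req_off, uint64_t req_len,
                         unsigned char *buf)
{
    uint64_t req_end = req_off + req_len;
    uint64_t filled_end = req_off;

    assert(buf != NULL || req_len == 0);
    for (size_t i = 0; i < nsegs; i++) {
        uint64_t seg_end = segs[i].offset + segs[i].length;
        uint64_t lo = segs[i].offset > req_off ? segs[i].offset : req_off;
        uint64_t hi = seg_end < req_end ? seg_end : req_end;

        if (lo >= hi)
            continue;   /* element lies entirely outside the window */
        if (segs[i].kind == RP_HOLE)
            memset(buf + (lo - req_off), 0, (size_t)(hi - lo));
        else
            memcpy(buf + (lo - req_off), segs[i].data + (lo - segs[i].offset),
                   (size_t)(hi - lo));
        if (hi > filled_end)
            filled_end = hi;
    }
    return filled_end - req_off;
}
```

   For example, a reply of data at [10, 14) followed by a 100-byte hole
   at offset 14, applied to a request for 8 bytes at offset 12, fills
   two data bytes and six zero bytes and ignores the rest of the hole.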

3637	   READ_PLUS has to support all of the errors returned by READ, as well
3638	   as NFS4ERR_UNION_NOTSUPP.  If the client asks for a hole and the
3639	   server does not support that arm of the discriminated union, but does
3640	   support one or more additional arms, it can return
3641	   NFS4ERR_UNION_NOTSUPP to signal to the client that it supports the
3642	   operation, but not that arm.

3644	   If the data to be returned consists entirely of zeros, then the
3645	   server may elect to return that data as a hole.  The server
3646	   indicates this to the client by setting di_allocated to TRUE in
3647	   this case.  Note that in such a scenario, the server is not required
3648	   to determine the full extent of the "hole"; it does not need to
3649	   determine where the zeros start and end.
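   A server deciding whether it may describe a buffer as a hole only
   needs to check the bytes it was about to return, which matches the
   statement that it need not find where the zeros start and end.  A
   minimal sketch follows; range_is_all_zero() is a hypothetical helper
   name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if buf[0..len) is entirely zero, in which case a server
 * could elect to describe the range as a hole (with di_allocated TRUE)
 * instead of sending the bytes.
 */
bool range_is_all_zero(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}
```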

3651	   The server may elect to return adjacent elements of the same type.
3652	   For example, the guard pattern or block size of an ADB might change,
3653	   which would require adjacent elements of type ADB.  Likewise, if the
3654	   server has a range of data consisting entirely of zeros and then a
3655	   hole, it might want to return two adjacent holes to the client.

3657	   If the client specifies a rpa_count value of zero, the READ_PLUS
3658	   succeeds and returns zero bytes of data.  In all situations, the
3659	   server may choose to return fewer bytes than specified by the client.
3660	   The client needs to check for this condition and handle it
3661	   appropriately.

3663	   If the client specifies an rpa_offset and rpa_count value that is
3664	   entirely contained within a hole of the file, then the di_offset and
3665	   di_length returned must be for the entire hole.  This result is
3666	   considered valid until the file is changed (detected via the change
3667	   attribute).  The server MUST provide the same semantics for the hole
3668	   as if the client read the region and received zeros; the lifetime of
3669	   the implied hole contents MUST be exactly the same as that of any
3670	   other read data.

3672	   If the client specifies an rpa_offset and rpa_count value that begins
3673	   in a non-hole of the file but extends into a hole, the server should
3674	   return an array consisting of both data and a hole.  The client MUST
3675	   be prepared for the server to return a short read describing just the
3676	   data.  The client will then issue another READ_PLUS for the remaining
3677	   bytes, to which the server will respond with information about the
3678	   hole in the file.
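   After a short read, the client has to decide whether and where to
   issue the next READ_PLUS.  The C sketch below assumes the next
   request starts where the furthest returned element ends, so a hole
   that already reaches past the requested range ends the loop without
   another round trip.  The rp_extent type and next_read_plus_offset()
   are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one rpr_contents element. */
struct rp_extent {
    uint64_t offset;
    uint64_t length;
};

/*
 * Decide whether another READ_PLUS is needed to cover [req_off, req_end).
 * Returns true and sets *next_off if more bytes are needed.  A reply that
 * makes no progress yields *next_off == req_off; a real client would need
 * a retry or back-off policy for that case.
 */
bool next_read_plus_offset(const struct rp_extent *ext, size_t n, bool eof,
                           uint64_t req_off, uint64_t req_end,
                           uint64_t *next_off)
{
    uint64_t furthest = req_off;

    assert(next_off != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t end = ext[i].offset + ext[i].length;
        if (end > furthest)
            furthest = end;
    }
    if (eof || furthest >= req_end)
        return false;
    *next_off = furthest;
    return true;
}
```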

3680	   Except when special stateids are used, the stateid value for a
3681	   READ_PLUS request represents a value returned from a previous byte-
3682	   range lock or share reservation request or the stateid associated
3683	   with a delegation.  The stateid identifies the associated owners if
3684	   any and is used by the server to verify that the associated locks are
3685	   still valid (e.g., have not been revoked).

3687	   If the read ended at the end-of-file (formally, in a correctly formed
3688	   READ_PLUS operation, if rpa_offset + rpa_count is equal to the size
3689	   of the file), or the READ_PLUS operation extends beyond the size of
3690	   the file (if rpa_offset + rpa_count is greater than the size of the
3691	   file), eof is returned as TRUE; otherwise, it is FALSE.  A successful
3692	   READ_PLUS of an empty file will always return eof as TRUE.

3694	   If the current filehandle is not an ordinary file, an error will be
3695	   returned to the client.  In the case that the current filehandle
3696	   represents an object of type NF4DIR, NFS4ERR_ISDIR is returned.  If
3697	   the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is
3698	   returned.  In all other cases, NFS4ERR_WRONG_TYPE is returned.

3700	   For a READ_PLUS with a stateid value of all bits equal to zero, the
3701	   server MAY allow the READ_PLUS to be serviced subject to mandatory
3702	   byte-range locks or the current share deny modes for the file.  For a
3703	   READ_PLUS with a stateid value of all bits equal to one, the server
3704	   MAY allow READ_PLUS operations to bypass locking checks at the
3705	   server.
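   The two special stateids can be recognized by inspecting every bit
   of the stateid.  The C sketch below mirrors the stateid4 layout from
   [2] (a 32-bit seqid followed by 12 opaque bytes); the helper names
   are hypothetical, and what the server then does with each case
   remains the MAY-level choice described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors stateid4 from [2]: a sequence id plus 12 opaque bytes. */
struct stateid4 {
    uint32_t seqid;
    unsigned char other[12];
};

static bool other_is_all(const struct stateid4 *s, unsigned char byte)
{
    for (size_t i = 0; i < sizeof s->other; i++)
        if (s->other[i] != byte)
            return false;
    return true;
}

/* All bits zero: serviced subject to byte-range locks and deny modes. */
bool is_all_zeros_stateid(const struct stateid4 *s)
{
    assert(s != NULL);
    return s->seqid == 0 && other_is_all(s, 0x00);
}

/* All bits one: the server MAY let the READ_PLUS bypass locking checks. */
bool is_all_ones_stateid(const struct stateid4 *s)
{
    assert(s != NULL);
    return s->seqid == UINT32_MAX && other_is_all(s, 0xFF);
}
```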

3707	   On success, the current filehandle retains its value.

3709	13.10.4.  IMPLEMENTATION

3711	   In general, the IMPLEMENTATION notes for READ in Section 18.22.4 of
3712	   [2] also apply to READ_PLUS.  One delta is that when the owner has a
3713	   locked byte range, the server MUST return an array of rpr_contents
3714	   with values inside that range.

3716	13.10.4.1.  Additional pNFS Implementation Information

3718	   With pNFS, the semantics of using READ_PLUS remain the same.  Any
3719	   data server MAY return a hole or ADB result for a READ_PLUS request
3720	   that it receives.  When a data server chooses to return such a
3721	   result, it has the option of returning information for the data
3722	   stored on that data server (as defined by the data layout), but it
3723	   MUST NOT return results for a byte range that includes data managed
3724	   by another data server.
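   To make the data server restriction concrete, the C sketch below
   assumes a simple round-robin layout in which stripe unit i (of
   stripe_unit bytes) lives on data server i mod ndevs, addressed by
   file offset.  It ignores the dense packing and first-stripe-index
   details of real file layouts, and local_extent_limit() is a
   hypothetical name.  It bounds a single hole or ADB element at the
   end of the current stripe unit, since the next unit belongs to
   another data server.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical round-robin striping: stripe unit i (of stripe_unit bytes)
 * lives on data server i % ndevs.  Returns how much of [off, off + len)
 * data server ds_index may describe in one hole or ADB element without
 * touching another server's bytes, or 0 if off is not stored on ds_index.
 */
uint64_t local_extent_limit(uint64_t off, uint64_t len, uint64_t stripe_unit,
                            uint32_t ndevs, uint32_t ds_index)
{
    assert(stripe_unit > 0 && ndevs > 0);
    if ((off / stripe_unit) % ndevs != ds_index)
        return 0;
    if (ndevs == 1)
        return len;                 /* every byte is local */
    uint64_t to_unit_end = stripe_unit - off % stripe_unit;
    return len < to_unit_end ? len : to_unit_end;
}
```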

3726	   A data server should do its best to return as much information about
3727	   a hole or ADB as is feasible without having to contact the metadata
3728	   server.  If communication with the metadata server is required, then
3729	   every attempt should be made to minimize the number of requests.

3731	   If mandatory locking is enforced, then the data server must also
3732	   ensure that it returns only information that is within the owner's
3733	   locked byte range.

3735	13.10.5.  READ_PLUS with Sparse Files Example

3737	   The following table describes a sparse file (K = 1024 bytes; ranges
3738	   are half-open).  Each range holds either non-zero data or a hole.  The
3739	   server in this example uses a Hole Threshold of 32K.

3741	                        +--------------+----------+
3742	                        | Byte Range   | Contents |
3743	                        +--------------+----------+
3744	                        | [0, 16K)     | Hole     |
3745	                        | [16K, 32K)   | Non-Zero |
3746	                        | [32K, 256K)  | Hole     |
3747	                        | [256K, 288K) | Non-Zero |
3748	                        | [288K, 354K) | Hole     |
3749	                        | [354K, 418K) | Non-Zero |
3750	                        +--------------+----------+

3752	                                  Table 5

3754	   Under the given circumstances, if a client were to read from the file
3755	   with a max read size of 64K, the following would be the results of the
3756	   given READ_PLUS calls; each element is shown as type[offset,length].
3757	   This assumes the client has already opened the file, acquired a valid
3758	   stateid ('s' in the example), and just needs to issue READ_PLUS calls.

3760	   1.  READ_PLUS(s, 0, 64K) --> NFS4_OK, eof = false, <data[0,32K],
3761	       hole[32K,224K]>.  Since the first hole is less than the server's
3762	       Hole Threshold, the first 32K of the file is returned as data
3763	       and the remaining 32K is returned as a hole which actually
3764	       extends to 256K.

3766	   2.  READ_PLUS(s, 32K, 64K) --> NFS4_OK, eof = false,
3767	       <hole[32K,224K]>.  The requested range was all zeros, and the
3768	       current hole begins at offset 32K and is 224K in length.  Note
3769	       that the client should not have followed up the previous
3770	       READ_PLUS request with this one, as the hole information from
3771	       the previous call extended past what the client was requesting.

3773	   3.  READ_PLUS(s, 256K, 64K) --> NFS4_OK, eof = false,
3774	       <data[256K,32K], hole[288K,66K]>.  Returns an array of the 32K
3775	       of data and the hole, which extends to 354K.

3777	   4.  READ_PLUS(s, 354K, 64K) --> NFS4_OK, eof = true,
3778	       <data[354K,64K]>.  Returns the final 64K of data and informs
3779	       the client that there is no more data in the file.
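   The four calls above can be replayed mechanically.  The C sketch
   below assumes K-aligned range boundaries for Table 5, a server that
   folds holes shorter than its Hole Threshold into zero-filled data and
   reports larger holes whole, and data elements that never extend past
   the requested range.  sim_read_plus() and the other names are
   hypothetical, ADBs are not modelled, and a real server could make
   different (equally valid) choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KIB 1024ULL
#define FILE_SIZE (418 * KIB)
#define HOLE_THRESHOLD (32 * KIB)

/* Table 5 with K-aligned, half-open ranges. */
struct extent { uint64_t start, end; bool hole; };

static const struct extent table5[] = {
    {0 * KIB, 16 * KIB, true},    {16 * KIB, 32 * KIB, false},
    {32 * KIB, 256 * KIB, true},  {256 * KIB, 288 * KIB, false},
    {288 * KIB, 354 * KIB, true}, {354 * KIB, 418 * KIB, false},
};
#define TABLE5_LEN (sizeof table5 / sizeof table5[0])

/* One rpr_contents element: a hole or data at [offset, offset + length). */
struct seg { bool hole; uint64_t offset, length; };

size_t sim_read_plus(uint64_t offset, uint32_t count, struct seg *out,
                     size_t max_out, bool *eof)
{
    uint64_t want_end = offset + count;
    uint64_t end = want_end < FILE_SIZE ? want_end : FILE_SIZE;
    uint64_t pos = offset;
    size_t n = 0;

    assert(out != NULL && eof != NULL);
    *eof = want_end >= FILE_SIZE;
    for (size_t i = 0; i < TABLE5_LEN && pos < end; i++) {
        const struct extent *e = &table5[i];

        if (pos >= e->end)
            continue;                          /* extent is behind pos */
        if (e->hole && e->end - e->start >= HOLE_THRESHOLD) {
            /* Large hole: reported whole, even past the requested range. */
            assert(n < max_out);
            out[n++] = (struct seg){true, e->start, e->end - e->start};
            pos = e->end;
            continue;
        }
        /* Data, or a hole below the threshold sent as zero-filled data. */
        uint64_t stop = e->end < end ? e->end : end;
        if (n > 0 && !out[n - 1].hole &&
            out[n - 1].offset + out[n - 1].length == pos) {
            out[n - 1].length += stop - pos;   /* extend the previous data */
        } else {
            assert(n < max_out);
            out[n++] = (struct seg){false, pos, stop - pos};
        }
        pos = stop;
    }
    return n;
}
```

   Under these assumptions, sim_read_plus(0, 64 * KIB, ...) yields
   data[0,32K] and hole[32K,224K] with eof false, matching call 1.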

3781	13.11.  Operation 66: SEEK

3783	   SEEK is an operation that allows a client to determine the location
3784	   of the next data_content4 in a file.  It allows an implementation of
3785	   the emerging extension to lseek(2) that lets clients use SEEK_HOLE
3786	   and SEEK_DATA.

3788	13.11.1.  ARGUMENT

3790	   struct SEEK4args {
3791	           /* CURRENT_FH: file */
3792	           stateid4        sa_stateid;
3793	           offset4         sa_offset;
3794	           data_content4   sa_what;
3795	   };

3797	13.11.2.  RESULT

3799	   union seek_content switch (data_content4 content) {
3800	   case NFS4_CONTENT_DATA:
3801	           data_info4      sc_data;
3802	   case NFS4_CONTENT_APP_BLOCK:
3803	           app_data_block4 sc_block;
3804	   case NFS4_CONTENT_HOLE:
3805	           data_info4      sc_hole;
3806	   default:
3807	           void;
3808	   };

3810	   struct seek_res4 {
3811	           bool                    sr_eof;
3812	           seek_content            sr_contents;
3813	   };

3815	   union SEEK4res switch (nfsstat4 status) {
3816	   case NFS4_OK:
3817	           seek_res4       resok4;
3818	   default:
3819	           void;
3820	   };

3822	13.11.3.  DESCRIPTION

3824	   From the given sa_offset, find the next data_content4 of type sa_what
3825	   in the file.  For either a hole or ADB, this must return the
3826	   data_content4 in its entirety.  For data, it must not return the
3827	   actual data.

3829	   SEEK must follow the same rules for stateids as READ_PLUS
3830	   (Section 13.10.3).

3832	   If the server cannot find a corresponding sa_what, then the status
3833	   is still NFS4_OK, but sr_eof is TRUE.  The sr_contents then
3834	   contains zeroed-out content of the appropriate type.
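   The SEEK semantics can be sketched in C over a sorted,
   non-overlapping extent list.  The extent type and sim_seek() are
   hypothetical, and ADBs are not modelled.  Reporting data from the
   later of the extent start and sa_offset, rather than from the start
   of the extent, is an assumption, since only holes and ADBs are
   required to be returned in their entirety.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sorted, non-overlapping, half-open extents describing a file. */
struct extent { uint64_t start, end; bool hole; };

/*
 * Find the first extent of the wanted kind that ends after offset.  A hole
 * is reported whole (its start may precede offset); data is reported from
 * offset onward.  Returns false, with zeroed outputs, if none is found,
 * which maps to NFS4_OK with sr_eof TRUE and zeroed sr_contents.
 */
bool sim_seek(const struct extent *ext, size_t n, uint64_t offset,
              bool want_hole, uint64_t *out_off, uint64_t *out_len)
{
    assert(out_off != NULL && out_len != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ext[i].end <= offset || ext[i].hole != want_hole)
            continue;
        uint64_t start = ext[i].start;
        if (!want_hole && start < offset)
            start = offset;
        *out_off = start;
        *out_len = ext[i].end - start;
        return true;
    }
    *out_off = 0;
    *out_len = 0;
    return false;
}
```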

3836	14.  NFSv4.2 Callback Operations

3838	14.1.  Procedure 16: CB_ATTR_CHANGED - Notify Client that the File's
3839	       Attributes Changed

3841	14.1.1.  ARGUMENTS

3843	   struct CB_ATTR_CHANGED4args {
3844	           nfs_fh4         acca_fh;
3845	           bitmap4         acca_critical;
3846	           bitmap4         acca_info;
3847	   };

3849	14.1.2.  RESULTS

3851	   struct CB_ATTR_CHANGED4res {
3852	           nfsstat4        accr_status;
3853	   };

3855	14.1.3.  DESCRIPTION

3857	   The CB_ATTR_CHANGED callback operation is used by the server to
3858	   indicate to the client that the file's attributes have been modified
3859	   on the server.  The server does not convey how the attributes have
3860	   changed, just that they have been modified.  The server can inform
3861	   the client about both critical and informational attribute changes in
3862	   the bitmask arguments.  The client SHOULD query the server about all
3863	   attributes set in acca_critical.  For all changes reflected in
3864	   acca_info, the client can decide whether or not it wants to poll the
3865	   server.
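   Because acca_critical and acca_info are bitmap4 values, a client
   deciding which attributes to re-fetch needs to test individual
   attribute bits.  The C sketch below assumes the bitmap4 encoding from
   [2], in which attribute n is bit (n mod 32), counted from the least
   significant bit, of word n/32.  bitmap4_isset() and bitmap4_collect()
   are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute attr is bit (attr % 32) of word attr / 32; missing words are 0. */
bool bitmap4_isset(const uint32_t *words, size_t nwords, uint32_t attr)
{
    assert(words != NULL || nwords == 0);
    size_t w = attr / 32;
    return w < nwords && ((words[w] >> (attr % 32)) & 1u) != 0;
}

/*
 * List the attribute numbers set in a bitmap4, e.g. the acca_critical
 * attributes the client SHOULD query.  Returns how many were stored.
 */
size_t bitmap4_collect(const uint32_t *words, size_t nwords,
                       uint32_t *out, size_t max_out)
{
    size_t n = 0;

    for (uint32_t attr = 0; attr < nwords * 32; attr++)
        if (bitmap4_isset(words, nwords, attr) && n < max_out)
            out[n++] = attr;
    return n;
}
```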

3867	   The CB_ATTR_CHANGED callback operation with the FATTR4_SEC_LABEL set
3868	   in acca_critical is the method used by the server to indicate that
3869	   the MAC label for the file referenced by acca_fh has changed.  In
3870	   many ways, the server does not care about the result returned by the
3871	   client.

3873	14.2.  Operation 15: CB_COPY - Report results of a server-side copy

3874	14.2.1.  ARGUMENT

3876	   union copy_info4 switch (nfsstat4 cca_status) {
3877	           case NFS4_OK:
3878	                   void;
3879	           default:
3880	                   length4         cca_bytes_copied;
3881	   };

3883	   struct CB_COPY4args {
3884	           nfs_fh4         cca_fh;
3885	           stateid4        cca_stateid;
3886	           copy_info4      cca_copy_info;
3887	   };

3889	14.2.2.  RESULT

3891	   struct CB_COPY4res {
3892	           nfsstat4        ccr_status;
3893	   };

3895	14.2.3.  DESCRIPTION

3897	   CB_COPY is used for both intra- and inter-server asynchronous copies.
3898	   The CB_COPY callback informs the client of the result of an
3899	   asynchronous server-side copy.  This operation is sent by the
3900	   destination server to the client in a CB_COMPOUND request.  The copy
3901	   is identified by the filehandle and stateid arguments.  The result is
3902	   indicated by the status field.  If the copy failed, cca_bytes_copied
3903	   contains the number of bytes copied before the failure occurred.  The
3904	   cca_bytes_copied value indicates the number of bytes copied but not
3905	   which specific bytes have been copied.

3907	   If the client supports the COPY operation, the client is REQUIRED to
3908	   support the CB_COPY operation.

3910	   There is a potential race between the reply to the original COPY on
3911	   the forechannel and the CB_COPY callback on the backchannel.
3912	   Sections 2.10.6.3 and 20.9.3 in [2] describe how to handle this type
3913	   of issue.

3915	   The CB_COPY operation may fail for the following reasons (this is a
3916	   partial list):

3918	   NFS4ERR_NOTSUPP:  The copy offload operation is not supported by the
3919	      NFS client receiving this request.

3921	15.  IANA Considerations

3923	   This section uses terms that are defined in [25].

3925	16.  References

3927	16.1.  Normative References

3929	   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
3930	         Levels", BCP 14, RFC 2119, March 1997.

3932	   [2]   Shepler, S., Eisler, M., and D. Noveck, "Network File System
3933	         (NFS) Version 4 Minor Version 1 Protocol", RFC 5661,
3934	         January 2010.

3936	   [3]   Haynes, T., "Network File System (NFS) Version 4 Minor Version
3937	         2 External Data Representation Standard (XDR) Description",
3938	         March 2011.

3940	   [4]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
3941	         Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986,
3942	         January 2005.

3944	   [5]   Haynes, T. and N. Williams, "Remote Procedure Call (RPC)
3945	         Security Version 3", draft-williams-rpcsecgssv3 (work in
3946	         progress), 2011.

3948	   [6]   The Open Group, "Section 'posix_fadvise()' of System Interfaces
3949	         of The Open Group Base Specifications Issue 6, IEEE Std 1003.1,
3950	         2004 Edition", 2004.

3952	   [7]   Haynes, T., "Requirements for Labeled NFS",
3953	         draft-ietf-nfsv4-labreqs-00 (work in progress).

3955	   [8]   Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol
3956	         Specification", RFC 2203, September 1997.

3958	   [9]   Halevy, B., Welch, B., and J. Zelenka, "Object-Based Parallel
3959	         NFS (pNFS) Operations", RFC 5664, January 2010.

3961	16.2.  Informative References

3963	   [10]  Haynes, T. and D. Noveck, "Network File System (NFS) version 4
3964	         Protocol", draft-ietf-nfsv4-rfc3530bis-09 (Work In Progress),
3965	         March 2011.

3967	   [11]  Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik,
3968	         "NSDB Protocol for Federated Filesystems",
3969	         draft-ietf-nfsv4-federated-fs-protocol (Work In Progress),
3970	         2010.

3972	   [12]  Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik,
3973	         "Administration Protocol for Federated Filesystems",
3974	         draft-ietf-nfsv4-federated-fs-admin (Work In Progress), 2010.

3976	   [13]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
3977	         Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol --
3978	         HTTP/1.1", RFC 2616, June 1999.

3980	   [14]  Postel, J. and J. Reynolds, "File Transfer Protocol", STD 9,
3981	         RFC 959, October 1985.

3983	   [15]  Simpson, W., "PPP Challenge Handshake Authentication Protocol
3984	         (CHAP)", RFC 1994, August 1996.

3986	   [16]  VanDeBogart, S., Frost, C., and E. Kohler, "Reducing Seek
3987	         Overhead with Application-Directed Prefetching", Proceedings of
3988	         USENIX Annual Technical Conference, June 2009.

3990	   [17]  Strohm, R., "Chapter 2, Data Blocks, Extents, and Segments, of
3991	         Oracle Database Concepts 11g Release 1 (11.1)", January 2011.

3993	   [18]  Ashdown, L., "Chapter 15, Validating Database Files and
3994	         Backups, of Oracle Database Backup and Recovery User's Guide
3995	         11g Release 1 (11.1)", August 2008.

3997	   [19]  McDougall, R. and J. Mauro, "Section 11.4.3, Detecting Memory
3998	         Corruption of Solaris Internals", 2007.

4000	   [20]  Bairavasundaram, L., Goodson, G., Schroeder, B., Arpaci-
4001	         Dusseau, A., and R. Arpaci-Dusseau, "An Analysis of Data
4002	         Corruption in the Storage Stack", Proceedings of the 6th USENIX
4003	         Symposium on File and Storage Technologies (FAST '08), 2008.

4005	   [21]  "Section 46.6. Multi-Level Security (MLS) of Deployment Guide:
4006	         Deployment, configuration and administration of Red Hat
4007	         Enterprise Linux 5, Edition 6", 2011.

4009	   [22]  Quigley, D. and J. Lu, "Registry Specification for MAC Security
4010	         Label Formats", draft-quigley-label-format-registry (work in
4011	         progress), 2011.

4013	   [23]  IESG, "IESG Processing of RFC Errata for the IETF Stream",
4014	         2008.

4016	   [24]  Eisler, M., "XDR: External Data Representation Standard",
4017	         RFC 4506, May 2006.

4019	   [25]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
4020	         Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.

4022	Appendix A.  Acknowledgments

4024	   For the pNFS Access Permissions Check, the original draft was by
4025	   Sorin Faibish, David Black, Mike Eisler, and Jason Glasgow.  The work
4026	   was influenced by discussions with Benny Halevy and Bruce Fields.  A
4027	   review was done by Tom Haynes.

4029	   For the Sharing change attribute implementation details with NFSv4
4030	   clients, the original draft was by Trond Myklebust.

4032	   For the NFS Server-side Copy, the original draft was by James
4033	   Lentini, Mike Eisler, Deepak Kenchammana, Anshul Madan, and Rahul
4034	   Iyer.  Tom Talpey co-authored an unpublished version of that
4035	   document.  It was also reviewed by a number of individuals:
4036	   Pranoop Erasani, Tom Haynes, Arthur Lent, Trond Myklebust, Dave
4037	   Noveck, Theresa Lingutla-Raj, Manjunath Shankararao, Satyam Vaghani,
4038	   and Nico Williams.

4040	   For the NFS space reservation operations, the original draft was by
4041	   Mike Eisler, James Lentini, Manjunath Shankararao, and Rahul Iyer.

4043	   For the sparse file support, the original draft was by Dean
4044	   Hildebrand and Marc Eshel.  Valuable input and advice were received
4045	   from Sorin Faibish, Bruce Fields, Benny Halevy, Trond Myklebust, and
4046	   Richard Scheffenegger.

4048	   For the Application IO Hints, the original draft was by Dean
4049	   Hildebrand, Mike Eisler, Trond Myklebust, and Sam Falkner.  Some
4050	   early reviewers included Benny Halevy and Pranoop Erasani.

4052	   For Labeled NFS, the original draft was by David Quigley, James
4053	   Morris, Jarret Lu, and Tom Haynes.  Peter Staubach, Trond Myklebust,
4054	   Stephen Smalley, Sorin Faibish, Nico Williams, and David Black also
4055	   contributed in the final push to get this accepted.

4057	   During the review process, Talia Reyes-Ortiz helped the sessions run
4058	   smoothly.  While many people contributed here and there, the core
4059	   reviewers were Andy Adamson, Pranoop Erasani, Bruce Fields, Chuck
4060	   Lever, Trond Myklebust, David Noveck, and Peter Staubach.

4062	Appendix B.  RFC Editor Notes

4064	   [RFC Editor: please remove this section prior to publishing this
4065	   document as an RFC]

4067	   [RFC Editor: prior to publishing this document as an RFC, please
4068	   replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the
4069	   RFC number of this document]

4071	Author's Address

4073	   Thomas Haynes
4074	   NetApp
4075	   9110 E 66th St
4076	   Tulsa, OK  74133
4077	   USA

4079	   Phone: +1 918 307 1415
4080	   Email: thomas@netapp.com
4081	   URI:   http://www.tulsalabs.com