idnits 2.17.1 

draft-ietf-nfsv4-minorversion2-14.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 5 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  == There are 5 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 2795 has weird spacing: '...S4resok    res...'

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     The second change is to provide a method for the server to notify
     the client that the attribute changed on an open file on the server.  If
     the file is closed, then during the open attempt, the client will gather
     the new attribute value.  The server MUST not communicate the new value
     of the attribute, the client MUST query it.  This requirement stems from
     the need for the client to provide sufficient access rights to the
     attribute.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     o  MUST not expose an object to either the client or server name
     space before its security information has been bound to it.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     With pNFS, the semantics of using READ_PLUS remains the same.  Any
     data server MAY return a hole or ADH result for a READ_PLUS request that
     it receives.  When a data server chooses to return such a result, it has
     the option of returning information for the data stored on that data
     server (as defined by the data layout), but it MUST not return results
     for a byte range that includes data managed by another data server.

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer. 
     Otherwise, the disclaimer is needed and you can ignore this comment. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 30, 2012) is 4218 days in the past.  Is
     this intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '0' is mentioned on line 3631, but not defined

  -- Looks like a reference, but probably isn't: '32K' on line 3631

  == Unused Reference: '25' is defined on line 3889, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 5661 (ref. '2') (Obsoleted by RFC 8881)

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '6'

  == Outdated reference: A later version (-05) exists of
     draft-ietf-nfsv4-labreqs-00

  ** Downref: Normative reference to an Informational draft:
     draft-ietf-nfsv4-labreqs (ref. '7')

  == Outdated reference: A later version (-35) exists of
     draft-ietf-nfsv4-rfc3530bis-09

  -- Obsolete informational reference (is this intentional?): RFC 2616 (ref.
     '13') (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC
     7235)

  -- Obsolete informational reference (is this intentional?): RFC 5226 (ref.
     '24') (Obsoleted by RFC 8126)


     Summary: 2 errors (**), 0 flaws (~~), 12 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	NFSv4                                                          T. Haynes
3	Internet-Draft                                                    Editor
4	Intended status: Standards Track                      September 30, 2012
5	Expires: April 3, 2013

7	                     NFS Version 4 Minor Version 2
8	                 draft-ietf-nfsv4-minorversion2-14.txt

10	Abstract

12	   This Internet-Draft describes NFS version 4 minor version two,
13	   focusing mainly on the protocol extensions made from NFS version 4
14	   minor version 0 and NFS version 4 minor version 1.  Major extensions
15	   introduced in NFS version 4 minor version two include: Server-side
16	   Copy, Application I/O Advise, Space Reservations, Sparse Files,
17	   Application Data Blocks, and Labeled NFS.

19	Requirements Language

21	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
22	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
23	   document are to be interpreted as described in RFC 2119 [1].

25	Status of this Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on April 3, 2013.

42	Copyright Notice

44	   Copyright (c) 2012 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	   This document may contain material from IETF Documents or IETF
58	   Contributions published or made publicly available before November
59	   10, 2008.  The person(s) controlling the copyright in some of this
60	   material may not have granted the IETF Trust the right to allow
61	   modifications of such material outside the IETF Standards Process.
62	   Without obtaining an adequate license from the person(s) controlling
63	   the copyright in such materials, this document may not be modified
64	   outside the IETF Standards Process, and derivative works of it may
65	   not be created outside the IETF Standards Process, except to format
66	   it for publication as an RFC or to translate it into languages other
67	   than English.

69	Table of Contents

71	   1.  Introduction
72	     1.1.   The NFS Version 4 Minor Version 2 Protocol
73	     1.2.   Scope of This Document
74	     1.3.   NFSv4.2 Goals
75	     1.4.   Overview of NFSv4.2 Features
76	       1.4.1.  Sparse Files
77	       1.4.2.  Application I/O Advise
78	     1.5.   Differences from NFSv4.1
79	   2.  NFS Server-side Copy
80	     2.1.   Introduction
81	     2.2.   Protocol Overview
82	       2.2.1.  Overview of Copy Operations
83	       2.2.2.  Locking the Files
84	       2.2.3.  Intra-Server Copy
85	       2.2.4.  Inter-Server Copy
86	       2.2.5.  Server-to-Server Copy Protocol
87	     2.3.   Requirements for Operations
88	       2.3.1.  netloc4 - Network Locations
89	       2.3.2.  Copy Offload Stateids
90	     2.4.   Security Considerations
91	       2.4.1.  Inter-Server Copy Security
92	   3.  Support for Application IO Hints
93	   4.  Sparse Files
94	     4.1.   Introduction
95	     4.2.   Terminology
96	   5.  Space Reservation
97	     5.1.   Introduction
98	   6.  Application Data Hole Support
99	     6.1.   Generic Framework
100	       6.1.1.  Data Hole Representation
101	       6.1.2.  Data Content
102	     6.2.   An Example of Detecting Corruption
103	     6.3.   Example of READ_PLUS
104	   7.  Labeled NFS
105	     7.1.   Introduction
106	     7.2.   Definitions
107	     7.3.   MAC Security Attribute
108	       7.3.1.  Delegations
109	       7.3.2.  Permission Checking
110	       7.3.3.  Object Creation
111	       7.3.4.  Existing Objects
112	       7.3.5.  Label Changes
113	     7.4.   pNFS Considerations
114	     7.5.   Discovery of Server Labeled NFS Support
115	     7.6.   MAC Security NFS Modes of Operation
116	       7.6.1.  Full Mode
117	       7.6.2.  Guest Mode
118	     7.7.   Security Considerations
119	   8.  Sharing change attribute implementation details with NFSv4
120	       clients
121	     8.1.   Introduction
122	   9.  Security Considerations
123	   10. Error Values
124	     10.1.  Error Definitions
125	       10.1.1. General Errors
126	       10.1.2. Server to Server Copy Errors
127	       10.1.3. Labeled NFS Errors
128	   11. New File Attributes
129	     11.1.  New RECOMMENDED Attributes - List and Definition
130	            References
131	     11.2.  Attribute Definitions
132	   12. Operations: REQUIRED, RECOMMENDED, or OPTIONAL
133	   13. NFSv4.2 Operations
134	     13.1.  Operation 59: COPY - Initiate a server-side copy
135	     13.2.  Operation 60: OFFLOAD_ABORT - Cancel a server-side
136	            copy
137	     13.3.  Operation 61: COPY_NOTIFY - Notify a source server of
138	            a future copy
139	     13.4.  Operation 62: OFFLOAD_REVOKE - Revoke a destination
140	            server's copy privileges
141	     13.5.  Operation 63: OFFLOAD_STATUS - Poll for status of a
142	            server-side copy
143	     13.6.  Modification to Operation 42: EXCHANGE_ID -
144	            Instantiate Client ID
145	     13.7.  Operation 64: INITIALIZE
146	     13.8.  Operation 67: IO_ADVISE - Application I/O access
147	            pattern hints
148	     13.9.  Changes to Operation 51: LAYOUTRETURN
149	     13.10. Operation 65: READ_PLUS
150	     13.11. Operation 66: SEEK
151	   14. NFSv4.2 Callback Operations
152	     14.1.  Procedure 16: CB_ATTR_CHANGED - Notify Client that
153	            the File's Attributes Changed
154	     14.2.  Operation 15: CB_COPY - Report results of a
155	            server-side copy
156	   15. IANA Considerations
157	   16. References
158	     16.1.  Normative References
159	     16.2.  Informative References
160	   Appendix A.  Acknowledgments
161	   Appendix B.  RFC Editor Notes
162	   Author's Address

164	1.  Introduction

166	1.1.  The NFS Version 4 Minor Version 2 Protocol

168	   The NFS version 4 minor version 2 (NFSv4.2) protocol is the third
169	   minor version of the NFS version 4 (NFSv4) protocol.  The first minor
170	   version, NFSv4.0, is described in [10] and the second minor version,
171	   NFSv4.1, is described in [2].  It follows the guidelines for minor
172	   versioning that are listed in Section 11 of [10].

174	   As a minor version, NFSv4.2 is consistent with the overall goals for
175	   NFSv4, but extends the protocol so as to better meet those goals,
176	   based on experiences with NFSv4.1.  In addition, NFSv4.2 has adopted
177	   some additional goals, which motivate some of the major extensions in
178	   NFSv4.2.

180	1.2.  Scope of This Document

182	   This document describes the NFSv4.2 protocol.  With respect to
183	   NFSv4.0 and NFSv4.1, this document does not:

185	   o  describe the NFSv4.0 or NFSv4.1 protocols, except where needed to
186	      contrast with NFSv4.2.

188	   o  modify the specification of the NFSv4.0 or NFSv4.1 protocols.

190	   o  clarify the NFSv4.0 or NFSv4.1 protocols.  I.e., any
191	      clarifications made here apply to NFSv4.2 and neither of the prior
192	      protocols.

194	   The full XDR for NFSv4.2 is presented in [3].

196	1.3.  NFSv4.2 Goals

198	   The goal of the design of NFSv4.2 is to take common local file system
199	   features and offer them remotely.  These features might

201	   o  already be available on the servers, e.g., sparse files

203	   o  be under development as a new standard, e.g., SEEK_HOLE and
204	      SEEK_DATA

206	   o  be used by clients with the servers via some proprietary means,
207	      e.g., Labeled NFS

209	   but the clients are not able to leverage them on the server within
210	   the confines of the NFS protocol.

212	1.4.  Overview of NFSv4.2 Features

214	   [[Comment.1: This needs fleshing out! --TH]]

216	1.4.1.  Sparse Files

218	   Two new operations are defined to support the reading of sparse files
219	   (READ_PLUS) and the punching of holes to remove backing storage
220	   (INITIALIZE).

222	1.4.2.  Application I/O Advise

224	   We propose a new IO_ADVISE operation for NFSv4.2 that clients can use
225	   to communicate expected I/O behavior to the server.  By communicating
226	   future I/O behavior such as whether a file will be accessed
227	   sequentially or randomly, and whether a file will or will not be
228	   accessed in the near future, servers can optimize future I/O requests
229	   for a file by, for example, prefetching or evicting data.  This
230	   operation can be used to support the posix_fadvise function as well
231	   as other applications such as databases and video editors.
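
   As an illustration only, the following sketch shows how a server
   might turn a set of access-pattern hints into local optimizations
   such as prefetching or evicting data.  The hint names and the action
   strings are hypothetical placeholders chosen for this example; they
   are not the protocol's hint values, and the read-ahead actions are an
   assumption beyond the prefetch and evict examples given above.

```python
from enum import Enum, auto


class Hint(Enum):
    """Hypothetical hint names, invented for this sketch."""
    SEQUENTIAL = auto()   # file will be read sequentially
    RANDOM = auto()       # file will be read in random order
    WILL_NEED = auto()    # file will be accessed in the near future
    WONT_NEED = auto()    # file will not be accessed in the near future


def plan_server_actions(hints):
    """Map client hints to illustrative server-side actions."""
    hints = set(hints)
    actions = []
    if Hint.SEQUENTIAL in hints:
        actions.append("enable read-ahead")    # assumed action
    if Hint.RANDOM in hints:
        actions.append("disable read-ahead")   # assumed action
    if Hint.WILL_NEED in hints:
        actions.append("prefetch")
    if Hint.WONT_NEED in hints:
        actions.append("evict")
    return actions
```

   A server remains free to ignore any hint; the mapping above is only
   one possible policy.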

233	1.5.  Differences from NFSv4.1

235	   In NFSv4.1, the only way to introduce new variants of an operation
236	   was to introduce a new operation.  I.e., READ becomes either READ2 or
237	   READ_PLUS.  With the use of discriminated unions as parameters to
238	   such functions in NFSv4.2, it is possible to add a new arm in a
239	   subsequent minor version.  It is also possible to move such an
240	   operation from OPTIONAL/RECOMMENDED to REQUIRED.  Forcing an
241	   implementation to adopt each arm of a discriminated union at such a
242	   time does not meet the spirit of the minor versioning rules.  As
243	   such, new arms of a discriminated union MUST follow the same
244	   guidelines for minor versioning as operations in NFSv4.1 - i.e., they
245	   may not be made REQUIRED.  To support this, a new error code,
246	   NFS4ERR_UNION_NOTSUPP, is introduced which allows the server to
247	   communicate to the client that the operation is supported, but the
248	   specific arm of the discriminated union is not.
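
   The distinction between an unsupported operation and an unsupported
   arm can be sketched as follows.  This is a minimal illustration under
   stated assumptions: the operation name "OP_X", the arm names, and the
   SketchServer class are hypothetical, and status codes are shown as
   symbolic strings rather than numeric values.

```python
from dataclasses import dataclass

# Symbolic status names only; numeric values are deliberately omitted.
NFS4_OK = "NFS4_OK"
NFS4ERR_NOTSUPP = "NFS4ERR_NOTSUPP"
NFS4ERR_UNION_NOTSUPP = "NFS4ERR_UNION_NOTSUPP"


@dataclass
class UnionArg:
    """A discriminated union: 'arm' selects how 'body' is interpreted."""
    arm: str
    body: object = None


class SketchServer:
    """Hypothetical server that implements only some arms per operation."""

    def __init__(self, supported):
        # Maps an operation name to the set of arm names it implements.
        self.supported = supported

    def dispatch(self, op, arg):
        if op not in self.supported:
            # The operation itself is not implemented.
            return NFS4ERR_NOTSUPP
        if arg.arm not in self.supported[op]:
            # The operation is implemented, but this arm is not.
            return NFS4ERR_UNION_NOTSUPP
        return NFS4_OK
```

   A client receiving NFS4ERR_UNION_NOTSUPP thus learns that it may
   retry the same operation with a different arm.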

250	2.  NFS Server-side Copy

252	2.1.  Introduction

254	   The server-side copy feature provides a mechanism for the NFS client
255	   to perform a file copy on the server without the data being
256	   transmitted back and forth over the network.  Without this feature,
257	   an NFS client copies data from one location to another by reading the
258	   data from the server over the network, and then writing the data back
259	   over the network to the server.  Using this server-side copy
260	   operation, the client is able to instruct the server to copy the data
261	   locally without the data being sent back and forth over the network
262	   unnecessarily.

264	   If the source object and destination object are on different file
265	   servers, the file servers will communicate with one another to
266	   perform the copy operation.  The server-to-server protocol by which
267	   this is accomplished is not defined in this document.

269	2.2.  Protocol Overview

271	   The server-side copy offload operations support both intra-server and
272	   inter-server file copies.  An intra-server copy is a copy in which
273	   the source file and destination file reside on the same server.  In
274	   an inter-server copy, the source file and destination file are on
275	   different servers.  In both cases, the copy may be performed
276	   synchronously or asynchronously.

278	   Throughout the rest of this document, we refer to the NFS server
279	   containing the source file as the "source server" and the NFS server
280	   to which the file is transferred as the "destination server".  In the
281	   case of an intra-server copy, the source server and destination
282	   server are the same server.  Therefore, in the context of an
283	   intra-server copy, the terms source server and destination server
284	   refer to the single server performing the copy.

286	   The operations described below are designed to copy files.  Other
287	   file system objects can be copied by building on these operations or
288	   using other techniques.  For example, if the user wishes to copy a
289	   directory, the client can synthesize a directory copy by first
290	   creating the destination directory and then copying the source
291	   directory's files to the new destination directory.  If the user
292	   wishes to copy a namespace junction [11] [12], the client can use the
293	   ONC RPC Federated Filesystem protocol [12] to perform the copy.
294	   Specifically, the client can determine the source junction's
295	   attributes using the FEDFS_LOOKUP_FSN procedure and create a
296	   duplicate junction using the FEDFS_CREATE_JUNCTION procedure.
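
   The directory case described above can be sketched as a client-side
   loop.  This is a sketch under stated assumptions: the client object
   and its mkdir, list_files, and copy_file methods are hypothetical
   stand-ins (copy_file represents one per-file server-side copy), and
   the in-memory class exists only so the example runs on its own.

```python
def copy_directory(client, src_dir, dst_dir):
    """Synthesize a directory copy from per-file copies."""
    client.mkdir(dst_dir)                       # create the destination first
    copied = []
    for name in client.list_files(src_dir):     # then copy each source file
        client.copy_file(f"{src_dir}/{name}", f"{dst_dir}/{name}")
        copied.append(name)
    return copied


class InMemoryClient:
    """Hypothetical stand-in for an NFS client, used only for illustration."""

    def __init__(self, files):
        self.files = dict(files)   # maps path -> file contents
        self.dirs = set()

    def mkdir(self, path):
        self.dirs.add(path)

    def list_files(self, directory):
        prefix = directory + "/"
        # Only direct children; nested paths are skipped in this sketch.
        return sorted(
            path[len(prefix):]
            for path in self.files
            if path.startswith(prefix) and "/" not in path[len(prefix):]
        )

    def copy_file(self, src, dst):
        self.files[dst] = self.files[src]
```

   The sketch does not recurse into subdirectories; a fuller client
   would repeat the same two steps for each nested directory.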

298	   For the inter-server copy, the operations are defined to be
299	   compatible with the traditional copy authentication approach.  The
300	   client and user are authorized at the source for reading.  Then they
301	   are authorized at the destination for writing.

303	2.2.1.  Overview of Copy Operations
304	   COPY_NOTIFY:  For inter-server copies, the client sends this
305	      operation to the source server to notify it of a future file copy
306	      from a given destination server for the given user.
307	      (Section 13.3)

309	   OFFLOAD_REVOKE:  Also for inter-server copies, the client sends this
310	      operation to the source server to revoke permission to copy a file
311	      for the given user.  (Section 13.4)

313	   COPY:  Used by the client to request a file copy.  (Section 13.1)

315	   OFFLOAD_ABORT:  Used by the client to abort an asynchronous file
316	      copy.  (Section 13.2)

318	   OFFLOAD_STATUS:  Used by the client to poll the status of an
319	      asynchronous file copy.  (Section 13.5)

321	   CB_COPY:  Used by the destination server to report the results of an
322	      asynchronous file copy to the client.  (Section 14.2)

324	2.2.2.  Locking the Files

326	   Both the source and destination file may need to be locked to protect
327	   the content during the copy operations.  A client can achieve this by
328	   a combination of OPEN and LOCK operations.  I.e., either share or
329	   byte range locks might be desired.

331	2.2.3.  Intra-Server Copy

333	   To copy a file on a single server, the client uses a COPY operation.
334	   The server may respond to the copy operation with the final results
335	   of the copy or it may perform the copy asynchronously and deliver the
336	   results using a CB_COPY operation callback.  If the copy is performed
337	   asynchronously, the client may poll the status of the copy using
338	   OFFLOAD_STATUS or cancel the copy using OFFLOAD_ABORT.

340	   A synchronous intra-server copy is shown in Figure 1.  In this
341	   example, the NFS server chooses to perform the copy synchronously.
342	   The copy operation is completed, either successfully or
343	   unsuccessfully, before the server replies to the client's request.
344	   The server's reply contains the final result of the operation.

346	     Client                                  Server
347	        +                                      +
348	        |                                      |
349	        |--- OPEN ---------------------------->| Client opens
350	        |<------------------------------------/| the source file
351	        |                                      |
352	        |--- OPEN ---------------------------->| Client opens
353	        |<------------------------------------/| the destination file
354	        |                                      |
355	        |--- COPY ---------------------------->| Client requests
356	        |<------------------------------------/| a file copy
357	        |                                      |
358	        |--- CLOSE --------------------------->| Client closes
359	        |<------------------------------------/| the destination file
360	        |                                      |
361	        |--- CLOSE --------------------------->| Client closes
362	        |<------------------------------------/| the source file
363	        |                                      |
364	        |                                      |

366	                Figure 1: A synchronous intra-server copy.

368	   An asynchronous intra-server copy is shown in Figure 2.  In this
369	   example, the NFS server performs the copy asynchronously.  The
370	   server's reply to the copy request indicates that the copy operation
371	   was initiated and the final result will be delivered at a later time.
372	   The server's reply also contains a copy stateid.  The client may use
373	   this copy stateid to poll for status information (as shown) or to
374	   cancel the copy using an OFFLOAD_ABORT.  When the server completes the
375	   copy, the server performs a callback to the client and reports the
376	   results.

378	     Client                                  Server
379	        +                                      +
380	        |                                      |
381	        |--- OPEN ---------------------------->| Client opens
382	        |<------------------------------------/| the source file
383	        |                                      |
384	        |--- OPEN ---------------------------->| Client opens
385	        |<------------------------------------/| the destination file
386	        |                                      |
387	        |--- COPY ---------------------------->| Client requests
388	        |<------------------------------------/| a file copy
389	        |                                      |
390	        |                                      |
391	        |--- OFFLOAD_STATUS ------------------>| Client may poll
392	        |<------------------------------------/| for status
393	        |                                      |
394	        |                  .                   | Multiple OFFLOAD_STATUS
395	        |                  .                   | operations may be sent.
396	        |                  .                   |
397	        |                                      |
398	        |<-- CB_COPY --------------------------| Server reports results
399	        |\------------------------------------>|
400	        |                                      |
401	        |--- CLOSE --------------------------->| Client closes
402	        |<------------------------------------/| the destination file
403	        |                                      |
404	        |--- CLOSE --------------------------->| Client closes
405	        |<------------------------------------/| the source file
406	        |                                      |
407	        |                                      |

409	               Figure 2: An asynchronous intra-server copy.
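
   The client-side control flow common to Figures 1 and 2 can be
   sketched as follows.  This is an illustrative simulation, not an
   implementation: FakeServer, the reply dictionaries, and the stateid
   string are hypothetical, and the fake server advances its copy each
   time it is polled purely so that the example terminates.

```python
class FakeServer:
    """Hypothetical server that copies synchronously or asynchronously."""

    def __init__(self, total_bytes, chunk_bytes, synchronous):
        self.total = total_bytes
        self.chunk = chunk_bytes
        self.synchronous = synchronous
        self.copied = 0
        self.callback = None

    def copy(self, cb_copy):
        if self.synchronous:
            # Figure 1: the reply carries the final result.
            self.copied = self.total
            return {"final": True, "bytes_copied": self.copied}
        # Figure 2: the reply carries a copy stateid; result comes later.
        self.callback = cb_copy
        return {"final": False, "stateid": "copy-stateid-1"}

    def offload_status(self, stateid):
        # Simulation convenience: each poll advances the copy by one chunk.
        self.copied = min(self.total, self.copied + self.chunk)
        if self.copied == self.total and self.callback is not None:
            callback, self.callback = self.callback, None
            callback({"stateid": stateid, "bytes_copied": self.copied})
        return self.copied


def client_copy(server, max_polls=100):
    """Return (bytes_copied, polls_sent) once the copy has finished."""
    results = []                         # filled in by the CB_COPY callback
    reply = server.copy(results.append)
    if reply["final"]:
        return reply["bytes_copied"], 0
    polls = 0
    while not results and polls < max_polls:
        server.offload_status(reply["stateid"])
        polls += 1
    if not results:
        raise TimeoutError("copy did not complete")
    return results[0]["bytes_copied"], polls
```

   In a real deployment the callback arrives independently of polling,
   so a client need not poll at all; polling is optional, as the figure
   notes.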

411	2.2.4.  Inter-Server Copy

413	   A copy may also be performed between two servers.  The copy protocol
414	   is designed to accommodate a variety of network topologies.  As shown
415	   in Figure 3, the client and servers may be connected by multiple
416	   networks.  In particular, the servers may be connected by a
417	   specialized, high-speed network (network 192.168.33.0/24 in the
418	   diagram) that does not include the client.  The protocol allows the
419	   client to set up the copy between the servers (over network
420	   10.11.78.0/24 in the diagram) and for the servers to communicate on
421	   the high-speed network if they choose to do so.

423	                             192.168.33.0/24
424	                 +-------------------------------------+
425	                 |                                     |
426	                 |                                     |
427	                 | 192.168.33.18                       | 192.168.33.56
428	         +-------+------+                       +------+------+
429	         |     Source   |                       | Destination |
430	         +-------+------+                       +------+------+
431	                 | 10.11.78.18                         | 10.11.78.56
432	                 |                                     |
433	                 |                                     |
434	                 |             10.11.78.0/24           |
435	                 +------------------+------------------+
436	                                    |
437	                                    |
438	                                    | 10.11.78.243
439	                              +-----+-----+
440	                              |   Client  |
441	                              +-----------+

443	            Figure 3: An example inter-server network topology.
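
   The topology in Figure 3 can be checked mechanically.  The sketch
   below uses the addresses from the figure to find the networks that
   both servers share but the client does not reach; the HOSTS table and
   function names are assumptions made for this example.

```python
import ipaddress

# Addresses and prefixes taken from Figure 3.
NETWORKS = [
    ipaddress.ip_network("192.168.33.0/24"),
    ipaddress.ip_network("10.11.78.0/24"),
]
HOSTS = {
    "source": ["192.168.33.18", "10.11.78.18"],
    "destination": ["192.168.33.56", "10.11.78.56"],
    "client": ["10.11.78.243"],
}


def networks_of(host):
    """Return the set of networks on which the host has an address."""
    addresses = [ipaddress.ip_address(a) for a in HOSTS[host]]
    return {net for net in NETWORKS if any(a in net for a in addresses)}


def server_only_networks():
    """Networks shared by both servers that do not include the client."""
    shared = networks_of("source") & networks_of("destination")
    return sorted(str(net) for net in shared - networks_of("client"))
```

   For the figure's addresses, server_only_networks() yields only
   192.168.33.0/24, the network the servers may use between themselves.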

445	   For an inter-server copy, the client notifies the source server that
446	   a file will be copied by the destination server using a COPY_NOTIFY
447	   operation.  The client then initiates the copy by sending the COPY
448	   operation to the destination server.  The destination server may
449	   perform the copy synchronously or asynchronously.

451	   A synchronous inter-server copy is shown in Figure 4.  In this case,
452	   the destination server chooses to perform the copy before responding
453	   to the client's COPY request.

455	   An asynchronous copy is shown in Figure 5.  In this case, the
456	   destination server chooses to respond to the client's COPY request
457	   immediately and then perform the copy asynchronously.

459	     Client                Source         Destination
460	        +                    +                 +
461	        |                    |                 |
462	        |--- OPEN        --->|                 | Returns os1
463	        |<------------------/|                 |
464	        |                    |                 |
465	        |--- COPY_NOTIFY --->|                 |
466	        |<------------------/|                 |
467	        |                    |                 |
468	        |--- OPEN ---------------------------->| Returns os2
469	        |<------------------------------------/|
470	        |                    |                 |
471	        |--- COPY ---------------------------->|
472	        |                    |                 |
473	        |                    |                 |
474	        |                    |<----- read -----|
475	        |                    |\--------------->|
476	        |                    |                 |
477	        |                    |        .        | Multiple reads may
478	        |                    |        .        | be necessary
479	        |                    |        .        |
480	        |                    |                 |
481	        |                    |                 |
482	        |<------------------------------------/| Destination replies
483	        |                    |                 | to COPY
484	        |                    |                 |
485	        |--- CLOSE --------------------------->| Release open state
486	        |<------------------------------------/|
487	        |                    |                 |
488	        |--- CLOSE       --->|                 | Release open state
489	        |<------------------/|                 |

491	                Figure 4: A synchronous inter-server copy.

493	     Client                Source         Destination
494	       +                    +                 +
495	       |                    |                 |
496	       |--- OPEN        --->|                 | Returns os1
497	       |<------------------/|                 |
498	       |                    |                 |
499	       |--- LOCK        --->|                 | Optional, could be done
500	       |<------------------/|                 | with a share lock
501	       |                    |                 |
502	       |--- COPY_NOTIFY --->|                 | Need to pass in
503	       |<------------------/|                 | os1 or lock state
504	       |                    |                 |
505	       |                    |                 |
506	       |                    |                 |
507	       |--- OPEN ---------------------------->| Returns os2
508	       |<------------------------------------/|
509	       |                    |                 |
510	       |--- LOCK ---------------------------->| Optional ...
511	       |<------------------------------------/|
512	       |                    |                 |
513	       |--- COPY ---------------------------->| Need to pass in
514	       |<------------------------------------/| os2 or lock state
515	       |                    |                 |
516	       |                    |                 |
517	       |                    |<----- read -----|
518	       |                    |\--------------->|
519	       |                    |                 |
520	       |                    |        .        | Multiple reads may
521	       |                    |        .        | be necessary
522	       |                    |        .        |
523	       |                    |                 |
524	       |                    |                 |
525	       |--- OFFLOAD_STATUS ------------------>| Client may poll
526	       |<------------------------------------/| for status
527	       |                    |                 |
528	       |                    |        .        | Multiple OFFLOAD_STATUS
529	       |                    |        .        | operations may be sent
530	       |                    |        .        |
531	       |                    |                 |
532	       |                    |                 |
533	       |                    |                 |
534	       |<-- CB_COPY --------------------------| Destination reports
535	       |\------------------------------------>| results
536	       |                    |                 |
537	       |--- LOCKU --------------------------->| Only if LOCK was done
538	       |<------------------------------------/|
539	       |                    |                 |
540	       |--- CLOSE --------------------------->| Release open state
541	       |<------------------------------------/|
542	       |                    |                 |
543	       |--- LOCKU       --->|                 | Only if LOCK was done
544	       |<------------------/|                 |
545	       |                    |                 |
546	       |--- CLOSE       --->|                 | Release open state
547	       |<------------------/|                 |
548	       |                    |                 |

550	               Figure 5: An asynchronous inter-server copy.

552	2.2.5.  Server-to-Server Copy Protocol

554	   The source server and destination server are not required to use a
555	   specific protocol to transfer the file data.  The choice of what
556	   protocol to use is ultimately the destination server's decision.

558	2.2.5.1.  Using NFSv4.x as a Server-to-Server Copy Protocol

560	   The destination server MAY use standard NFSv4.x (where x >= 1) to
561	   read the data from the source server.  If NFSv4.x is used for the
562	   server-to-server copy protocol, the destination server can use the
563	   filehandle contained in the COPY request with standard NFSv4.x
564	   operations to read data from the source server.  Specifically, the
565	   destination server may use the NFSv4.x OPEN operation's CLAIM_FH
566	   facility to open the file being copied and obtain an open stateid.
567	   Using the stateid, the destination server may then use NFSv4.x READ
568	   operations to read the file.

570	2.2.5.2.  Using an Alternative Server-to-Server Copy Protocol

572	   In a homogeneous environment, the source and destination servers
573	   might be able to perform the file copy extremely efficiently using
574	   specialized protocols.  For example the source and destination
575	   servers might be two nodes sharing a common file system format for
576	   the source and destination file systems.  Thus the source and
577	   destination are in an ideal position to efficiently render the image
578	   of the source file to the destination file by replicating the file
579	   system formats at the block level.  Another possibility is that the
580	   source and destination might be two nodes sharing a common storage
581	   area network, and thus there is no need to copy any data at all, and
582	   instead ownership of the file and its contents might simply be re-
583	   assigned to the destination.  To allow for these possibilities, the
584	   destination server is allowed to use a server-to-server copy protocol
585	   of its choice.

587	   In a heterogeneous environment, using a protocol other than NFSv4.x
588	   (e.g., HTTP [13] or FTP [14]) presents some challenges.  In
589	   particular, the destination server is presented with the challenge of
590	   accessing the source file given only an NFSv4.x filehandle.

592	   One option for protocols that identify source files with path names
593	   is to use an ASCII hexadecimal representation of the source
594	   filehandle as the file name.

596	   Another option for the source server is to use URLs to direct the
597	   destination server to a specialized service.  For example, the
598	   response to COPY_NOTIFY could include the URL
599	   ftp://s1.example.com:9999/_FH/0x12345, where 0x12345 is the ASCII
600	   hexadecimal representation of the source filehandle.  When the
601	   destination server receives the source server's URL, it would use
602	   "_FH/0x12345" as the file name to pass to the FTP server listening on
603	   port 9999 of s1.example.com.  On port 9999 there would be a special
604	   instance of the FTP service that understands how to convert NFS
605	   filehandles to an open file descriptor (in many operating systems,
606	   this would require a new system call, one which is the inverse of the
607	   makefh() function that the pre-NFSv4 MOUNT service needs).
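
   The filehandle-as-file-name convention above can be sketched as
   follows.  This is a minimal illustration, not part of the protocol:
   the helper names are hypothetical, and the sketch assumes the
   filehandle is rendered as whole bytes (so the three-byte filehandle
   01 23 45 becomes "0x012345" rather than the abbreviated "0x12345"
   used in the prose example).

```python
from urllib.parse import urlsplit


def filehandle_to_name(fh: bytes) -> str:
    """Render a filehandle as an ASCII hexadecimal file name (hypothetical helper)."""
    return "0x" + fh.hex()


def name_to_filehandle(name: str) -> bytes:
    """Inverse of filehandle_to_name(); rejects names without the 0x prefix."""
    if not name.startswith("0x"):
        raise ValueError("file name is not a hexadecimal filehandle")
    return bytes.fromhex(name[2:])


def build_copy_url(host: str, port: int, fh: bytes, scheme: str = "ftp") -> str:
    """URL a source server might return in its COPY_NOTIFY response."""
    return f"{scheme}://{host}:{port}/_FH/{filehandle_to_name(fh)}"


def parse_copy_url(url: str):
    """Split such a URL into (host, port, filehandle), as the special service would."""
    parts = urlsplit(url)
    prefix, _, name = parts.path.lstrip("/").partition("/")
    if prefix != "_FH":
        raise ValueError("URL does not name a filehandle")
    return parts.hostname, parts.port, name_to_filehandle(name)
```

   The destination server only needs the path component; mapping the
   decoded filehandle back to an open file remains the job of the
   specialized service on the source server.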

609	   Authenticating and identifying the destination server to the source
610	   server is also a challenge.  Recommendations for how to accomplish
611	   this are given in Section 2.4.1.2.4 and Section 2.4.1.4.

613	2.3.  Requirements for Operations

615	   Implementation of server-side copy is OPTIONAL for both the client
616	   and the server.  However, in order to successfully copy a file, some
617	   operations MUST be supported by the client and/or server.

619	   If a client desires an intra-server file copy, then it MUST support
620	   the COPY and CB_COPY operations.  If COPY returns a stateid, then the
621	   client MAY use the OFFLOAD_ABORT and OFFLOAD_STATUS operations.

623	   If a client desires an inter-server file copy, then it MUST support
624	   the COPY, COPY_NOTIFY, and CB_COPY operations, and MAY use the
625	   OFFLOAD_REVOKE operation.  If COPY returns a stateid, then the client
626	   MAY use the OFFLOAD_ABORT and OFFLOAD_STATUS operations.

628	   If a server supports intra-server copy, then the server MUST support
629	   the COPY operation.  If a server's COPY operation returns a stateid,
630	   then the server MUST also support these operations: CB_COPY,
631	   OFFLOAD_ABORT, and OFFLOAD_STATUS.

633	   If a source server supports inter-server copy, then the source server
634	   MUST support all these operations: COPY_NOTIFY and OFFLOAD_REVOKE.
635	   If a destination server supports inter-server copy, then the
636	   destination server MUST support the COPY operation.  If a destination
637	   server's COPY operation returns a stateid, then the destination
638	   server MUST also support these operations: CB_COPY, OFFLOAD_ABORT,
639	   COPY_NOTIFY, OFFLOAD_REVOKE, and OFFLOAD_STATUS.
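
   The MUST-level requirements above can be summarized as a lookup
   table.  The sketch below is only a restatement of the preceding
   paragraphs; the role labels and the function name are hypothetical,
   and operations that are merely MAY-level are deliberately left out.

```python
# Operations each role MUST support unconditionally (role labels are illustrative).
MUST = {
    "client-intra": {"COPY", "CB_COPY"},
    "client-inter": {"COPY", "COPY_NOTIFY", "CB_COPY"},
    "server-intra": {"COPY"},
    "source-inter": {"COPY_NOTIFY", "OFFLOAD_REVOKE"},
    "destination-inter": {"COPY"},
}

# Additional operations a server MUST support if its COPY returns a stateid.
MUST_IF_STATEID = {
    "server-intra": {"CB_COPY", "OFFLOAD_ABORT", "OFFLOAD_STATUS"},
    "destination-inter": {
        "CB_COPY", "OFFLOAD_ABORT", "COPY_NOTIFY",
        "OFFLOAD_REVOKE", "OFFLOAD_STATUS",
    },
}


def required_operations(role, returns_stateid=False):
    """Return the set of operations the given role MUST support."""
    ops = set(MUST[role])
    if returns_stateid:
        ops |= MUST_IF_STATEID.get(role, set())
    return ops
```

   For example, required_operations("server-intra", True) yields COPY
   plus the three operations needed to track an asynchronous copy.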

641	   Each operation is performed in the context of the user identified by
642	   the ONC RPC credential of its containing COMPOUND or CB_COMPOUND
643	   request.  For example, an OFFLOAD_ABORT operation issued by a given
644	   user requests that a specified COPY operation initiated by the same
645	   user be canceled.  Therefore an OFFLOAD_ABORT MUST NOT interfere with
646	   a copy of the same file initiated by another user.

648	   An NFS server MAY allow an administrative user to monitor or cancel
649	   copy operations using an implementation-specific interface.

651	2.3.1.  netloc4 - Network Locations

653	   The server-side copy operations specify network locations using the
654	   netloc4 data type shown below:

656	   enum netloc_type4 {
657	           NL4_NAME        = 0,
658	           NL4_URL         = 1,
659	           NL4_NETADDR     = 2
660	   };
661	   union netloc4 switch (netloc_type4 nl_type) {
662	           case NL4_NAME:          utf8str_cis nl_name;
663	           case NL4_URL:           utf8str_cis nl_url;
664	           case NL4_NETADDR:       netaddr4    nl_addr;
665	   };

667	   If the netloc4 is of type NL4_NAME, the nl_name field MUST be
668	   specified as a UTF-8 string.  The nl_name is expected to be resolved
669	   to a network address via DNS, LDAP, NIS, /etc/hosts, or some other
670	   means.  If the netloc4 is of type NL4_URL, a server URL [4]
671	   appropriate for the server-to-server copy operation is specified as a
672	   UTF-8 string.  If the netloc4 is of type NL4_NETADDR, the nl_addr
673	   field MUST contain a valid netaddr4 as defined in Section 3.3.9 of
674	   [2].
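
   As an informal model of the netloc4 union, the sketch below mirrors
   the three arms and the checks stated above.  It is an assumption-
   laden illustration: the class names are hypothetical, the netaddr4
   field names are taken from [2] as we understand it, and no XDR
   encoding is attempted.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Union


class NetlocType4(IntEnum):
    NL4_NAME = 0
    NL4_URL = 1
    NL4_NETADDR = 2


@dataclass(frozen=True)
class Netaddr4:
    na_r_netid: str  # network id, e.g. "tcp"
    na_r_addr: str   # universal address, e.g. "192.0.2.18.8.1"


@dataclass(frozen=True)
class Netloc4:
    nl_type: NetlocType4
    value: Union[str, Netaddr4]  # nl_name, nl_url, or nl_addr depending on nl_type

    def __post_init__(self):
        if self.nl_type in (NetlocType4.NL4_NAME, NetlocType4.NL4_URL):
            if not isinstance(self.value, str):
                raise ValueError("NL4_NAME and NL4_URL carry a string")
            # The string must be representable as UTF-8.
            self.value.encode("utf-8")
        elif self.nl_type == NetlocType4.NL4_NETADDR:
            if not isinstance(self.value, Netaddr4):
                raise ValueError("NL4_NETADDR carries a netaddr4")
        else:
            raise ValueError("unknown netloc_type4")
```

   Whether an NL4_NAME value actually resolves is outside this model;
   as noted below, that depends on the environment of each system that
   evaluates it.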

676	   When netloc4 values are used for an inter-server copy as shown in
677	   Figure 3, their values may be evaluated on the source server,
678	   destination server, and client.  The network environment in which
679	   these systems operate should be configured so that the netloc4 values
680	   are interpreted as intended on each system.

682	2.3.2.  Copy Offload Stateids

684	   A server may perform a copy offload operation asynchronously.  An
685	   asynchronous copy is tracked using a copy offload stateid.  Copy
686	   offload stateids are included in the COPY, OFFLOAD_ABORT,
687	   OFFLOAD_STATUS, and CB_COPY operations.

689	   Section 8.2.4 of [2] specifies that stateids are valid until either
690	   (A) the client or server restarts or (B) the client returns the
691	   resource.

693	   A copy offload stateid will be valid until either (A) the client or
694	   server restarts or (B) the client returns the resource by issuing an
695	   OFFLOAD_ABORT operation or by replying to a CB_COPY operation.

697	   A copy offload stateid's seqid MUST NOT be 0.  In the context of a
698	   copy offload operation, it is ambiguous to indicate the most recent
699	   copy offload operation using a stateid with seqid of 0.  Therefore a
700	   copy offload stateid with seqid of 0 MUST be considered invalid.
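
   A receiver could enforce the seqid rule with a check along these
   lines.  This is a minimal sketch: the Stateid4 class is a hypothetical
   stand-in for the stateid4 type of [2] (a 32-bit seqid plus a 12-byte
   "other" field), and the function name is illustrative.

```python
from dataclasses import dataclass

NFS4_OTHER_SIZE = 12  # size of the opaque "other" field of a stateid4


@dataclass(frozen=True)
class Stateid4:
    seqid: int
    other: bytes


def is_valid_copy_offload_stateid(stateid: Stateid4) -> bool:
    """A copy offload stateid with a seqid of 0 is always invalid."""
    if len(stateid.other) != NFS4_OTHER_SIZE:
        return False
    return 0 < stateid.seqid <= 0xFFFFFFFF
```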

702	2.4.  Security Considerations

704	   The security considerations pertaining to NFSv4 [10] apply to this
705	   chapter.

707	   The standard security mechanisms provided by NFSv4 [10] may be used
708	   to secure the protocol described in this chapter.

710	   NFSv4 clients and servers supporting the inter-server copy operations
711	   described in this chapter are REQUIRED to implement [5], including
712	   the RPCSEC_GSSv3 privileges copy_from_auth and copy_to_auth.  If the
713	   server-to-server copy protocol is ONC RPC based, the servers are also
714	   REQUIRED to implement the RPCSEC_GSSv3 privilege copy_confirm_auth.
715	   These requirements to implement are not requirements to use.  NFSv4
716	   clients and servers are RECOMMENDED to use [5] to secure server-side
717	   copy operations.

719	2.4.1.  Inter-Server Copy Security

721	2.4.1.1.  Requirements for Secure Inter-Server Copy

723	   Inter-server copy is driven by several requirements:

725	   o  The specification MUST NOT mandate an inter-server copy protocol.
726	      There are many ways to copy data.  Some will be more optimal than
727	      others depending on the identities of the source server and
728	      destination server.  For example the source and destination
729	      servers might be two nodes sharing a common file system format for
730	      the source and destination file systems.  Thus the source and
731	      destination are in an ideal position to efficiently render the
732	      image of the source file to the destination file by replicating
733	      the file system formats at the block level.  In other cases, the
734	      source and destination might be two nodes sharing a common storage
735	      area network, and thus there is no need to copy any data at all,
736	      and instead ownership of the file and its contents simply gets re-
737	      assigned to the destination.

739	   o  The specification MUST provide guidance for using NFSv4.x as a
740	      copy protocol.  For those source and destination servers willing
741	      to use NFSv4.x there are specific security considerations that
742	      this specification can and does address.

744	   o  The specification MUST NOT mandate pre-configuration between the
745	      source and destination server.  Requiring that the source and
746	      destination first have a "copying relationship" increases the
747	      administrative burden.  However the specification MUST NOT
748	      preclude implementations that require pre-configuration.

750	   o  The specification MUST NOT mandate a trust relationship between
751	      the source and destination server.  The NFSv4 security model
752	      requires mutual authentication between a principal on an NFS
753	      client and a principal on an NFS server.  This model MUST continue
754	      with the introduction of COPY.

756	2.4.1.2.  Inter-Server Copy with RPCSEC_GSSv3

758	   When the client sends a COPY_NOTIFY to the source server to tell it
759	   to expect the destination to attempt to copy data from the source
760	   server, it is expected that this copy is being done on behalf of the
761	   principal (called the "user principal") that sent the RPC request
762	   that encloses the COMPOUND procedure that contains the COPY_NOTIFY
763	   operation.  The user principal is identified by the RPC credentials.
764	   What is needed is a mechanism that allows the user principal to
765	   authorize the destination server to perform the copy, that lets the
766	   source server properly authenticate the destination's copy, and that
767	   does not allow the destination server to exceed this authorization.

769	   An approach that sends delegated credentials of the client's user
770	   principal to the destination server is not used for the following
771	   reasons.  If the client's user delegated its credentials, the
772	   destination would authenticate as the user principal.  If the
773	   destination were using the NFSv4 protocol to perform the copy, then
774	   the source server would authenticate the destination server as the
775	   user principal, and the file copy would securely proceed.  However,
776	   this approach would allow the destination server to copy other files.
777	   The user principal would have to trust the destination server to not
778	   do so.  This is counter to the requirements, and therefore is not
779	   considered.  Instead an approach using RPCSEC_GSSv3 [5] privileges is
780	   proposed.

782	   One of the stated applications of the proposed RPCSEC_GSSv3 protocol
783	   is compound client host and user authentication [+ privilege
784	   assertion].  For inter-server file copy, we require compound NFS
785	   server host and user authentication [+ privilege assertion].  The
786	   distinction between the two is one without meaning.

788	   RPCSEC_GSSv3 introduces the notion of privileges.  We define three
789	   privileges:

791	   copy_from_auth:  A user principal is authorizing a source principal
792	      ("nfs@<source>") to allow a destination principal ("nfs@
793	      <destination>") to copy a file from the source to the destination.
794	      This privilege is established on the source server before the user
795	      principal sends a COPY_NOTIFY operation to the source server.

797	   struct copy_from_auth_priv {
798	           secret4             cfap_shared_secret;
799	           netloc4             cfap_destination;
800	           /* the NFSv4 user name that the user principal maps to */
801	           utf8str_mixed       cfap_username;
802	           /* equal to seq_num of rpc_gss_cred_vers_3_t */
803	           unsigned int        cfap_seq_num;
804	   };

806	      cfap_shared_secret is a secret value the user principal generates.

808	   copy_to_auth:  A user principal is authorizing a destination
809	      principal ("nfs@<destination>") to allow it to copy a file from
810	      the source to the destination.  This privilege is established on
811	      the destination server before the user principal sends a COPY
812	      operation to the destination server.

814	   struct copy_to_auth_priv {
815	           /* equal to cfap_shared_secret */
816	           secret4              ctap_shared_secret;
817	           netloc4              ctap_source;
818	           /* the NFSv4 user name that the user principal maps to */
819	           utf8str_mixed        ctap_username;
820	           /* equal to seq_num of rpc_gss_cred_vers_3_t */
821	           unsigned int         ctap_seq_num;
822	   };

824	      ctap_shared_secret is a secret value the user principal generated
825	      and was used to establish the copy_from_auth privilege with the
826	      source principal.

828	   copy_confirm_auth:  A destination principal is confirming with the
829	      source principal that it is authorized to copy data from the
830	      source on behalf of the user principal.  When the inter-server
831	      copy protocol is NFSv4, or for that matter, any protocol capable
832	      of being secured via RPCSEC_GSSv3 (i.e., any ONC RPC protocol),
833	      this privilege is established before the file is copied from the
834	      source to the destination.

836	   struct copy_confirm_auth_priv {
837	           /* equal to GSS_GetMIC() of cfap_shared_secret */
838	           opaque              ccap_shared_secret_mic<>;
839	           /* the NFSv4 user name that the user principal maps to */
840	           utf8str_mixed       ccap_username;
841	           /* equal to seq_num of rpc_gss_cred_vers_3_t */
842	           unsigned int        ccap_seq_num;
843	   };

845	2.4.1.2.1.  Establishing a Security Context

847	   When the user principal wants to COPY a file between two servers, if
848	   it has not established copy_from_auth and copy_to_auth privileges on
849	   the servers, it establishes them:

851	   o  The user principal generates a secret it will share with the two
852	      servers.  This shared secret will be placed in the
853	      cfap_shared_secret and ctap_shared_secret fields of the
854	      appropriate privilege data types, copy_from_auth_priv and
855	      copy_to_auth_priv.

857	   o  An instance of copy_from_auth_priv is filled in with the shared
858	      secret, the destination server, and the NFSv4 user id of the user
859	      principal.  It will be sent with an RPCSEC_GSS3_CREATE procedure,
860	      and so cfap_seq_num is set to the seq_num of the credential of the
861	      RPCSEC_GSS3_CREATE procedure.  Because cfap_shared_secret is a
862	      secret, after XDR encoding copy_from_auth_priv, GSS_Wrap() (with
863	      privacy) is invoked on copy_from_auth_priv.  The
864	      RPCSEC_GSS3_CREATE procedure's arguments are:

866	      struct {
867	         rpc_gss3_gss_binding    *compound_binding;
868	         rpc_gss3_chan_binding   *chan_binding_mic;
869	         rpc_gss3_assertion      assertions<>;
870	         rpc_gss3_extension      extensions<>;
871	      } rpc_gss3_create_args;

873	      The string "copy_from_auth" is placed in assertions[0].privs.  The
874	      output of GSS_Wrap() is placed in extensions[0].data.  The field
875	      extensions[0].critical is set to TRUE.  The source server calls
876	      GSS_Unwrap() on the privilege, and verifies that the seq_num
877	      matches the credential.  It then verifies that the NFSv4 user id
878	      being asserted matches the source server's mapping of the user
879	      principal.  If it does, the privilege is established on the source
880	      server as: <"copy_from_auth", user id, destination>.  The
881	      successful reply to RPCSEC_GSS3_CREATE has:

883	      struct {
884	         opaque                  handle<>;
885	         rpc_gss3_chan_binding   *chan_binding_mic;
886	         rpc_gss3_assertion      granted_assertions<>;
887	         rpc_gss3_assertion      server_assertions<>;
888	         rpc_gss3_extension      extensions<>;
889	      } rpc_gss3_create_res;

891	      The field "handle" is the RPCSEC_GSSv3 handle that the client will
892	      use on COPY_NOTIFY requests involving the source and destination
893	      server. granted_assertions[0].privs will be equal to
894	      "copy_from_auth".  The server will return a GSS_Wrap() of
895	      copy_to_auth_priv.

897	   o  An instance of copy_to_auth_priv is filled in with the shared
898	      secret, the source server, and the NFSv4 user id.  It will be sent
899	      with an RPCSEC_GSS3_CREATE procedure, and so ctap_seq_num is set
900	      to the seq_num of the credential of the RPCSEC_GSS3_CREATE
901	      procedure.  Because ctap_shared_secret is a secret, after XDR
902	      encoding copy_to_auth_priv, GSS_Wrap() is invoked on
903	      copy_to_auth_priv.  The RPCSEC_GSS3_CREATE procedure's arguments
904	      are:

906	      struct {
907	         rpc_gss3_gss_binding    *compound_binding;
908	         rpc_gss3_chan_binding   *chan_binding_mic;
909	         rpc_gss3_assertion      assertions<>;
910	         rpc_gss3_extension      extensions<>;
911	      } rpc_gss3_create_args;

913	      The string "copy_to_auth" is placed in assertions[0].privs.  The
914	      output of GSS_Wrap() is placed in extensions[0].data.  The field
915	      extensions[0].critical is set to TRUE.  After unwrapping,
916	      verifying the seq_num, and the user principal to NFSv4 user ID
917	      mapping, the destination establishes a privilege of
918	      <"copy_to_auth", user id, source>.  The successful reply to
919	      RPCSEC_GSS3_CREATE has:

921	      struct {
922	         opaque                  handle<>;
923	         rpc_gss3_chan_binding   *chan_binding_mic;
924	         rpc_gss3_assertion      granted_assertions<>;
925	         rpc_gss3_assertion      server_assertions<>;
926	         rpc_gss3_extension      extensions<>;

928	      } rpc_gss3_create_res;

930	      The field "handle" is the RPCSEC_GSSv3 handle that the client will
931	      use on COPY requests involving the source and destination server.
932	      The field granted_assertions[0].privs will be equal to
933	      "copy_to_auth".  The server will return a GSS_Wrap() of
934	      copy_to_auth_priv.
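
   The client-side preparation described in the steps above can be
   sketched as follows.  This is an illustration only: the Python
   classes are hypothetical mirrors of copy_from_auth_priv and
   copy_to_auth_priv, network locations are simplified to strings, the
   32-byte secret length is an assumption, and XDR encoding, GSS_Wrap(),
   and the RPCSEC_GSS3_CREATE exchange itself are not modeled.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyFromAuthPriv:
    cfap_shared_secret: bytes
    cfap_destination: str  # simplified stand-in for a netloc4
    cfap_username: str
    cfap_seq_num: int


@dataclass(frozen=True)
class CopyToAuthPriv:
    ctap_shared_secret: bytes
    ctap_source: str       # simplified stand-in for a netloc4
    ctap_username: str
    ctap_seq_num: int


def make_copy_privileges(source, destination, username,
                         from_seq_num, to_seq_num, secret_len=32):
    """Build both privilege payloads around one freshly generated secret.

    from_seq_num and to_seq_num are the seq_num values of the credentials
    of the two RPCSEC_GSS3_CREATE calls (to the source and to the
    destination, respectively).
    """
    secret = secrets.token_bytes(secret_len)
    cfap = CopyFromAuthPriv(secret, destination, username, from_seq_num)
    ctap = CopyToAuthPriv(secret, source, username, to_seq_num)
    return cfap, ctap
```

   Note that the payload sent to the source names the destination and
   the payload sent to the destination names the source; only the
   secret and the user name are common to both.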

936	2.4.1.2.2.  Starting a Secure Inter-Server Copy

938	   When the client sends a COPY_NOTIFY request to the source server, it
939	   uses the privileged "copy_from_auth" RPCSEC_GSSv3 handle.
940	   cna_destination_server in COPY_NOTIFY MUST be the same as the name of
941	   the destination server specified in copy_from_auth_priv.  Otherwise,
942	   COPY_NOTIFY will fail with NFS4ERR_ACCESS.  The source server
943	   verifies that the privilege <"copy_from_auth", user id, destination>
944	   exists, and annotates it with the source filehandle, if the user
945	   principal has read access to the source file, and if administrative
946	   policies give the user principal and the NFS client read access to
947	   the source file (i.e., if the ACCESS operation would grant read
948	   access).  Otherwise, COPY_NOTIFY will fail with NFS4ERR_ACCESS.

950	   When the client sends a COPY request to the destination server, it
951	   uses the privileged "copy_to_auth" RPCSEC_GSSv3 handle.
952	   ca_source_server in COPY MUST be the same as the name of the source
953	   server specified in copy_to_auth_priv.  Otherwise, COPY will fail
954	   with NFS4ERR_ACCESS.  The destination server verifies that the
955	   privilege <"copy_to_auth", user id, source> exists, and annotates it
956	   with the source and destination filehandles.  If the client has
957	   failed to establish the "copy_to_auth" privilege, the destination
958	   server will reject the request with NFS4ERR_PARTNER_NO_AUTH.

960	   If the client sends an OFFLOAD_REVOKE to the source server to rescind
961	   the destination server's copy privilege, it uses the privileged
962	   "copy_from_auth" RPCSEC_GSSv3 handle and the cra_destination_server
963	   in OFFLOAD_REVOKE MUST be the same as the name of the destination
964	   server specified in copy_from_auth_priv.  The source server will then
965	   delete the <"copy_from_auth", user id, destination> privilege and
966	   fail any subsequent copy requests sent under the auspices of this
967	   privilege from the destination server.
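
   The source server's bookkeeping for COPY_NOTIFY and OFFLOAD_REVOKE
   might look like the sketch below.  The class and method names are
   hypothetical, status codes are represented as strings, and returning
   NFS4ERR_ACCESS when no matching privilege exists is an assumption
   made for the sketch rather than something stated above.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_ACCESS = "NFS4ERR_ACCESS"


class SourceServerPrivileges:
    """Tracks <"copy_from_auth", user id, destination> privileges."""

    def __init__(self):
        # (user_id, destination) -> set of annotated source filehandles
        self._annotations = {}

    def establish(self, user_id, destination):
        self._annotations.setdefault((user_id, destination), set())

    def copy_notify(self, user_id, priv_destination,
                    cna_destination_server, source_fh, has_read_access):
        # The destination named in COPY_NOTIFY must match the privilege.
        if cna_destination_server != priv_destination:
            return NFS4ERR_ACCESS
        key = (user_id, priv_destination)
        # Assumption: a missing privilege is reported like a failed access check.
        if key not in self._annotations or not has_read_access:
            return NFS4ERR_ACCESS
        self._annotations[key].add(source_fh)
        return NFS4_OK

    def offload_revoke(self, user_id, priv_destination, cra_destination_server):
        """Delete the privilege; returns True if one was actually removed."""
        if cra_destination_server != priv_destination:
            return False
        return self._annotations.pop((user_id, priv_destination), None) is not None

    def may_copy(self, user_id, destination, source_fh):
        """Would a copy request from this destination be honored?"""
        return source_fh in self._annotations.get((user_id, destination), set())
```

   After a successful offload_revoke(), may_copy() returns False for
   every filehandle previously annotated under that privilege, which
   corresponds to failing subsequent copy requests from the destination.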

969	2.4.1.2.3.  Securing ONC RPC Server-to-Server Copy Protocols

971	   After a destination server has a "copy_to_auth" privilege established
972	   on it, and it receives a COPY request, if it knows it will use an ONC
973	   RPC protocol to copy data, it will establish a "copy_confirm_auth"
974	   privilege on the source server, using nfs@<destination> as the
975	   initiator principal, and nfs@<source> as the target principal.

977	   The value of the field ccap_shared_secret_mic is a GSS_GetMIC() of
978	   the shared secret passed in the copy_to_auth privilege.  The field
979	   ccap_username is the mapping of the user principal to an NFSv4 user
980	   name ("user"@"domain" form), and MUST be the same as ctap_username
981	   and cfap_username.  The field ccap_seq_num is the seq_num of the
982	   RPCSEC_GSSv3 credential used for the RPCSEC_GSS3_CREATE procedure the
983	   destination will send to the source server to establish the
984	   privilege.

986	   The source server verifies the privilege, and establishes a
987	   <"copy_confirm_auth", user id, destination> privilege.  If the source
988	   server fails to verify the privilege, the COPY operation will be
989	   rejected with NFS4ERR_PARTNER_NO_AUTH.  All subsequent ONC RPC
990	   requests sent from the destination to copy data from the source to
991	   the destination will use the RPCSEC_GSSv3 handle returned by the
992	   source's RPCSEC_GSS3_CREATE response.
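
   The confirmation step can be illustrated with a keyed hash standing
   in for the GSS-API calls.  This is only an analogy: HMAC-SHA-256 is
   used here in place of GSS_GetMIC() and GSS_VerifyMIC(), the
   context_key parameter stands in for the security context shared by
   nfs@<destination> and nfs@<source>, and all function names are
   hypothetical.  A real implementation would use the GSS-API mechanism
   negotiated for the context.

```python
import hashlib
import hmac


def get_mic(context_key: bytes, message: bytes) -> bytes:
    """Stand-in for GSS_GetMIC(): integrity code over message."""
    return hmac.new(context_key, message, hashlib.sha256).digest()


def verify_mic(context_key: bytes, message: bytes, mic: bytes) -> bool:
    """Stand-in for GSS_VerifyMIC(): constant-time comparison."""
    return hmac.compare_digest(get_mic(context_key, message), mic)


def source_accepts_confirm(context_key: bytes,
                           stored_secret: bytes, stored_username: str,
                           ccap_shared_secret_mic: bytes,
                           ccap_username: str) -> bool:
    """Source-side check of a copy_confirm_auth privilege.

    stored_secret and stored_username are the cfap_shared_secret and
    cfap_username recorded when copy_from_auth was established.
    """
    if ccap_username != stored_username:
        return False
    return verify_mic(context_key, stored_secret, ccap_shared_secret_mic)
```

   The point of sending a MIC rather than the secret is that the
   destination proves knowledge of the value the user principal handed
   to both servers without repeating it in the request.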

994	   Note that the use of the "copy_confirm_auth" privilege accomplishes
995	   the following:

997	   o  If a protocol like NFS is being used with export policies, those
998	      policies can be overridden in case the destination server as-an-
999	      NFS-client is not authorized.

1001	   o  Manual configuration to allow a copy relationship between the
1002	      source and destination is not needed.

1004	   If the attempt to establish a "copy_confirm_auth" privilege fails,
1005	   then when the user principal sends a COPY request to destination, the
1006	   destination server will reject it with NFS4ERR_PARTNER_NO_AUTH.

1008	2.4.1.2.4.  Securing Non ONC RPC Server-to-Server Copy Protocols

1010	   If the destination won't be using ONC RPC to copy the data, then the
1011	   source and destination are using an unspecified copy protocol.  The
1012	   destination could use the shared secret and the NFSv4 user id to
1013	   prove to the source server that the user principal has authorized the
1014	   copy.

1016	   For protocols that authenticate user names with passwords (e.g., HTTP
1017	   [13] and FTP [14]), the NFSv4 user id could be used as the user name,
1018	   and an ASCII hexadecimal representation of the RPCSEC_GSSv3 shared
1019	   secret could be used as the user password or as input into non-
1020	   password authentication methods like CHAP [15].
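
   For a password-based protocol, the mapping above amounts to the
   following.  The sketch assumes HTTP Basic authentication purely as
   an example of a user-name/password scheme; nothing above selects
   that scheme, and the function names are hypothetical.

```python
import base64


def copy_login(nfs_user_id: str, shared_secret: bytes):
    """User name and password derived from the NFSv4 user id and shared secret."""
    return nfs_user_id, shared_secret.hex()


def http_basic_header(user: str, password: str) -> str:
    """Value of an HTTP Authorization header using the Basic scheme (assumed scheme)."""
    if ":" in user:
        raise ValueError("Basic authentication user names cannot contain ':'")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token
```

   Because the password is derived directly from the shared secret,
   such a scheme should only be used over a channel that protects the
   secret from eavesdroppers.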

1022	2.4.1.3.  Inter-Server Copy via ONC RPC but without RPCSEC_GSSv3

1024	   ONC RPC security flavors other than RPCSEC_GSSv3 MAY be used with the
1025	   server-side copy offload operations described in this chapter.  In
1026	   particular, host-based ONC RPC security flavors such as AUTH_NONE and
1027	   AUTH_SYS MAY be used.  If a host-based security flavor is used, a
1028	   minimal level of protection for the server-to-server copy protocol is
1029	   possible.

1031	   In the absence of strong security mechanisms such as RPCSEC_GSSv3,
1032	   the challenge is how the source server and destination server
1033	   identify themselves to each other, especially in the presence of
1034	   multi-homed source and destination servers.  In a multi-homed
1035	   environment, the destination server might not contact the source
1036	   server from the same network address specified by the client in the
1037	   COPY_NOTIFY.  This can be overcome using the procedure described
1038	   below.

1040	   When the client sends the source server the COPY_NOTIFY operation,
1041	   the source server may reply to the client with a list of target
1042	   addresses, names, and/or URLs and assign them to the unique
1043	   quadruple: <random number, source fh, user ID, destination address
1044	   Y>.  If the destination uses one of these target netlocs to contact
1045	   the source server, the source server will be able to uniquely
1046	   identify the destination server, even if the destination server does
1047	   not connect from the address specified by the client in COPY_NOTIFY.
1048	   The level of assurance in this identification depends on the
1049	   unpredictability, strength and secrecy of the random number.

1051	   For example, suppose the network topology is as shown in Figure 3.
1052	   If the source filehandle is 0x12345, the source server may respond to
1053	   a COPY_NOTIFY for destination 10.11.78.56 with the URLs:

1055	      nfs://10.11.78.18//_COPY/FvhH1OKbu8VrxvV1erdjvR7N/10.11.78.56/_FH/
1056	      0x12345

1058	      nfs://192.168.33.18//_COPY/FvhH1OKbu8VrxvV1erdjvR7N/10.11.78.56/
1059	      _FH/0x12345

1061	   The name component after _COPY is 24 characters of base 64, more than
1062	   enough to encode a 128 bit random number.
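
   A source server could produce such URLs roughly as follows.  This is
   a sketch under stated assumptions: the function names are
   hypothetical, the URL-safe base 64 alphabet is chosen so that the
   token never contains "/", and 18 random bytes are used because they
   encode to exactly 24 characters without padding (a 128-bit value
   alone would encode to 22 characters plus padding).

```python
import base64
import secrets


def new_copy_token(nbytes: int = 18) -> str:
    """Random path component: 18 bytes encode to 24 base 64 characters."""
    return base64.urlsafe_b64encode(secrets.token_bytes(nbytes)).decode("ascii")


def copy_url(source_addr: str, token: str, destination_addr: str, fh_name: str) -> str:
    """URL of the form used in the example above."""
    return f"nfs://{source_addr}//_COPY/{token}/{destination_addr}/_FH/{fh_name}"


def copy_urls(source_addrs, destination_addr, fh_name):
    """One URL per source address, all sharing a single fresh token."""
    token = new_copy_token()
    return [copy_url(a, token, destination_addr, fh_name) for a in source_addrs]
```

   The source server would also record the token together with the
   source filehandle, user ID, and destination address so that it can
   recognize the destination when one of the URLs is later used.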

1064	   The client will then send these URLs to the destination server in the
1065	   COPY operation.  Suppose that the 192.168.33.0/24 network is a high
1066	   speed network and the destination server decides to transfer the file
1067	   over this network.  If the destination contacts the source server
1068	   from 192.168.33.56 over this network using NFSv4.1, it does the
1069	   following:

1071	   COMPOUND  { PUTROOTFH, LOOKUP "_COPY" ; LOOKUP
1072	      "FvhH1OKbu8VrxvV1erdjvR7N" ; LOOKUP "10.11.78.56"; LOOKUP "_FH" ;
1073	      OPEN "0x12345" ; GETFH }

1075	   Provided that the random number is unpredictable and has been kept
1076	   secret by the parties involved, the source server will therefore know
1077	   that these NFSv4.x operations are being issued by the destination
1078	   server identified in the COPY_NOTIFY.  This random number technique
1079	   only provides initial authentication of the destination server, and
1080	   cannot defend against man-in-the-middle attacks after authentication
1081	   or an eavesdropper that observes the random number on the wire.
1082	   Other secure communication techniques (e.g., IPsec) are necessary to
1083	   block these attacks.

1085	2.4.1.4.  Inter-Server Copy without ONC RPC and RPCSEC_GSSv3

1087	   The same techniques as Section 2.4.1.3, using unique URLs for each
1088	   destination server, can be used for other protocols (e.g., HTTP [13]
1089	   and FTP [14]) as well.

1091	3.  Support for Application IO Hints

1093	   Applications can issue client I/O hints via posix_fadvise() [6] to
1094	   the NFS client.  While this can help the NFS client optimize I/O and
1095	   caching for a file, it does not allow the NFS server and its exported
1096	   file system to do likewise.  We add an IO_ADVISE procedure
1097	   (Section 13.8) to communicate the client file access patterns to the
1098	   NFS server.  The NFS server upon receiving an IO_ADVISE operation MAY
1099	   choose to alter its I/O and caching behavior, but is under no
1100	   obligation to do so.
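
	   For illustration, an application-side hint might look like the
	   sketch below, which uses Python's os.posix_fadvise() wrapper where
	   the platform provides it.  The helper name hint_sequential is
	   hypothetical.  Whether an NFS client turns such a hint into an
	   IO_ADVISE operation is up to the client implementation and is not
	   shown here.

```python
import os
import tempfile

def hint_sequential(path):
    """Advise the OS that `path` will be read sequentially.

    Returns True if the hint was issued, False if it could not be.
    """
    if not hasattr(os, "posix_fadvise"):
        return False              # platform without posix_fadvise()
    fd = os.open(path, os.O_RDONLY)
    try:
        # Offset 0 with length 0 covers the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        return True
    except OSError:
        return False              # the hint was rejected for this file
    finally:
        os.close(fd)

with tempfile.NamedTemporaryFile() as f:
    f.write(b"x" * 4096)
    f.flush()
    issued = hint_sequential(f.name)
```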

1102	   Application specific NFS clients such as those used by hypervisors
1103	   and databases can also leverage application hints to communicate
1104	   their specialized requirements.

1106	4.  Sparse Files

1108	4.1.  Introduction

1110	   A sparse file is a common way of representing a large file without
1111	   having to utilize all of the disk space for it.  Consequently, a
1112	   sparse file uses less physical space than its size indicates.  This
1113	   means the file contains 'holes', byte ranges within the file that
1114	   contain no data.  Most modern file systems support sparse files,
1115	   including most UNIX file systems and NTFS, but notably not Apple's
1116	   HFS+.  Common examples of sparse files include Virtual Machine (VM)
1117	   OS/disk images, database files, log files, and even checkpoint
1118	   recovery files most commonly used by the HPC community.

1120	   If an application reads a hole in a sparse file, the file system must
1121	   return all zeros to the application.  For local data access there is
1122	   little penalty, but with NFS these zeroes must be transferred back to
1123	   the client.  If an application uses the NFS client to read data into
1124	   memory, this wastes time and bandwidth as the application waits for
1125	   the zeroes to be transferred.

1127	   A sparse file is typically created by initializing the file to be all
1128	   zeros.  Nothing is written to the data in the file; instead, the hole
1129	   is recorded in the metadata for the file.  So an 8G disk image might
1130	   be represented initially by a couple hundred bits in the inode and
1131	   nothing on the disk.  If the VM then writes 100M in the middle of the
1132	   image, there would now be two holes represented in the metadata and
1133	   100M in the data.
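
	   The bookkeeping in this example can be sketched as follows.  This is
	   a toy model under stated assumptions: the function name holes and
	   the (offset, length) extent representation are illustrative and do
	   not reflect how any particular file system stores its metadata.

```python
def holes(file_size, data_extents):
    """Return the (offset, length) holes left by the given data extents."""
    result, pos = [], 0
    for off, length in sorted(data_extents):
        if off > pos:
            result.append((pos, off - pos))   # gap before this extent
        pos = max(pos, off + length)
    if pos < file_size:
        result.append((pos, file_size - pos)) # trailing gap
    return result

G, M = 1 << 30, 1 << 20
fresh_image = holes(8 * G, [])                     # one hole, no data
after_write = holes(8 * G, [(4 * G, 100 * M)])     # two holes, 100M data
```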

1135	   Two new operations INITIALIZE (Section 13.7) and READ_PLUS
1136	   (Section 13.10) are introduced.  INITIALIZE allows for the creation
1137	   of a sparse file and for hole punching.  An application might want to
1138	   zero out a range of the file.  READ_PLUS supports all the features of
1139	   READ but includes an extension to support sparse pattern files
1140	   (Section 6.1.2).  READ_PLUS is guaranteed to perform no worse than
1141	   READ, and can dramatically improve performance with sparse files.
1142	   READ_PLUS does not depend on pNFS protocol features, but can be used
1143	   by pNFS to support sparse files.

1145	4.2.  Terminology

1147	   Regular file:  An object of file type NF4REG or NF4NAMEDATTR.

1149	   Sparse file:  A Regular file that contains one or more Holes.

1151	   Hole:  A byte range within a Sparse file that contains regions of all
1152	      zeroes.  For block-based file systems, this could also be an
1153	      unallocated region of the file.

1155	   Hole Threshold:  The minimum length of a Hole as determined by the
1156	      server.  If a server chooses to define a Hole Threshold, then it
1157	      would not return hole information about holes with a length
1158	      shorter than the Hole Threshold.

1160	5.  Space Reservation

1161	5.1.  Introduction

1163	   This section describes a set of operations that allow applications
1164	   such as hypervisors to reserve space for a file, report the amount of
1165	   actual disk space a file occupies and free up the backing space of a
1166	   file when it is not required.  In virtualized environments, virtual
1167	   disk files are often stored on NFS mounted volumes.  Since virtual
1168	   disk files represent the hard disks of virtual machines, hypervisors
1169	   often have to guarantee certain properties for the file.

1171	   One such example is space reservation.  When a hypervisor creates a
1172	   virtual disk file, it often tries to preallocate the space for the
1173	   file so that there are no future allocation related errors during the
1174	   operation of the virtual machine.  Such errors prevent a virtual
1175	   machine from continuing execution and result in downtime.

1177	   Currently, in order to achieve such a guarantee, applications zero
1178	   the entire file.  The initial zeroing allocates the backing blocks
1179	   and all subsequent writes are overwrites of already allocated blocks.
1180	   This approach is not only inefficient in terms of the amount of I/O
1181	   done, it is also not guaranteed to work on file systems that are log
1182	   structured or deduplicated.  An efficient way of guaranteeing space
1183	   reservation would be beneficial to such applications.

1185	   If the space_reserved attribute (see Section 11.2.3) is set on a
1186	   file, it is guaranteed that writes that do not grow the file will not
1187	   fail with NFS4ERR_NOSPC.

1189	   Another useful feature would be the ability to report the number of
1190	   blocks that would be freed when a file is deleted.  Currently, NFS
1191	   reports two size attributes:

1193	   size  The logical file size of the file.

1195	   space_used  The size in bytes that the file occupies on disk.

1197	   While these attributes are sufficient for space accounting in
1198	   traditional file systems, they prove to be inadequate in modern file
1199	   systems that support block sharing.  In such file systems, multiple
1200	   inodes can point to a single block with a block reference count to
1201	   guard against premature freeing.  Having a way to tell the number of
1202	   blocks that would be freed if the file was deleted would be useful to
1203	   applications that wish to migrate files when a volume is low on
1204	   space.

1206	   Since virtual disks represent a hard drive in a virtual machine, a
1207	   virtual disk can be viewed as a file system within a file.  Since not
1208	   all blocks within a file system are in use, there is an opportunity
1209	   to reclaim blocks that are no longer in use.  A call to deallocate
1210	   blocks could result in better space efficiency.  Less space might be
1211	   consumed for backups after block deallocation.

1213	   The following operations and attributes can be used to resolve these
1214	   issues:

1216	   space_reserved  This attribute specifies whether the blocks backing
1217	      the file have been preallocated.

1219	   space_freed  This attribute specifies the space freed when a file is
1220	      deleted, taking block sharing into consideration.

1222	   INITIALIZE  This operation zeroes and/or deallocates the blocks
1223	      backing a region of the file.

1225	   If space_used of a file is interpreted to mean the size in bytes of
1226	   all disk blocks pointed to by the inode of the file, then shared
1227	   blocks get double counted, over-reporting the space utilization.
1228	   This also has the adverse effect that the deletion of a file with
1229	   shared blocks frees up less than space_used bytes.

1231	   On the other hand, if space_used is interpreted to mean the size in
1232	   bytes of those disk blocks unique to the inode of the file, then
1233	   shared blocks are not counted in any file, resulting in under-
1234	   reporting of the space utilization.

1236	   For example, two files A and B have 10 blocks each.  Let 6 of these
1237	   blocks be shared between them.  Thus, the combined space utilized by
1238	   the two files is 14 * BLOCK_SIZE bytes.  In the former case, the
1239	   combined space utilization of the two files would be reported as 20 *
1240	   BLOCK_SIZE.  However, deleting either would only result in 4 *
1241	   BLOCK_SIZE being freed.  Conversely, the latter interpretation would
1242	   report that the space utilization is only 8 * BLOCK_SIZE.

1244	   Adding another size attribute, space_freed (see Section 11.2.4), is
1245	   helpful in solving this problem. space_freed is the number of blocks
1246	   that are allocated to the given file that would be freed on its
1247	   deletion.  In the example, both A and B would report space_freed as 4
1248	   * BLOCK_SIZE and space_used as 10 * BLOCK_SIZE.  If A is deleted, B
1249	   will report space_freed as 10 * BLOCK_SIZE as the deletion of B would
1250	   result in the deallocation of all 10 blocks.
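
	   The arithmetic of this example can be checked with a short sketch.
	   The block numbering, the BLOCK_SIZE value, and the helper name
	   space_freed_bytes are assumptions chosen for illustration; only the
	   counts (10 blocks per file, 6 shared) come from the example.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only

def space_freed_bytes(name, files):
    """Bytes released by deleting `name`: blocks no other file references."""
    others = set().union(*(b for n, b in files.items() if n != name))
    return len(files[name] - others) * BLOCK_SIZE

# A and B hold 10 blocks each; blocks 4..9 are shared between them.
files = {"A": set(range(0, 10)), "B": set(range(4, 14))}

combined = len(files["A"] | files["B"])                 # blocks really in use
all_pointed_to = sum(len(b) for b in files.values())    # former interpretation
unique_only = sum(space_freed_bytes(n, files)
                  for n in files) // BLOCK_SIZE         # latter interpretation

# After A is deleted, every block of B is unique to B.
b_after_delete = space_freed_bytes("B", {"B": files["B"]})
```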

1252	   The addition of this attribute doesn't solve the problem of space being
1253	   over-reported.  However, over-reporting is better than under-
1254	   reporting.

1256	6.  Application Data Hole Support

1258	   At the OS level, files are contained on disk blocks.  Applications
1259	   are also free to impose structure on the data contained in a file and
1260	   we can define an Application Data Block (ADB) to be such a structure.
1261	   From the application's viewpoint, it only wants to handle ADBs and
1262	   not raw bytes (see [16]).  An ADB is typically composed of two
1263	   sections: a header and data.  The header describes the
1264	   characteristics of the block and can provide a means to detect
1265	   corruption in the data payload.  The data section is typically
1266	   initialized to all zeros.

1268	   The format of the header is application specific, but there are two
1269	   main components typically encountered:

1271	   1.  A logical block number which allows the application to determine
1272	       which data block is being referenced.  This is useful when the
1273	       client is not storing the blocks in contiguous memory.

1275	   2.  Fields to describe the state of the ADB and a means to detect
1276	       block corruption.  For both pieces of data, a useful property is
1277	       that the allowed values be chosen such that, if passed across the
1278	       network, corruption due to translation between big-endian and
1279	       little-endian architectures is detectable.  For example,
1280	       0xF0DEDEF0 has the same bit pattern in both architectures.
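
	   The property claimed for 0xF0DEDEF0 can be checked directly: its
	   four bytes read the same in either order.  The helper name below is
	   hypothetical and the check assumes a 32-bit field.

```python
import struct

def same_in_both_byte_orders(value):
    """True if a 32-bit value serializes identically in both byte orders."""
    return struct.pack(">I", value) == struct.pack("<I", value)

palindromic = same_in_both_byte_orders(0xF0DEDEF0)   # bytes F0 DE DE F0
ordinary = same_in_both_byte_orders(0x12345678)      # bytes differ by order
```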

1282	   Applications already impose structures on files [16] and detect
1283	   corruption in data blocks [17].  What they are not able to do is
1284	   efficiently transfer and store ADBs.  To initialize a file with ADBs,
1285	   the client must send the full ADB to the server and that must be
1286	   stored on the server.

1288	   In this section, we are going to define an Application Data Hole
1289	   (ADH), which is a generic framework for transferring the ADB, present
1290	   one approach to detecting corruption in a given ADH implementation,
1291	   and describe the model for how the client and server can support
1292	   efficient initialization of ADHs, reading of ADH holes, punching ADH
1293	   holes in a file, and space reservation.  We define the ADHN to be the
1294	   Application Data Hole Number, which is the logical block number
1295	   discussed earlier.

1297	6.1.  Generic Framework

1299	   We want the representation of the ADH to be flexible enough to
1300	   support many different applications.  The most basic approach is no
1301	   imposition of a block at all, which means we are working with the raw
1302	   bytes.  Such an approach would be useful for storing holes, punching
1303	   holes, etc.  In more complex deployments, a server might be
1304	   supporting multiple applications, each with their own definition of
1305	   the ADH.  One might store the ADHN at the start of the block and then
1306	   have a guard pattern to detect corruption [18].  The next might store
1307	   the ADHN at an offset of 100 bytes within the block and have no guard
1308	   pattern at all, i.e., existing applications might already have well
1309	   defined formats for their data blocks.

1311	   The guard pattern can be used to represent the state of the block, to
1312	   protect against corruption, or both.  Again, it needs to be able to
1313	   be placed anywhere within the ADH.

1315	   We need to be able to represent the starting offset of the block and
1316	   the size of the block.  Note that nothing prevents the application
1317	   from defining different sized blocks in a file.

1319	6.1.1.  Data Hole Representation

1321	   struct app_data_hole4 {
1322	           offset4         adh_offset;
1323	           length4         adh_block_size;
1324	           length4         adh_block_count;
1325	           length4         adh_reloff_blocknum;
1326	           count4          adh_block_num;
1327	           length4         adh_reloff_pattern;
1328	           opaque          adh_pattern<>;
1329	   };

1331	   The app_data_hole4 structure captures the abstraction presented for
1332	   the ADH.  The additional fields present are to allow the transmission
1333	   of adh_block_count ADHs at one time.  We also use adh_block_num to
1334	   convey the ADHN of the first block in the sequence.  Each ADH will
1335	   contain the same adh_pattern string.

1337	   As both adh_block_num and adh_pattern are optional, if either
1338	   adh_reloff_pattern or adh_reloff_blocknum is set to NFS4_UINT64_MAX,
1339	   then the corresponding field is not set in any of the ADHs.

1341	6.1.2.  Data Content

1343	   /*
1344	    * Use an enum such that we can extend new types.
1345	    */
1346	   enum data_content4 {
1347	           NFS4_CONTENT_DATA = 0,
1348	           NFS4_CONTENT_APP_DATA_HOLE = 1,
1349	           NFS4_CONTENT_HOLE = 2
1350	   };

1351	   New operations might need to differentiate between wanting to access
1352	   data versus an ADH.  Also, future minor versions might want to
1353	   introduce new data formats.  This enumeration allows that to occur.

1355	6.2.  An Example of Detecting Corruption

1357	   In this section, we define an ADH format in which corruption can be
1358	   detected.  Note that this is just one possible format and means to
1359	   detect corruption.

1361	   Consider a very basic implementation of an operating system's disk
1362	   blocks.  A block is either data or it is an indirect block which
1363	   allows for files to be larger than one block.  It is desired to be
1364	   able to initialize a block.  Lastly, to quickly unlink a file, a
1365	   block can be marked invalid.  The contents remain intact - which
1366	   would enable this OS application to undelete a file.

1368	   The application defines 4k sized data blocks, with an 8 byte block
1369	   counter occurring at offset 0 in the block, and with the guard
1370	   pattern occurring at offset 8 inside the block.  Furthermore, the
1371	   guard pattern can take one of four states:

1373	   0xfeedface -   This is the FREE state and indicates that the ADH
1374	      format has been applied.

1376	   0xcafedead -   This is the DATA state and indicates that real data
1377	      has been written to this block.

1379	   0xe4e5c001 -   This is the INDIRECT state and indicates that the
1380	      block contains block counter numbers that are chained off of this
1381	      block.

1383	   0xba1ed4a3 -   This is the INVALID state and indicates that the block
1384	      contains data whose contents are garbage.

1386	   Finally, it also defines an 8 byte checksum [19] starting at byte 16
1387	   which applies to the remaining contents of the block.  If the state
1388	   is FREE, then that checksum is trivially zero.  As such, the
1389	   application has no need to transfer the checksum implicitly inside
1390	   the ADH - it need not make the transfer layer aware of the fact that
1391	   there is a checksum (see [17] for an example of checksums used to
1392	   detect corruption in application data blocks).

1394	   Corruption in each ADH can be detected as follows:

1396	   o  If the guard pattern is anything other than one of the allowed
1397	      values, including all zeros.

1399	   o  If the guard pattern is FREE and any other byte in the remainder
1400	      of the ADH is anything other than zero.

1402	   o  If the guard pattern is anything other than FREE, then if the
1403	      stored checksum does not match the computed checksum.

1405	   o  If the guard pattern is INDIRECT and one of the stored indirect
1406	      block numbers has a value greater than the number of ADHs in the
1407	      file.

1409	   o  If the guard pattern is INDIRECT and one of the stored indirect
1410	      block numbers is a duplicate of another stored indirect block
1411	      number.
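
	   The first three checks can be sketched as below.  Several details
	   are assumptions made only so that the sketch runs: header fields are
	   taken to be big-endian, bytes 12 through 15 are taken to be zero
	   padding, and CRC-32 stands in for the checksum, whose algorithm is
	   not given here.  The two INDIRECT checks are omitted because the
	   layout of the stored indirect block numbers is not specified.

```python
import struct
import zlib

BLOCK_SIZE = 4096
GUARD_OFFSET = 8       # guard pattern follows the 8-byte block counter
PAYLOAD_OFFSET = 24    # contents covered by the checksum at bytes 16..23
HEADER = ">QI4xQ"      # counter, guard, 4 pad bytes (assumed), checksum

FREE, DATA, INDIRECT, INVALID = 0xFEEDFACE, 0xCAFEDEAD, 0xE4E5C001, 0xBA1ED4A3
VALID_GUARDS = {FREE, DATA, INDIRECT, INVALID}

def checksum(payload):
    # Assumption: CRC-32 stands in for the unspecified 8-byte checksum.
    return zlib.crc32(payload)

def make_block(block_num, guard, payload=b""):
    body = payload.ljust(BLOCK_SIZE - PAYLOAD_OFFSET, b"\0")
    csum = 0 if guard == FREE else checksum(body)
    return struct.pack(HEADER, block_num, guard, csum) + body

def find_corruption(block):
    """Return a description of the first failed check, or None."""
    _, guard, csum = struct.unpack_from(HEADER, block, 0)
    if guard not in VALID_GUARDS:          # includes an all-zero guard
        return "unknown guard pattern"
    if guard == FREE:
        if any(block[GUARD_OFFSET + 4:]):  # everything after the guard
            return "FREE block has nonzero bytes"
        return None
    if csum != checksum(block[PAYLOAD_OFFSET:]):
        return "checksum mismatch"
    return None
```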

1413	   As can be seen, the application can detect errors based on the
1414	   combination of the guard pattern state and the checksum.  But also,
1415	   the application can detect corruption based on the state and the
1416	   contents of the ADH.  This last point is important in validating the
1417	   minimum amount of data we incorporated into our generic framework.
1418	   I.e., the guard pattern is sufficient in allowing applications to
1419	   design their own corruption detection.

1421	   Finally, it is important to note that none of these corruption checks
1422	   occur in the transport layer.  The server and client components are
1423	   totally unaware of the file format and might report everything as
1424	   being transferred correctly even in the case the application detects
1425	   corruption.

1427	6.3.  Example of READ_PLUS

1429	   The hypothetical application presented in Section 6.2 can be used to
1430	   illustrate how READ_PLUS would return an array of results.  A file is
1431	   created and initialized with 100 4k ADHs in the FREE state:

1433	      INITIALIZE {0, 4k, 100, 0, 0, 8, 0xfeedface}

1435	   Further, assume the application writes a single ADH at 16k, changing
1436	   the guard pattern to 0xcafedead.  We would then have in memory:

1438	      0 -> (16k - 1)   : 4k, 4, 0, 0, 8, 0xfeedface
1439	      16k -> (20k - 1) : 00 00 00 05 ca fe de ad XX XX ... XX XX
1440	      20k -> 400k      : 4k, 95, 0, 6, 0xfeedface

1442	   And when the client did a READ_PLUS of 64k at the start of the file,
1443	   it would get back a result of an ADH, some data, and a final ADH:

1445	      ADH {0, 4, 0, 0, 8, 0xfeedface}
1446	      data 4k
1447	      ADH {20k, 4k, 59, 0, 6, 0xfeedface}

1449	7.  Labeled NFS

1451	7.1.  Introduction

1453	   Access control models such as Unix permissions or Access Control
1454	   Lists are commonly referred to as Discretionary Access Control (DAC)
1455	   models.  These systems base their access decisions on user identity
1456	   and resource ownership.  In contrast, Mandatory Access Control (MAC)
1457	   models base their access control decisions on the label on the
1458	   subject (usually a process) and the object it wishes to access [7].
1459	   These labels may contain user identity information but usually
1460	   contain additional information.  In DAC systems users are free to
1461	   specify the access rules for resources that they own.  MAC models
1462	   base their security decisions on a system wide policy established by
1463	   an administrator or organization which the users do not have the
1464	   ability to override.  In this section, we add a MAC model to NFSv4.2.

1466	   The first change necessary is to devise a method for transporting and
1467	   storing security label data on NFSv4 file objects.  Security labels
1468	   have several semantics that are met by NFSv4 recommended attributes
1469	   such as the ability to set the label value upon object creation.
1470	   Access control on these attributes is done through a combination of
1471	   two mechanisms.  As with other recommended attributes on file objects
1472	   the usual DAC checks (ACLs and permission bits) will be performed to
1473	   ensure that proper file ownership is enforced.  In addition a MAC
1474	   system MAY be employed on the client, server, or both to enforce
1475	   additional policy on what subjects may modify security label
1476	   information.

1478	   The second change is to provide a method for the server to notify the
1479	   client that the attribute changed on an open file on the server.  If
1480	   the file is closed, then during the open attempt, the client will
1481	   gather the new attribute value.  The server MUST NOT communicate the
1482	   new value of the attribute; the client MUST query it.  This
1483	   requirement stems from the need for the client to provide sufficient
1484	   access rights to the attribute.

1486	   The final change necessary is a modification to the RPC layer used in
1487	   NFSv4 in the form of a new version of the RPCSEC_GSS [8] framework.
1488	   In order for an NFSv4 server to apply MAC checks it must obtain
1489	   additional information from the client.  Several methods were
1490	   explored for performing this and it was decided that the best
1491	   approach was to incorporate the ability to make security attribute
1492	   assertions through the RPC mechanism.  RPCSEC_GSSv3 [5] outlines a
1493	   method to assert additional security information such as security
1494	   labels on GSS context creation and have that data bound to all RPC
1495	   requests that make use of that context.

1497	7.2.  Definitions

1499	   Label Format Specifier (LFS):  is an identifier used by the client to
1500	      establish the syntactic format of the security label and the
1501	      semantic meaning of its components.  These specifiers exist in a
1502	      registry associated with documents describing the format and
1503	      semantics of the label.

1505	   Label Format Registry:  is the IANA registry containing all
1506	      registered LFS along with references to the documents that
1507	      describe the syntactic format and semantics of the security label.

1509	   Policy Identifier (PI):  is an optional part of the definition of a
1510	      Label Format Specifier which allows for clients and servers to
1511	      identify specific security policies.

1513	   Object:  is a passive resource within the system that we wish to be
1514	      protected.  Objects can be entities such as files, directories,
1515	      pipes, sockets, and many other system resources relevant to the
1516	      protection of the system state.

1518	   Subject:  is an active entity, usually a process, which is requesting
1519	      access to an object.

1521	   MAC-Aware:  is a server which can transmit and store object labels.

1523	   MAC-Functional:  is a client or server which is Labeled NFS enabled.
1524	      Such a system can interpret labels and apply policies based on the
1525	      security system.

1527	   Multi-Level Security (MLS):  is a traditional model where objects are
1528	      given a sensitivity level (Unclassified, Secret, Top Secret, etc.)
1529	      and a category set [20].

1531	7.3.  MAC Security Attribute

1533	   MAC models base access decisions on security attributes bound to
1534	   subjects and objects.  This information can range from a user
1535	   identity for an identity-based MAC model, to sensitivity levels for
1536	   Multi-Level Security, to a type for Type Enforcement.  These models
1537	   base their decisions on different criteria but the semantics of the
1538	   security attribute remain the same.  The semantics required by the
1539	   security attributes are listed below:

1541	   o  MUST provide flexibility with respect to the MAC model.

1543	   o  MUST provide the ability to atomically set security information
1544	      upon object creation.

1546	   o  MUST provide the ability to enforce access control decisions both
1547	      on the client and the server.

1549	   o  MUST NOT expose an object to either the client or server name
1550	      space before its security information has been bound to it.

1552	   NFSv4 implements the security attribute as a recommended attribute.
1553	   These attributes have a fixed format and semantics, which conflicts
1554	   with the flexible nature of the security attribute.  To resolve this
1555	   the security attribute consists of two components.  The first
1556	   component is an LFS as defined in [21] to allow for interoperability
1557	   between MAC mechanisms.  The second component is an opaque field
1558	   which is the actual security attribute data.  To allow for various
1559	   MAC models, NFSv4 should be used solely as a transport mechanism for
1560	   the security attribute.  It is the responsibility of the endpoints to
1561	   consume the security attribute and make access decisions based on
1562	   their respective models.  In addition, creation of objects through
1563	   OPEN and CREATE allows for the security attribute to be specified
1564	   upon creation.  By providing an atomic create and set operation for
1565	   the security attribute it is possible to enforce the second and
1566	   fourth requirements.  The recommended attribute FATTR4_SEC_LABEL (see
1567	   Section 11.2.2) will be used to satisfy this requirement.

1569	7.3.1.  Delegations

1571	   In the event that a security attribute is changed on the server while
1572	   a client holds a delegation on the file, both the server and the
1573	   client MUST follow the NFSv4.1 protocol (see Chapter 10 of [2]) with
1574	   respect to attribute changes.  The client SHOULD flush all changes
1575	   back to the server and relinquish the delegation.

1577	7.3.2.  Permission Checking

1579	   It is not feasible to enumerate all possible MAC models and even
1580	   levels of protection within a subset of these models.  This means
1581	   that NFSv4 clients and servers cannot be expected to directly make
1582	   access control decisions based on the security attribute.  Instead,
1583	   NFSv4 should defer permission checking on this attribute to the host
1584	   system.  These checks are performed in addition to existing DAC and
1585	   ACL checks outlined in the NFSv4 protocol.  Section 7.6 gives a
1586	   specific example of how the security attribute is handled under a
1587	   particular MAC model.

1589	7.3.3.  Object Creation

1591	   When creating files in NFSv4 the OPEN and CREATE operations are used.
1592	   One of the parameters to these operations is an fattr4 structure
1593	   containing the attributes the file is to be created with.  This
1594	   allows NFSv4 to atomically set the security attribute of files upon
1595	   creation.  When a client is MAC-Functional it must always provide the
1596	   initial security attribute upon file creation.  In the event that the
1597	   server is MAC-Functional as well, it should determine by policy
1598	   whether it will accept the attribute from the client or instead make
1599	   the determination itself.  If the client is not MAC-Functional, then
1600	   the MAC-Functional server must decide on a default label.  A more in
1601	   depth explanation can be found in Section 7.6.

1603	7.3.4.  Existing Objects

1605	   Note that under the MAC model, all objects must have labels.
1606	   Therefore, if an existing server is upgraded to include Labeled NFS
1607	   support, then it is the responsibility of the security system to
1608	   define the behavior for existing objects.

1610	7.3.5.  Label Changes

1612	   As per the requirements, when a file's security label is modified,
1613	   the server must notify all clients which have the file opened of the
1614	   change in label.  It does so with CB_ATTR_CHANGED.  There are
1615	   preconditions to making an attribute change imposed by NFSv4 and the
1616	   security system might want to impose others.  In the process of
1617	   meeting these preconditions, the server may choose to either serve the
1618	   request in whole or return NFS4ERR_DELAY to the SETATTR operation.

1620	   If there are open delegations on the file belonging to a client other
1621	   than the one making the label change, then the process described in
1622	   Section 7.3.1 must be followed.

1624	   As the server is always presented with the subject label from the
1625	   client, it does not necessarily need to communicate the fact that the
1626	   label has changed to the client.  In the cases where the change
1627	   outright denies the client access, the client will be able to quickly
1628	   determine that there is a new label in effect.  It is in cases where
1629	   the client may share the same object between multiple subjects or a
1630	   security system which is not strictly hierarchical that the
1631	   CB_ATTR_CHANGED callback is very useful.  It allows the server to
1632	   inform the clients that the cached security attribute is now stale.

1634	   Consider a system in which the clients enforce MAC checks and the
1635	   server has a very simple security system which just stores the
1636	   labels.  In this system, the MAC label check always allows access,
1637	   regardless of the subject label.

1639	   The way in which MAC labels are enforced is by the client.  So if
1640	   client A changes a security label on a file, then the server MUST
1641	   inform all clients that have the file opened that the label has
1642	   changed via CB_ATTR_CHANGED.  Then the clients MUST retrieve the new
1643	   label and MUST enforce access via the new attribute values.
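
	   The notification flow described above can be sketched with a toy
	   in-memory model.  The class and method names (Server, Client,
	   cb_attr_changed, and so on) are hypothetical and do not represent
	   the wire protocol.  The point illustrated is that the callback
	   carries no label value, so each client has to query the server for
	   the new label.

```python
class Server:
    def __init__(self):
        self.labels = {}     # filehandle -> security label
        self.openers = {}    # filehandle -> clients holding it open

    def open(self, client, fh):
        self.openers.setdefault(fh, set()).add(client)

    def getattr_sec_label(self, fh):
        return self.labels.get(fh)

    def setattr_sec_label(self, fh, label):
        self.labels[fh] = label
        for client in self.openers.get(fh, ()):
            client.cb_attr_changed(fh)   # notification only, no value


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # filehandle -> cached label

    def open(self, fh):
        self.server.open(self, fh)
        self.cache[fh] = self.server.getattr_sec_label(fh)

    def cb_attr_changed(self, fh):
        self.cache.pop(fh, None)         # cached label is now stale

    def label(self, fh):
        if fh not in self.cache:         # re-query after invalidation
            self.cache[fh] = self.server.getattr_sec_label(fh)
        return self.cache[fh]


server = Server()
server.labels["fh1"] = "secret"
client_a, client_b = Client(server), Client(server)
client_a.open("fh1")
client_b.open("fh1")
server.setattr_sec_label("fh1", "top-secret")   # change made via client A
```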

1645	7.4.  pNFS Considerations

1647	   This section examines the issues in deploying Labeled NFS in a pNFS
1648	   community of servers.

1650	7.4.1.  MAC Label Checks

1652	   The new FATTR4_SEC_LABEL attribute is metadata information and as
1653	   such the DS is not aware of the value contained on the MDS.
1654	   Fortunately, the NFSv4.1 protocol [2] already has provisions for
1655	   doing access level checks from the DS to the MDS.  In order for the
1656	   DS to validate the subject label presented by the client, it SHOULD
1657	   utilize this mechanism.

1659	   If a file's FATTR4_SEC_LABEL is changed, then the MDS should utilize
1660	   CB_ATTR_CHANGED to inform the client of that fact.  If the MDS is
1661	   maintaining [[Comment.2: Houston, we seem to have a problem! --TH]]

1663	7.5.  Discovery of Server Labeled NFS Support

1665	   The server can easily determine that a client supports Labeled NFS
1666	   when it queries for the FATTR4_SEC_LABEL label for an object.  Note
1667	   that it cannot assume that the presence of RPCSEC_GSSv3 indicates
1668	   Labeled NFS support.  The client might need to discover which LFS the
1669	   server supports.

1671	   A server which supports Labeled NFS MUST allow a client with any
1672	   subject label to retrieve the FATTR4_SEC_LABEL attribute for the root
1673	   filehandle, ROOTFH.  The following compound must always succeed as
1674	   far as a MAC label check is concerned:

1676	        PUTROOTFH, GETATTR {FATTR4_SEC_LABEL}

1678	   Note that the server might have imposed a security flavor on the root
1679	   that precludes such access.  I.e., if the server requires Kerberized
1680	   access and the client presents a compound with AUTH_SYS, then the
1681	   server is allowed to return NFS4ERR_WRONGSEC in this case.  But if
1682	   the client presents a correct security flavor, then the server MUST
1683	   return the FATTR4_SEC_LABEL attribute with the supported LFS filled
1684	   in.

1686	7.6.  MAC Security NFS Modes of Operation

1688	   A system using Labeled NFS may operate in one of two modes.  The
1689	   first mode provides the most protection and is called "full mode".
1690	   In this mode both the client and server implement a MAC model,
1691	   allowing each end to make an access control decision.  The second
1692	   mode is called "guest mode"; in this mode one end of the connection
1693	   does not implement a MAC model, and the system thus offers less
1694	   protection than full mode.

1696	7.6.1.  Full Mode

1698	   Full mode environments consist of MAC-Functional NFSv4 servers and
1699	   clients and may be composed of mixed MAC models and policies.  The
1700	   system requires that both the client and server have an opportunity
1701	   to perform an access control check based on all relevant information
1702	   within the network.  The file object security attribute is provided
1703	   using the mechanism described in Section 7.3.  The security attribute
1704	   of the subject making the request is transported at the RPC layer
1705	   using the mechanism described in RPCSEC_GSSv3 [5].

1707	7.6.1.1.  Initial Labeling and Translation

1709	   The ability to create a file is an action that a MAC model may wish
1710	   to mediate.  The client is given the responsibility to determine the
1711	   initial security attribute to be placed on a file.  This allows the
1712	   client to make a decision as to the acceptable security attributes to
1713	   create a file with before sending the request to the server.  Once
1714	   the server receives the creation request from the client it may
1715	   choose to evaluate if the security attribute is acceptable.

1717	   Security attributes on the client and server may vary based on MAC
1718	   model and policy.  To handle this the security attribute field has an
1719	   LFS component.  This component is a mechanism for the host to
1720	   identify the format and meaning of the opaque portion of the security
1721	   attribute.  A full mode environment may contain hosts operating in
1722	   several different LFSs.  In this case a mechanism for translating the
1723	   opaque portion of the security attribute is needed.  The actual
1724	   translation function will vary based on MAC model and policy and is
1725	   out of the scope of this document.  If a translation is unavailable
1726	   for a given LFS then the request MUST be denied.  Another recourse is
1727	   to allow the host to provide a fallback mapping for unknown security
1728	   attributes.
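
   The translation rule above can be sketched as follows.  This is a
   minimal illustration only: the translator table, the fallback
   callable, and the RequestDenied exception are hypothetical names
   that this document does not define, and the actual translation
   functions remain out of scope.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical type: maps the opaque label bytes of one LFS to another.
Translator = Callable[[bytes], bytes]


class RequestDenied(Exception):
    """No translation is available, so the request is denied."""


def translate_label(
    src_lfs: int,
    data: bytes,
    local_lfs: int,
    translators: Dict[Tuple[int, int], Translator],
    fallback: Optional[Translator] = None,
) -> bytes:
    """Return the opaque label data expressed in the local LFS."""
    if src_lfs == local_lfs:
        return data  # same format, nothing to translate
    translate = translators.get((src_lfs, local_lfs))
    if translate is not None:
        return translate(data)
    if fallback is not None:
        return fallback(data)  # host-provided mapping for unknown labels
    raise RequestDenied(f"no translation from LFS {src_lfs} to {local_lfs}")
```

   A host that configures no fallback denies every request whose LFS
   it cannot translate; a host that does configure one never reaches
   the denial branch.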

1730	7.6.1.2.  Policy Enforcement

1732	   In full mode access control decisions are made by both the clients
1733	   and servers.  When a client makes a request it takes the security
1734	   attribute from the requesting process and makes an access control
1735	   decision based on that attribute and the security attribute of the
1736	   object it is trying to access.  If the client denies that access an
1737	   RPC call to the server is never made.  If however the access is
1738	   allowed the client will make a call to the NFS server.

1740	   When the server receives the request from the client it extracts the
1741	   security attribute conveyed in the RPC request.  The server then uses
1742	   this security attribute and the attribute of the object the client is
1743	   trying to access to make an access control decision.  If the server's
1744	   policy allows this access it will fulfill the client's request,
1745	   otherwise it will return NFS4ERR_ACCESS.

1747	   Implementations MAY validate security attributes supplied over the
1748	   network to ensure that they are within a set of attributes permitted
1749	   from a specific peer, and if not, reject them.  Note that a system
1750	   may permit a different set of attributes to be accepted from each
1751	   peer.
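
   The two-stage check described in this section can be sketched as
   follows.  The policy callables and function names are assumptions
   made for illustration; real MAC policies are model-specific.  The
   numeric value of NFS4ERR_ACCESS is taken from the NFSv4 error list.

```python
from typing import Callable, Optional

# Hypothetical policy signature: (subject label, object label) -> allowed?
Policy = Callable[[bytes, bytes], bool]

NFS4_OK = 0
NFS4ERR_ACCESS = 13


def server_handle(subject: bytes, obj: bytes, server_policy: Policy) -> int:
    """Server-side check using the subject label carried in the RPC."""
    return NFS4_OK if server_policy(subject, obj) else NFS4ERR_ACCESS


def client_request(
    subject: bytes, obj: bytes, client_policy: Policy, server_policy: Policy
) -> Optional[int]:
    """Return the server status, or None if the client denied locally.

    When the client's own policy denies the access, no RPC is sent.
    """
    if not client_policy(subject, obj):
        return None
    return server_handle(subject, obj, server_policy)
```

   Calling server_handle() directly from the client function stands in
   for the RPC; in a real deployment the subject label would travel at
   the RPC layer.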

1753	7.6.1.3.  Limited Server

1755	   A Limited Server mode (see Section 3.5.2 of [7]) consists of a server
1756	   which is label aware, but does not enforce policies.  Such a server
1757	   will store and retrieve all object labels presented by clients,
1758	   notify the clients of any label changes via CB_ATTR_CHANGED, but will
1759	   not restrict access via the subject label.  Instead, it will expect
1760	   the clients to enforce all such access locally.

1762	7.6.2.  Guest Mode

1764	   Guest mode implies that either the client or the server does not
1765	   handle labels.  If the client is not Labeled NFS aware, then it will
1766	   not offer subject labels to the server.  The server is the only
1767	   entity enforcing policy, and may selectively provide standard NFS
1768	   services to clients based on their authentication credentials and/or
1769	   associated network attributes (e.g., IP address, network interface).
1770	   The level of trust and access extended to a client in this mode is
1771	   configuration-specific.  If the server is not Labeled NFS aware, then
1772	   it will not return object labels to the client.  Clients in this
1773	   environment may consist of groups implementing different MAC
1774	   model policies.  The system requires that all clients in the
1775	   environment be responsible for access control checks.

1777	7.7.  Security Considerations

1779	   This entire chapter deals with security issues.

1781	   Depending on the level of protection the MAC system offers there may
1782	   be a requirement to tightly bind the security attribute to the data.

1784	   When only one of the client or server enforces labels, it is
1785	   important to realize that the other side is not enforcing MAC
1786	   protections.  Alternate methods might be in use to handle the lack of
1787	   MAC support and care should be taken to identify and mitigate threats
1788	   from possible tampering outside of these methods.

1790	   An example of this is that a server that modifies READDIR or LOOKUP
1791	   results based on the client's subject label might want to always
1792	   construct the same subject label for a client which does not present
1793	   one.  This will prevent a non-Labeled NFS client from mixing entries
1794	   in the directory cache.

1796	8.  Sharing change attribute implementation details with NFSv4 clients

1798	8.1.  Introduction

1800	   Although both the NFSv4 [10] and NFSv4.1 [2] protocols define the
1801	   change attribute as being mandatory to implement, there is little in
1802	   the way of guidance.  The only mandated feature is that the value
1803	   must change whenever the file data or metadata change.

1805	   While this allows for a wide range of implementations, it also leaves
1806	   the client with a conundrum: how does it determine which is the most
1807	   recent value for the change attribute in a case where several RPC
1808	   calls have been issued in parallel?  In other words if two COMPOUNDs,
1809	   both containing WRITE and GETATTR requests for the same file, have
1810	   been issued in parallel, how does the client determine which of the
1811	   two change attribute values returned in the replies to the GETATTR
1812	   requests correspond to the most recent state of the file?  In some
1813	   cases, the only recourse may be to send another COMPOUND containing a
1814	   third GETATTR that is fully serialised with the first two.

1816	   NFSv4.2 avoids this kind of inefficiency by allowing the server to
1817	   share details about how the change attribute is expected to evolve,
1818	   so that the client may immediately determine which, out of the
1819	   several change attribute values returned by the server, is the most
1820	   recent.  The change_attr_type attribute is defined as a new
1821	   recommended attribute (see Section 11.2.1) and is per file system.

1823	9.  Security Considerations

1824	10.  Error Values

1826	   NFS error numbers are assigned to failed operations within a Compound
1827	   (COMPOUND or CB_COMPOUND) request.  A Compound request contains a
1828	   number of NFS operations that have their results encoded in sequence
1829	   in a Compound reply.  The results of successful operations will
1830	   consist of an NFS4_OK status followed by the encoded results of the
1831	   operation.  If an NFS operation fails, an error status will be
1832	   entered in the reply and the Compound request will be terminated.
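
   The evaluation order described above can be sketched as follows.
   Operations are modeled as hypothetical callables that return only a
   status; a real reply would also carry the encoded results of each
   successful operation.

```python
from typing import Callable, List, Tuple

NFS4_OK = 0
NFS4ERR_BADLABEL = 10093  # from Table 1

# Hypothetical model: an operation is a name plus a callable returning status.
Op = Tuple[str, Callable[[], int]]


def run_compound(ops: List[Op]) -> List[Tuple[str, int]]:
    """Evaluate operations in order, stopping at the first failure."""
    results: List[Tuple[str, int]] = []
    for name, execute in ops:
        status = execute()
        results.append((name, status))
        if status != NFS4_OK:
            break  # the failing status is recorded, later ops never run
    return results
```

   For a request of PUTROOTFH, SETATTR, GETATTR in which SETATTR fails
   with NFS4ERR_BADLABEL, the reply holds two entries and GETATTR is
   not evaluated.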

1834	10.1.  Error Definitions

1836	                        Protocol Error Definitions

1838	         +--------------------------+--------+------------------+
1839	         | Error                    | Number | Description      |
1840	         +--------------------------+--------+------------------+
1841	         | NFS4ERR_BADLABEL         | 10093  | Section 10.1.3.1 |
1842	         | NFS4ERR_METADATA_NOTSUPP | 10090  | Section 10.1.2.1 |
1843	         | NFS4ERR_OFFLOAD_DENIED   | 10091  | Section 10.1.2.2 |
1844	         | NFS4ERR_PARTNER_NO_AUTH  | 10089  | Section 10.1.2.3 |
1845	         | NFS4ERR_PARTNER_NOTSUPP  | 10088  | Section 10.1.2.4 |
1846	         | NFS4ERR_UNION_NOTSUPP    | 10094  | Section 10.1.1.1 |
1847	         | NFS4ERR_WRONG_LFS        | 10092  | Section 10.1.3.2 |
1848	         +--------------------------+--------+------------------+

1850	                                  Table 1

1852	10.1.1.  General Errors

1854	   This section deals with errors that are applicable to a broad set of
1855	   different purposes.

1857	10.1.1.1.  NFS4ERR_UNION_NOTSUPP (Error Code 10094)

1859	   One of the arguments to the operation is a discriminated union and
1860	   while the server supports the given operation, it does not support
1861	   the selected arm of the discriminated union.  For an example, see
1862	   READ_PLUS (Section 13.10).

1864	10.1.2.  Server to Server Copy Errors

1866	   These errors deal with the interactions involved in server-to-server
1867	   copies.

1869	10.1.2.1.  NFS4ERR_METADATA_NOTSUPP (Error Code 10090)

1871	   The destination file cannot support the same metadata as the source
1872	   file.

1874	10.1.2.2.  NFS4ERR_OFFLOAD_DENIED (Error Code 10091)

1876	   The copy offload operation is supported by both the source and the
1877	   destination, but the destination is not allowing it for this file.
1878	   If the client sees this error, it should fall back to the normal copy
1879	   semantics.

1881	10.1.2.3.  NFS4ERR_PARTNER_NO_AUTH (Error Code 10089)

1883	   The source server does not authorize a server-to-server copy offload
1884	   operation.  This may be due to the client's failure to send the
1885	   COPY_NOTIFY operation to the source server, the source server
1886	   receiving a server-to-server copy offload request after the copy
1887	   lease time expired, or for some other permission problem.

1889	10.1.2.4.  NFS4ERR_PARTNER_NOTSUPP (Error Code 10088)

1891	   The remote server does not support the server-to-server copy offload
1892	   protocol.

1894	10.1.3.  Labeled NFS Errors

1896	   These errors are used in Labeled NFS.

1898	10.1.3.1.  NFS4ERR_BADLABEL (Error Code 10093)

1900	   The label specified is invalid in some manner.

1902	10.1.3.2.  NFS4ERR_WRONG_LFS (Error Code 10092)

1904	   The LFS specified in the subject label is not compatible with the LFS
1905	   in the object label.

1907	11.  New File Attributes

1909	11.1.  New RECOMMENDED Attributes - List and Definition References

1911	   The list of new RECOMMENDED attributes appears in Table 2.  The
1912	   meanings of the columns of the table are:

1914	   Name:  The name of the attribute.

1916	   Id:  The number assigned to the attribute.  In the event of conflicts
1917	      between the assigned number and [3], the latter is likely
1918	      authoritative, but should be resolved with Errata to this document
1919	      and/or [3].  See [22] for the Errata process.

1921	   Data Type:  The XDR data type of the attribute.

1923	   Acc:  Access allowed to the attribute.

1925	      R  means read-only (GETATTR may retrieve, SETATTR may not set).

1927	      W  means write-only (SETATTR may set, GETATTR may not retrieve).

1929	      R W   means read/write (GETATTR may retrieve, SETATTR may set).

1931	   Defined in:  The section of this specification that describes the
1932	      attribute.

1934	   +------------------+----+-------------------+-----+----------------+
1935	   | Name             | Id | Data Type         | Acc | Defined in     |
1936	   +------------------+----+-------------------+-----+----------------+
1937	   | change_attr_type | 79 | change_attr_type4 | R   | Section 11.2.1 |
1938	   | sec_label        | 80 | sec_label4        | R W | Section 11.2.2 |
1939	   | space_reserved   | 77 | boolean           | R W | Section 11.2.3 |
1940	   | space_freed      | 78 | length4           | R   | Section 11.2.4 |
1941	   +------------------+----+-------------------+-----+----------------+

1943	                                  Table 2

1945	11.2.  Attribute Definitions

1947	11.2.1.  Attribute 79: change_attr_type

1949	   enum change_attr_type4 {
1950	              NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR         = 0,
1951	              NFS4_CHANGE_TYPE_IS_VERSION_COUNTER        = 1,
1952	              NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2,
1953	              NFS4_CHANGE_TYPE_IS_TIME_METADATA          = 3,
1954	              NFS4_CHANGE_TYPE_IS_UNDEFINED              = 4
1955	   };

1957	   change_attr_type is a per file system attribute which enables the
1958	   NFSv4.2 server to provide additional information about how it expects
1959	   the change attribute value to evolve after the file data or metadata
1960	   has changed.  While Section 5.4 of [2] discusses per file system
1961	   attributes, it is expected that the value of change_attr_type not
1962	   depend on the value of "homogeneous" and only changes in the event of
1963	   a migration.

1965	   NFS4_CHANGE_TYPE_IS_UNDEFINED:  The change attribute does not take
1966	      values that fit into any of these categories.

1968	   NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR:  The change attribute value MUST
1969	      monotonically increase for every atomic change to the file
1970	      attributes, data, or directory contents.

1972	   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER:  The change attribute value MUST
1973	      be incremented by one unit for every atomic change to the file
1974	      attributes, data, or directory contents.  This property is
1975	      preserved when writing to pNFS data servers.

1977	   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS:  The change attribute
1978	      value MUST be incremented by one unit for every atomic change to
1979	      the file attributes, data, or directory contents.  In the case
1980	      where the client is writing to pNFS data servers, the number of
1981	      increments is not guaranteed to exactly match the number of
1982	      writes.

1984	   NFS4_CHANGE_TYPE_IS_TIME_METADATA:  The change attribute is
1985	      implemented as suggested in the NFSv4 spec [10] in terms of the
1986	      time_metadata attribute.

1988	   If any of NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR,
1989	   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, or
1990	   NFS4_CHANGE_TYPE_IS_TIME_METADATA is set, then the client knows at
1991	   the very least that the change attribute is monotonically increasing,
1992	   which is sufficient to resolve the question of which value is the
1993	   most recent.
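
   A client-side sketch of this rule follows.  The function name is
   hypothetical, and the set of ordered types is limited to the three
   values named in the paragraph above; whether other values permit
   the same comparison is not assumed here.

```python
from enum import IntEnum
from typing import Iterable, Optional


class ChangeAttrType(IntEnum):
    NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR = 0
    NFS4_CHANGE_TYPE_IS_VERSION_COUNTER = 1
    NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2
    NFS4_CHANGE_TYPE_IS_TIME_METADATA = 3
    NFS4_CHANGE_TYPE_IS_UNDEFINED = 4


# Types the text above names as known to be monotonically increasing.
_ORDERED = {
    ChangeAttrType.NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR,
    ChangeAttrType.NFS4_CHANGE_TYPE_IS_VERSION_COUNTER,
    ChangeAttrType.NFS4_CHANGE_TYPE_IS_TIME_METADATA,
}


def most_recent_change(
    values: Iterable[int], attr_type: ChangeAttrType
) -> Optional[int]:
    """Pick the newest change value from parallel replies, if decidable.

    Returns None when the ordering is unknown, in which case the client
    may need a further, serialised GETATTR.
    """
    seen = list(values)
    if not seen or attr_type not in _ORDERED:
        return None
    return max(seen)
```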

1995	   If the client sees the value NFS4_CHANGE_TYPE_IS_TIME_METADATA, then
1996	   by inspecting the value of the 'time_delta' attribute it additionally
1997	   has the option of detecting rogue server implementations that use
1998	   time_metadata in violation of the spec.

2000	   If the client sees NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, it has the
2001	   ability to predict what the resulting change attribute value should
2002	   be after a COMPOUND containing a SETATTR, WRITE, or CREATE.  This
2003	   again allows it to detect changes made in parallel by another client.
2004	   The value NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS permits the
2005	   same, but only if the client is not doing pNFS WRITEs.
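
   The prediction described above can be sketched as follows.  The
   function names and the notion of counting the client's own atomic
   changes are assumptions made for illustration; counter wraparound
   is not modeled.

```python
NFS4_CHANGE_TYPE_IS_VERSION_COUNTER = 1
NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2


def can_predict(attr_type: int, doing_pnfs_writes: bool) -> bool:
    """Whether the client may predict the post-COMPOUND change value."""
    if attr_type == NFS4_CHANGE_TYPE_IS_VERSION_COUNTER:
        return True
    if attr_type == NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS:
        return not doing_pnfs_writes
    return False


def concurrent_change_detected(before: int, own_changes: int, after: int) -> bool:
    """Compare the observed value with before + one unit per own change.

    A mismatch suggests that another client changed the file in parallel.
    """
    return after != before + own_changes
```

   For example, a client that saw value 10, performed two atomic
   changes, and then reads 13 can conclude that some other change
   intervened.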

2007	   Finally, if the server does not support change_attr_type or if
2008	   NFS4_CHANGE_TYPE_IS_UNDEFINED is set, then the server SHOULD make an
2009	   effort to implement the change attribute in terms of the
2010	   time_metadata attribute.

2012	11.2.2.  Attribute 80: sec_label

2014	   typedef uint32_t  policy4;

2016	   struct labelformat_spec4 {
2017	           policy4 lfs_lfs;
2018	           policy4 lfs_pi;
2019	   };

2021	   struct sec_label4 {
2022	           labelformat_spec4       slai_lfs;
2023	           opaque                  slai_data<>;
2024	   };

2026	   The FATTR4_SEC_LABEL attribute consists of two components, with the
2027	   first component being an LFS.  It serves to provide the receiving end
2028	   with the information necessary to translate the security attribute
2029	   into a form that is usable by the endpoint.  Label Formats assigned
2030	   an LFS may optionally choose to include a Policy Identifier field to
2031	   allow for complex policy deployments.  The LFS and Label Format
2032	   Registry are described in detail in [21].  The translation used to
2033	   interpret the security attribute is not specified as part of the
2034	   protocol as it may depend on various factors.  The second component
2035	   is an opaque section which contains the data of the attribute.  This
2036	   component is dependent on the MAC model to interpret and enforce.

2038	   In particular, it is the responsibility of the LFS specification to
2039	   define a maximum size for the opaque section, slai_data<>.  When
2040	   creating or modifying a label for an object, the client needs to be
2041	   guaranteed that the server will accept a label that is sized
2042	   correctly.  Because both client and server are part of a specific
2043	   MAC model, the client will be aware of the size.
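
   As an illustration of the wire layout, the following sketch encodes
   and decodes a sec_label4 using the standard XDR rules (big-endian
   32-bit integers, variable-length opaque data preceded by its length
   and zero-padded to a multiple of four bytes).  The function names
   are hypothetical, and no maximum size is enforced, since that limit
   comes from the LFS specification.

```python
import struct
from typing import Tuple


def encode_sec_label4(lfs_lfs: int, lfs_pi: int, data: bytes) -> bytes:
    """XDR-encode labelformat_spec4 followed by opaque slai_data<>."""
    pad = (-len(data)) % 4  # XDR pads opaque data to a 4-byte boundary
    header = struct.pack(">III", lfs_lfs, lfs_pi, len(data))
    return header + data + b"\x00" * pad


def decode_sec_label4(buf: bytes) -> Tuple[int, int, bytes]:
    """Decode the fields produced by encode_sec_label4."""
    lfs_lfs, lfs_pi, length = struct.unpack_from(">III", buf, 0)
    data = buf[12:12 + length]
    if len(data) != length:
        raise ValueError("truncated slai_data")
    return lfs_lfs, lfs_pi, data
```

   A five-byte label therefore occupies 20 bytes on the wire: 12 bytes
   of integers, 5 bytes of data, and 3 bytes of padding.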

2045	11.2.3.  Attribute 77: space_reserved

2047	   The space_reserved attribute is a read/write attribute of type
2048	   boolean.  It is a per file attribute.  When the space_reserved
2049	   attribute is set via SETATTR, the server must ensure that there is
2050	   disk space to accommodate every byte in the file before it can return
2051	   success.  If the server cannot guarantee this, it must return
2052	   NFS4ERR_NOSPC.

2054	   If the client tries to grow a file which has the space_reserved
2055	   attribute set, the server must guarantee that there is disk space to
2056	   accommodate every byte in the file with the new size before it can
2057	   return success.  If the server cannot guarantee this, it must return
2058	   NFS4ERR_NOSPC.

2060	   It is not required that the server allocate the space to the file
2061	   before returning success.  The allocation can be deferred; however,
2062	   it must be guaranteed that it will not fail for lack of space.

2064	   The value of space_reserved can be obtained at any time through
2065	   GETATTR.

2067	   In order to avoid ambiguity, the space_reserved bit cannot be set
2068	   along with the size bit in SETATTR.  Increasing the size of a file
2069	   with space_reserved set will fail if space reservation cannot be
2070	   guaranteed for the new size.  If the file size is decreased, space
2071	   reservation is only guaranteed for the new size and the extra blocks
2072	   backing the file can be released.
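
   The reservation rules above can be sketched with a toy accounting
   model.  ToyFs, ToyFile, and the "backed" counter are hypothetical
   constructs for illustration; a real server tracks space in its own
   way and may defer the actual allocation.

```python
from dataclasses import dataclass

NFS4_OK = 0
NFS4ERR_NOSPC = 28


@dataclass
class ToyFile:
    size: int
    backed: int = 0        # bytes already allocated or promised
    reserved: bool = False


class ToyFs:
    def __init__(self, free_bytes: int) -> None:
        self.free = free_bytes  # bytes not yet allocated or promised

    def _promise(self, f: ToyFile, target: int) -> int:
        """Guarantee backing for `target` bytes without allocating them."""
        need = max(0, target - f.backed)
        if need > self.free:
            return NFS4ERR_NOSPC
        self.free -= need
        f.backed += need
        return NFS4_OK

    def set_space_reserved(self, f: ToyFile) -> int:
        status = self._promise(f, f.size)
        if status == NFS4_OK:
            f.reserved = True
        return status

    def resize(self, f: ToyFile, new_size: int) -> int:
        if f.reserved:
            if new_size > f.size:
                status = self._promise(f, new_size)
                if status != NFS4_OK:
                    return status  # growth fails, size is unchanged
            elif new_size < f.backed:
                self.free += f.backed - new_size  # extra blocks released
                f.backed = new_size
        f.size = new_size
        return NFS4_OK
```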

2074	11.2.4.  Attribute 78: space_freed

2076	   space_freed gives the number of bytes freed if the file is deleted.
2077	   This attribute is read only and is of type length4.  It is a per file
2078	   attribute.

2080	12.  Operations: REQUIRED, RECOMMENDED, or OPTIONAL

2082	   The following tables summarize the operations of the NFSv4.2 protocol
2083	   and the corresponding designation of each: REQUIRED, RECOMMENDED, or
2084	   OPTIONAL to implement, OBSOLETE if implemented, or MUST NOT
2085	   implement.  The designation of OBSOLETE if implemented is reserved
2086	   for those operations which are defined in either NFSv4.0 or NFSv4.1,
2087	   can be implemented in NFSv4.2, and are intended to be designated
2088	   MUST NOT implement in NFSv4.3.  The designation of MUST NOT
2089	   implement is reserved for those operations that were defined in
2090	   either NFSv4.0 or NFSv4.1 and MUST NOT be implemented in NFSv4.2.

2092	   For the most part, the REQUIRED, RECOMMENDED, or OPTIONAL designation
2093	   for operations sent by the client is for the server implementation.
2094	   The client is generally required to implement the operations needed
2095	   for the operating environment that it serves.  For example, a
2096	   read-only NFSv4.2 client would have no need to implement the WRITE
2097	   operation and is not required to do so.

2099	   The REQUIRED or OPTIONAL designation for callback operations sent by
2100	   the server is for both the client and server.  Generally, the client
2101	   has the option of creating the backchannel and sending the operations
2102	   on the fore channel that will be a catalyst for the server sending
2103	   callback operations.  A partial exception is CB_RECALL_SLOT; the only
2104	   way the client can avoid supporting this operation is by not creating
2105	   a backchannel.

2107	   Since this is a summary of the operations and their designation,
2108	   there are subtleties that are not presented here.  Therefore, if
2109	   there is a question of the requirements of implementation, the
2110	   operation descriptions themselves must be consulted along with other
2111	   relevant explanatory text within either this specification or that of
2112	   NFSv4.1 [2].

2114	   The abbreviations used in the second and third columns of the table
2115	   are defined as follows.

2117	   REQ  REQUIRED to implement

2119	   REC  RECOMMENDED to implement

2121	   OPT  OPTIONAL to implement

2123	   OBS  OBSOLETE if implemented

2125	   MNI  MUST NOT implement

2127	   For the NFSv4.2 features that are OPTIONAL, the operations that
2128	   support those features are OPTIONAL, and the server would return
2129	   NFS4ERR_NOTSUPP in response to the client's use of those operations.
2130	   If an OPTIONAL feature is supported, it is possible that a set of
2131	   operations related to the feature becomes REQUIRED to implement.  The
2132	   third column of the table designates the feature(s) and if the
2133	   operation is REQUIRED or OPTIONAL in the presence of support for the
2134	   feature.

2136	   The OPTIONAL features identified and their abbreviations are as
2137	   follows:

2139	   pNFS  Parallel NFS

2141	   FDELG  File Delegations

2143	   DDELG  Directory Delegations

2145	   COPY  Server Side Copy

2147	   ADH  Application Data Holes

2149	                                Operations

2151	   +----------------------+--------------------+-----------------------+
2152	   | Operation            | REQ, REC, OPT, or  | Feature (REQ, REC, or |
2153	   |                      | MNI                | OPT)                  |
2154	   +----------------------+--------------------+-----------------------+
2155	   | ACCESS               | REQ                |                       |
2156	   | BACKCHANNEL_CTL      | REQ                |                       |
2157	   | BIND_CONN_TO_SESSION | REQ                |                       |
2158	   | CLOSE                | REQ                |                       |
2159	   | COMMIT               | REQ                |                       |
2160	   | COPY                 | OPT                | COPY (REQ)            |
2161	   | OFFLOAD_ABORT        | OPT                | COPY (REQ)            |
2162	   | COPY_NOTIFY          | OPT                | COPY (REQ)            |
2163	   | OFFLOAD_REVOKE       | OPT                | COPY (REQ)            |
2164	   | OFFLOAD_STATUS       | OPT                | COPY (REQ)            |
2165	   | CREATE               | REQ                |                       |
2166	   | CREATE_SESSION       | REQ                |                       |
2167	   | DELEGPURGE           | OPT                | FDELG (REQ)           |
2168	   | DELEGRETURN          | OPT                | FDELG, DDELG, pNFS    |
2169	   |                      |                    | (REQ)                 |
2170	   | DESTROY_CLIENTID     | REQ                |                       |
2171	   | DESTROY_SESSION      | REQ                |                       |
2172	   | EXCHANGE_ID          | REQ                |                       |
2173	   | FREE_STATEID         | REQ                |                       |
2174	   | GETATTR              | REQ                |                       |
2175	   | GETDEVICEINFO        | OPT                | pNFS (REQ)            |
2176	   | GETDEVICELIST        | OPT                | pNFS (OPT)            |
2177	   | GETFH                | REQ                |                       |
2178	   | INITIALIZE           | OPT                | ADH (REQ)             |
2179	   | GET_DIR_DELEGATION   | OPT                | DDELG (REQ)           |
2180	   | LAYOUTCOMMIT         | OPT                | pNFS (REQ)            |
2181	   | LAYOUTGET            | OPT                | pNFS (REQ)            |
2182	   | LAYOUTRETURN         | OPT                | pNFS (REQ)            |
2183	   | LINK                 | OPT                |                       |
2184	   | LOCK                 | REQ                |                       |
2185	   | LOCKT                | REQ                |                       |
2186	   | LOCKU                | REQ                |                       |
2187	   | LOOKUP               | REQ                |                       |
2188	   | LOOKUPP              | REQ                |                       |
2189	   | NVERIFY              | REQ                |                       |
2190	   | OPEN                 | REQ                |                       |
2191	   | OPENATTR             | OPT                |                       |
2192	   | OPEN_CONFIRM         | MNI                |                       |
2193	   | OPEN_DOWNGRADE       | REQ                |                       |
2194	   | PUTFH                | REQ                |                       |
2195	   | PUTPUBFH             | REQ                |                       |
2196	   | PUTROOTFH            | REQ                |                       |
2197	   | READ                 | OBS                |                       |
2198	   | READDIR              | REQ                |                       |
2199	   | READLINK             | OPT                |                       |
2200	   | READ_PLUS            | OPT                | ADH (REQ)             |
2201	   | RECLAIM_COMPLETE     | REQ                |                       |
2202	   | RELEASE_LOCKOWNER    | MNI                |                       |
2203	   | REMOVE               | REQ                |                       |
2204	   | RENAME               | REQ                |                       |
2205	   | RENEW                | MNI                |                       |
2206	   | RESTOREFH            | REQ                |                       |
2207	   | SAVEFH               | REQ                |                       |
2208	   | SECINFO              | REQ                |                       |
2209	   | SECINFO_NO_NAME      | REC                | pNFS file layout      |
2210	   |                      |                    | (REQ)                 |
2211	   | SEQUENCE             | REQ                |                       |
2212	   | SETATTR              | REQ                |                       |
2213	   | SETCLIENTID          | MNI                |                       |
2214	   | SETCLIENTID_CONFIRM  | MNI                |                       |
2215	   | SET_SSV              | REQ                |                       |
2216	   | TEST_STATEID         | REQ                |                       |
2217	   | VERIFY               | REQ                |                       |
2218	   | WANT_DELEGATION      | OPT                | FDELG (OPT)           |
2219	   | WRITE                | REQ                |                       |
2220	   +----------------------+--------------------+-----------------------+

2222	                            Callback Operations

2224	   +-------------------------+-------------------+---------------------+
2225	   | Operation               | REQ, REC, OPT, or | Feature (REQ, REC,  |
2226	   |                         | MNI               | or OPT)             |
2227	   +-------------------------+-------------------+---------------------+
2228	   | CB_COPY                 | OPT               | COPY (REQ)          |
2229	   | CB_GETATTR              | OPT               | FDELG (REQ)         |
2230	   | CB_LAYOUTRECALL         | OPT               | pNFS (REQ)          |
2231	   | CB_NOTIFY               | OPT               | DDELG (REQ)         |
2232	   | CB_NOTIFY_DEVICEID      | OPT               | pNFS (OPT)          |
2233	   | CB_NOTIFY_LOCK          | OPT               |                     |
2234	   | CB_PUSH_DELEG           | OPT               | FDELG (OPT)         |
2235	   | CB_RECALL               | OPT               | FDELG, DDELG, pNFS  |
2236	   |                         |                   | (REQ)               |
2237	   | CB_RECALL_ANY           | OPT               | FDELG, DDELG, pNFS  |
2238	   |                         |                   | (REQ)               |
2239	   | CB_RECALL_SLOT          | REQ               |                     |
2240	   | CB_RECALLABLE_OBJ_AVAIL | OPT               | DDELG, pNFS (REQ)   |
2241	   | CB_SEQUENCE             | OPT               | FDELG, DDELG, pNFS  |
2242	   |                         |                   | (REQ)               |
2243	   | CB_WANTS_CANCELLED      | OPT               | FDELG, DDELG, pNFS  |
2244	   |                         |                   | (REQ)               |
2245	   +-------------------------+-------------------+---------------------+

2247	13.  NFSv4.2 Operations

2249	13.1.  Operation 59: COPY - Initiate a server-side copy

2251	13.1.1.  ARGUMENT

2253	   const COPY4_GUARDED     = 0x00000001;
2254	   const COPY4_METADATA    = 0x00000002;

2256	   struct COPY4args {
2257	           /* SAVED_FH: source file */
2258	           /* CURRENT_FH: destination file or */
2259	           /*             directory           */
2260	           stateid4        ca_src_stateid;
2261	           stateid4        ca_dst_stateid;
2262	           offset4         ca_src_offset;
2263	           offset4         ca_dst_offset;
2264	           length4         ca_count;
2265	           uint32_t        ca_flags;
2266	           component4      ca_destination;
2267	           netloc4         ca_source_server<>;
2268	   };

2270	13.1.2.  RESULT

2272	   union COPY4res switch (nfsstat4 cr_status) {
2273	           case NFS4_OK:
2274	                   stateid4        cr_callback_id<1>;
2275	           default:
2276	                   length4         cr_bytes_copied;
2277	   };

2279	13.1.3.  DESCRIPTION

2281	   The COPY operation is used for both intra-server and inter-server
2282	   copies.  In both cases, the COPY is always sent from the client to
2283	   the destination server of the file copy.  The COPY operation requests
2284	   that a file be copied from the location specified by the SAVED_FH
2285	   value to the location specified by the combination of CURRENT_FH and
2286	   ca_destination.

2288	   The SAVED_FH must be a regular file.  If SAVED_FH is not a regular
2289	   file, the operation MUST fail and return NFS4ERR_WRONG_TYPE.

2291	   In order to set SAVED_FH to the source file handle, the compound
2292	   procedure requesting the COPY will include a sub-sequence of
2293	   operations such as

2295	      PUTFH source-fh
2296	      SAVEFH

2298	   If the request is for a server-to-server copy, the source-fh is a
2299	   filehandle from the source server and the compound procedure is being
2300	   executed on the destination server.  In this case, the source-fh is a
2301	   foreign filehandle on the server receiving the COPY request.  If
2302	   either PUTFH or SAVEFH checked the validity of the filehandle, the
2303	   operation would likely fail and return NFS4ERR_STALE.

2305	   If a server supports the server-to-server COPY feature, a PUTFH
2306	   followed by a SAVEFH MUST NOT return NFS4ERR_STALE for either
2307	   operation.  These restrictions do not pose substantial difficulties
2308	   for servers.  The CURRENT_FH and SAVED_FH may be validated in the
2309	   context of the operation referencing them and an NFS4ERR_STALE error
2310	   returned for an invalid file handle at that point.

2312	   For an intra-server copy, both the ca_src_stateid and ca_dst_stateid
2313	   MUST refer to either open or locking states provided earlier by the
2314	   server.  If either stateid is invalid, then the operation MUST fail.
2315	   If the request is for an inter-server copy, then the ca_src_stateid
2316	   can be ignored.  If ca_dst_stateid is invalid, then the operation
2317	   MUST fail.

2319	   The CURRENT_FH and ca_destination together specify the destination of
2320	   the copy operation.  If ca_destination is of 0 (zero) length, then
2321	   CURRENT_FH specifies the target file.  In this case, CURRENT_FH MUST
2322	   be a regular file and not a directory.  If ca_destination is not of 0
2323	   (zero) length, the ca_destination argument specifies the file name to
2324	   which the data will be copied within the directory identified by
2325	   CURRENT_FH.  In this case, CURRENT_FH MUST be a directory and not a
2326	   regular file.
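
   The destination rules above can be sketched as follows.  This is a
   minimal illustration, not a server implementation.  The function
   name, the "regular"/"directory" labels, and the returned tuples are
   hypothetical.  The text only says the type requirements are a MUST;
   the specific error codes used here are an assumption drawn from the
   partial error list for COPY.

```python
def resolve_copy_destination(current_fh_type, ca_destination):
    """Decide where a COPY writes, given CURRENT_FH's type and the name.

    current_fh_type is "regular" or "directory" (hypothetical labels).
    Returns ("file", None), ("create_in_dir", name), or an error string.
    """
    if len(ca_destination) == 0:
        # Zero-length name: CURRENT_FH itself is the target file.
        if current_fh_type != "regular":
            return "NFS4ERR_ISDIR"   # assumed error code, see lead-in
        return ("file", None)
    # Non-empty name: the target is a name inside the CURRENT_FH directory.
    if current_fh_type != "directory":
        return "NFS4ERR_NOTDIR"      # assumed error code, see lead-in
    return ("create_in_dir", ca_destination)
```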

2328	   If the file named by ca_destination does not exist and the operation
2329	   completes successfully, the file will be visible in the file system
2330	   namespace.  If the file does not exist and the operation fails, the
2331	   file MAY be visible in the file system namespace depending on when
2332	   the failure occurs and on the implementation of the NFS server
2333	   receiving the COPY operation.  If the ca_destination name cannot be
2334	   created in the destination file system (due to file name
2335	   restrictions, such as case or length), the operation MUST fail.

2337	   The ca_src_offset is the offset within the source file from which the
2338	   data will be read, the ca_dst_offset is the offset within the
2339	   destination file to which the data will be written, and the ca_count
2340	   is the number of bytes that will be copied.  An offset of 0 (zero)
2341	   specifies the start of the file.  A count of 0 (zero) requests that
2342	   all bytes from ca_src_offset through EOF be copied to the
2343	   destination.  If concurrent modifications to the source file overlap
2344	   with the source file region being copied, the data copied may include
2345	   all, some, or none of the modifications.  The client can use standard
2346	   NFS operations (e.g., OPEN with OPEN4_SHARE_DENY_WRITE or mandatory
2347	   byte range locks) to protect against concurrent modifications if the
2348	   client is concerned about this.  If the source file's end of file is
2349	   being modified in parallel with a copy that specifies a count of 0
2350	   (zero) bytes, the amount of data copied is implementation dependent
2351	   (clients may guard against this case by specifying a non-zero count
2352	   value or preventing modification of the source file as mentioned
2353	   above).

2355	   If the source offset or the source offset plus count is greater than
2356	   or equal to the size of the source file, the operation will fail with
2357	   NFS4ERR_INVAL.  The destination offset or destination offset plus
2358	   count may be greater than the size of the destination file.  This
2359	   allows for the client to issue parallel copies to implement
2360	   operations such as "cat file1 file2 file3 file4 > dest".
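
   As a rough sketch of the two rules above, the first function below
   applies the source-range check literally as worded, and the second
   plans the parallel copies for the "cat" example.  Both function
   names and the tuple layout are hypothetical.  Under this literal
   reading, a copy that ends exactly at EOF has to be requested with a
   ca_count of 0 (zero), which is why the plan uses a zero count.

```python
def check_source_range(src_offset, count, src_size):
    """Apply the source-range rule exactly as worded in the text."""
    if src_offset >= src_size or src_offset + count >= src_size:
        return "NFS4ERR_INVAL"
    return "NFS4_OK"


def plan_concatenation(source_sizes):
    """Return one (ca_src_offset, ca_dst_offset, ca_count) per source.

    Each copy reads from offset 0 through EOF (ca_count = 0).  The
    destination offset is the running total of the earlier sizes, so
    the copies can be issued in parallel without overlapping.
    """
    plan = []
    dst_offset = 0
    for size in source_sizes:
        plan.append((0, dst_offset, 0))
        dst_offset += size
    return plan
```

   For example, plan_concatenation([10, 20, 5]) yields destination
   offsets 0, 10, and 30, two of which lie beyond the initial size of
   an empty destination file.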

2362	   If the destination file is created as a result of this command, the
2363	   destination file's size will be equal to the number of bytes
2364	   successfully copied.  If the destination file already existed, the
2365	   destination file's size may increase as a result of this operation
2366	   (e.g., if ca_dst_offset plus ca_count is greater than the
2367	   destination's initial size).

2369	   If the ca_source_server list is specified, then this is an inter-
2370	   server copy operation and the source file is on a remote server.  The
2371	   client is expected to have previously issued a successful COPY_NOTIFY
2372	   request to the remote source server.  The ca_source_server list MUST
2373	   be the same as the COPY_NOTIFY response's cnr_source_server list.  If
2374	   the client includes the entries from the COPY_NOTIFY response's
2375	   cnr_source_server list in the ca_source_server list, the source
2376	   server can indicate a specific copy protocol for the destination
2377	   server to use by returning a URL, which specifies both a protocol
2378	   service and server name.  Server-to-server copy protocol
2379	   considerations are described in Section 2.2.5 and Section 2.4.1.

2381	   The ca_flags argument allows the copy operation to be customized in
2382	   the following ways using the guarded flag (COPY4_GUARDED) and the
2383	   metadata flag (COPY4_METADATA).

2385	   If the guarded flag is set and the destination exists on the server,
2386	   this operation will fail with NFS4ERR_EXIST.

2388	   If the guarded flag is not set and the destination exists on the
2389	   server, the behavior is implementation dependent.
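
   A minimal sketch of the guarded check, assuming the server already
   knows whether the destination exists.  The function name and the
   use of None for "no error from this check" are hypothetical; the
   flag values are those given in the ARGUMENT section.

```python
COPY4_GUARDED = 0x00000001
COPY4_METADATA = 0x00000002


def check_guarded(ca_flags, destination_exists):
    """Return "NFS4ERR_EXIST" if the guarded flag forbids the copy.

    Returns None otherwise.  For an existing destination without the
    guarded flag, what happens next is implementation dependent and
    is deliberately not modeled here.
    """
    if destination_exists and (ca_flags & COPY4_GUARDED):
        return "NFS4ERR_EXIST"
    return None
```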

2391	   If the metadata flag is set and the client is requesting a whole file
2392	   copy (i.e., ca_count is 0 (zero)), a subset of the destination file's
2393	   attributes MUST be the same as the source file's corresponding
2394	   attributes and a subset of the destination file's attributes SHOULD
2395	   be the same as the source file's corresponding attributes.  The
2396	   attributes in the MUST and SHOULD copy subsets will be defined for
2397	   each NFS version.

2399	   For NFSv4.2, Table 3 and Table 4 list the REQUIRED and RECOMMENDED
2400	   attributes respectively.  In the "Copy to destination file?" column,
2401	   a "MUST" indicates that the attribute is part of the MUST copy set.
2402	   A "SHOULD" indicates that the attribute is part of the SHOULD copy
2403	   set.  A "no" indicates that the attribute MUST NOT be copied.

2405	                            REQUIRED attributes

2407	          +--------------------+----+---------------------------+
2408	          | Name               | Id | Copy to destination file? |
2409	          +--------------------+----+---------------------------+
2410	          | supported_attrs    | 0  | no                        |
2411	          | type               | 1  | MUST                      |
2412	          | fh_expire_type     | 2  | no                        |
2413	          | change             | 3  | SHOULD                    |
2414	          | size               | 4  | MUST                      |
2415	          | link_support       | 5  | no                        |
2416	          | symlink_support    | 6  | no                        |
2417	          | named_attr         | 7  | no                        |
2418	          | fsid               | 8  | no                        |
2419	          | unique_handles     | 9  | no                        |
2420	          | lease_time         | 10 | no                        |
2421	          | rdattr_error       | 11 | no                        |
2422	          | filehandle         | 19 | no                        |
2423	          | suppattr_exclcreat | 75 | no                        |
2424	          +--------------------+----+---------------------------+

2426	                                  Table 3

2428	                          RECOMMENDED attributes

2430	          +--------------------+----+---------------------------+
2431	          | Name               | Id | Copy to destination file? |
2432	          +--------------------+----+---------------------------+
2433	          | acl                | 12 | MUST                      |
2434	          | aclsupport         | 13 | no                        |
2435	          | archive            | 14 | no                        |
2436	          | cansettime         | 15 | no                        |
2437	          | case_insensitive   | 16 | no                        |
2438	          | case_preserving    | 17 | no                        |
2439	          | change_attr_type   | 79 | no                        |
2440	          | change_policy      | 60 | no                        |
2441	          | chown_restricted   | 18 | MUST                      |
2442	          | dacl               | 58 | MUST                      |
2443	          | dir_notif_delay    | 56 | no                        |
2444	          | dirent_notif_delay | 57 | no                        |
2445	          | fileid             | 20 | no                        |
2446	          | files_avail        | 21 | no                        |
2447	          | files_free         | 22 | no                        |
2448	          | files_total        | 23 | no                        |
2449	          | fs_charset_cap     | 76 | no                        |
2450	          | fs_layout_type     | 62 | no                        |
2451	          | fs_locations       | 24 | no                        |
2452	          | fs_locations_info  | 67 | no                        |
2453	          | fs_status          | 61 | no                        |
2454	          | hidden             | 25 | MUST                      |
2455	          | homogeneous        | 26 | no                        |
2456	          | layout_alignment   | 66 | no                        |
2457	          | layout_blksize     | 65 | no                        |
2458	          | layout_hint        | 63 | no                        |
2459	          | layout_type        | 64 | no                        |
2460	          | maxfilesize        | 27 | no                        |
2461	          | maxlink            | 28 | no                        |
2462	          | maxname            | 29 | no                        |
2463	          | maxread            | 30 | no                        |
2464	          | maxwrite           | 31 | no                        |
2465	          | mdsthreshold       | 68 | no                        |
2466	          | mimetype           | 32 | MUST                      |
2467	          | mode               | 33 | MUST                      |
2468	          | mode_set_masked    | 74 | no                        |
2469	          | mounted_on_fileid  | 55 | no                        |
2470	          | no_trunc           | 34 | no                        |
2471	          | numlinks           | 35 | no                        |
2472	          | owner              | 36 | MUST                      |
2473	          | owner_group        | 37 | MUST                      |
2474	          | quota_avail_hard   | 38 | no                        |
2475	          | quota_avail_soft   | 39 | no                        |
2476	          | quota_used         | 40 | no                        |
2477	          | rawdev             | 41 | no                        |
2478	          | retentevt_get      | 71 | MUST                      |
2479	          | retentevt_set      | 72 | no                        |
2480	          | retention_get      | 69 | MUST                      |
2481	          | retention_hold     | 73 | MUST                      |
2482	          | retention_set      | 70 | no                        |
2483	          | sacl               | 59 | MUST                      |
2484	          | sec_label          | 80 | MUST                      |
2485	          | space_avail        | 42 | no                        |
2486	          | space_free         | 43 | no                        |
2487	          | space_freed        | 78 | no                        |
2488	          | space_reserved     | 77 | MUST                      |
2489	          | space_total        | 44 | no                        |
2490	          | space_used         | 45 | no                        |
2491	          | system             | 46 | MUST                      |
2492	          | time_access        | 47 | MUST                      |
2493	          | time_access_set    | 48 | no                        |
2494	          | time_backup        | 49 | no                        |
2495	          | time_create        | 50 | MUST                      |
2496	          | time_delta         | 51 | no                        |
2497	          | time_metadata      | 52 | SHOULD                    |
2498	          | time_modify        | 53 | MUST                      |
2499	          | time_modify_set    | 54 | no                        |
2500	          +--------------------+----+---------------------------+

2502	                                  Table 4

2504	   [NOTE: The source file's attribute values will take precedence over
2505	   any attribute values inherited by the destination file.]

2507	   In the case of an inter-server copy or an intra-server copy between
2508	   file systems, the attributes supported for the source file and
2509	   destination file could be different.  By definition, the REQUIRED
2510	   attributes will be supported in all cases.  If the metadata flag is
2511	   set and the source file has a RECOMMENDED attribute that is not
2512	   supported for the destination file, the copy MUST fail with
2513	   NFS4ERR_ATTRNOTSUPP.
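
   The support check above might be sketched as a set difference.  This
   follows the sentence literally (any RECOMMENDED attribute present on
   the source but unsupported for the destination causes failure); the
   function name, the use of attribute-name strings, and the returned
   tuple are hypothetical.

```python
def check_metadata_copy(source_recommended_attrs, dest_supported_attrs):
    """Return (status, missing) for a COPY with the metadata flag set.

    source_recommended_attrs: RECOMMENDED attributes the source has.
    dest_supported_attrs: attributes supported for the destination.
    """
    missing = sorted(set(source_recommended_attrs)
                     - set(dest_supported_attrs))
    if missing:
        return ("NFS4ERR_ATTRNOTSUPP", missing)
    return ("NFS4_OK", [])
```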

2515	   Any attribute supported by the destination server that is not set on
2516	   the source file SHOULD be left unset.

2518	   Metadata attributes not exposed via the NFS protocol SHOULD be copied
2519	   to the destination file where appropriate.

2521	   The destination file's named attributes are not duplicated from the
2522	   source file.  After the copy process completes, the client MAY
2523	   attempt to duplicate named attributes using standard NFSv4
2524	   operations.  However, the destination file's named attribute
2525	   capabilities MAY be different from the source file's named attribute
2526	   capabilities.

2528	   If the metadata flag is not set and the client is requesting a whole
2529	   file copy (i.e., ca_count is 0 (zero)), the destination file's
2530	   metadata is implementation dependent.

2532	   If the client is requesting a partial file copy (i.e., ca_count is
2533	   not 0 (zero)), the client SHOULD NOT set the metadata flag and the
2534	   server MUST ignore the metadata flag.
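
   The interaction between ca_count and the metadata flag can be
   sketched as a single predicate.  The function name is hypothetical;
   the flag value is the one given in the ARGUMENT section.

```python
COPY4_METADATA = 0x00000002


def metadata_copy_requested(ca_flags, ca_count):
    """True only for a whole-file copy (ca_count == 0) with the flag set."""
    if ca_count != 0:
        # Partial copy: the server MUST ignore the metadata flag.
        return False
    return bool(ca_flags & COPY4_METADATA)
```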

2536	   If the operation does not result in an immediate failure, the server
2537	   will return NFS4_OK, and the CURRENT_FH will remain the destination's
2538	   filehandle.

2540	   If an immediate failure does occur, cr_bytes_copied will be set to
2541	   the number of bytes copied to the destination file before the error
2542	   occurred.  The cr_bytes_copied value indicates the number of bytes
2543	   copied but not which specific bytes have been copied.

2545	   A return of NFS4_OK indicates that either the operation is complete
2546	   or the operation was initiated and a callback will be used to deliver
2547	   the final status of the operation.

2549	   If the cr_callback_id is returned, this indicates that the operation
2550	   was initiated and a CB_COPY callback will deliver the final results
2551	   of the operation.  The cr_callback_id stateid is termed a copy
2552	   stateid in this context.  The server is given the option of returning
2553	   the results in a callback because the data may require a relatively
2554	   long period of time to copy.

2556	   If no cr_callback_id is returned, the operation completed
2557	   synchronously and no callback will be issued by the server.  The
2558	   completion status of the operation is indicated by cr_status.
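
   A client-side reading of the COPY result might look like the sketch
   below.  It models the optional cr_callback_id<1> as a sequence of
   zero or one entries; the function name, the string status values,
   and the returned tuples are hypothetical.

```python
def interpret_copy_result(cr_status, cr_callback_id=(), cr_bytes_copied=0):
    """Classify a COPY reply as failed, asynchronous, or complete."""
    if cr_status != "NFS4_OK":
        # Immediate failure: only a byte count is available.
        return ("failed", cr_status, cr_bytes_copied)
    if len(cr_callback_id) == 1:
        # A copy stateid was returned; CB_COPY delivers the final result.
        return ("async", cr_callback_id[0])
    # No copy stateid: the copy completed synchronously, no callback.
    return ("sync_complete",)
```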

2560	   If the copy completes successfully, either synchronously or
2561	   asynchronously, the data copied from the source file to the
2562	   destination file MUST appear identical to the NFS client.  However,
2563	   the NFS server's on disk representation of the data in the source
2564	   file and destination file MAY differ.  For example, the NFS server
2565	   might encrypt, compress, deduplicate, or otherwise represent the on
2566	   disk data in the source and destination file differently.

2568	   In the event of a failure, the state of the destination file is
2569	   implementation dependent.  The COPY operation may fail for the
2570	   following reasons (this is a partial list).

2572	   o  NFS4ERR_MOVED

2574	   o  NFS4ERR_NOTSUPP

2576	   o  NFS4ERR_PARTNER_NOTSUPP

2578	   o  NFS4ERR_OFFLOAD_DENIED
2579	   o  NFS4ERR_PARTNER_NO_AUTH

2581	   o  NFS4ERR_FBIG

2583	   o  NFS4ERR_NOTDIR

2585	   o  NFS4ERR_WRONG_TYPE

2587	   o  NFS4ERR_ISDIR

2589	   o  NFS4ERR_INVAL

2591	   o  NFS4ERR_DELAY

2593	   o  NFS4ERR_METADATA_NOTSUPP

2595	   o  NFS4ERR_WRONGSEC

2597	13.2.  Operation 60: OFFLOAD_ABORT - Cancel a server-side copy

2599	13.2.1.  ARGUMENT

2601	   struct OFFLOAD_ABORT4args {
2602	           /* CURRENT_FH: destination file */
2603	           stateid4        oaa_stateid;
2604	   };

2606	13.2.2.  RESULT

2608	   struct OFFLOAD_ABORT4res {
2609	           nfsstat4        oar_status;
2610	   };

2612	13.2.3.  DESCRIPTION

2614	   OFFLOAD_ABORT is used for both intra- and inter-server asynchronous
2615	   copies.  The OFFLOAD_ABORT operation allows the client to cancel a
2616	   server-side copy operation that it initiated.  This operation is sent
2617	   in a COMPOUND request from the client to the destination server.
2618	   This operation may be used to cancel a copy when the application that
2619	   requested the copy exits before the operation is completed or for
2620	   some other reason.

2622	   The request contains the filehandle and copy stateid cookies that act
2623	   as the context for the previously initiated copy operation.

2625	   The result's oar_status field indicates whether the cancel was
2626	   successful or not.  A value of NFS4_OK indicates that the copy
2627	   operation was canceled and no callback will be issued by the server.
2628	   A copy operation that is successfully canceled may result in none,
2629	   some, or all of the data and/or metadata copied.

2631	   If the server supports asynchronous copies, the server is REQUIRED to
2632	   support the OFFLOAD_ABORT operation.

2634	   The OFFLOAD_ABORT operation may fail for the following reasons (this
2635	   is a partial list):

2637	   o  NFS4ERR_NOTSUPP

2639	   o  NFS4ERR_RETRY

2641	   o  NFS4ERR_COMPLETE_ALREADY

2643	   o  NFS4ERR_SERVERFAULT

2645	13.3.  Operation 61: COPY_NOTIFY - Notify a source server of a future
2646	       copy

2648	13.3.1.  ARGUMENT

2650	   struct COPY_NOTIFY4args {
2651	           /* CURRENT_FH: source file */
2652	           stateid4        cna_src_stateid;
2653	           netloc4         cna_destination_server;
2654	   };

2656	13.3.2.  RESULT

2658	   struct COPY_NOTIFY4resok {
2659	           nfstime4        cnr_lease_time;
2660	           netloc4         cnr_source_server<>;
2661	   };

2663	   union COPY_NOTIFY4res switch (nfsstat4 cnr_status) {
2664	           case NFS4_OK:
2665	                   COPY_NOTIFY4resok       resok4;
2666	           default:
2667	                   void;
2668	   };

2670	13.3.3.  DESCRIPTION

2672	   This operation is used for an inter-server copy.  A client sends this
2673	   operation in a COMPOUND request to the source server to authorize a
2674	   destination server identified by cna_destination_server to read the
2675	   file specified by CURRENT_FH on behalf of the given user.

2677	   The cna_src_stateid MUST refer to either open or locking states
2678	   provided earlier by the server.  If it is invalid, then the operation
2679	   MUST fail.

2681	   The cna_destination_server MUST be specified using the netloc4
2682	   network location format.  The server is not required to resolve the
2683	   cna_destination_server address before completing this operation.

2685	   If this operation succeeds, the source server will allow the
2686	   cna_destination_server to copy the specified file on behalf of the
2687	   given user as long as both of the following conditions are met:

2689	   o  The destination server begins reading the source file before the
2690	      cnr_lease_time expires.  If the cnr_lease_time expires while the
2691	      destination server is still reading the source file, the
2692	      destination server is allowed to finish reading the file.

2694	   o  The client has not issued an OFFLOAD_REVOKE for the same
2695	      combination of user, filehandle, and destination server.

2697	   The cnr_lease_time is chosen by the source server.  A cnr_lease_time
2698	   of 0 (zero) indicates an infinite lease.  To avoid the need for
2699	   synchronized clocks, copy lease times are granted by the server as a
2700	   time delta.  To renew the copy lease time the client should resend
2701	   the same copy notification request to the source server.
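
   The two conditions and the lease rules above could be tracked on the
   source server roughly as follows.  This is a sketch under stated
   assumptions: the class and method names are hypothetical, the lease
   is simplified to a number of seconds rather than an nfstime4, and
   "revoked" stands for the revoke operation of Section 13.4 having
   been processed.  Only the start of a read is gated, since a read in
   progress is allowed to finish after expiry.

```python
from dataclasses import dataclass


@dataclass
class CopyAuthorization:
    granted_at: float      # source server's own clock, in seconds
    lease_seconds: float   # lease as a delta; 0 means infinite
    revoked: bool = False  # set once the authorization is revoked

    def may_begin_read(self, now):
        """May the destination server start reading at time 'now'?"""
        if self.revoked:
            return False
        if self.lease_seconds == 0:
            return True  # infinite lease
        return now < self.granted_at + self.lease_seconds

    def renew(self, now):
        """Model the client resending the same copy notification."""
        self.granted_at = now
```

   Because the lease is a delta measured on the source server's own
   clock, nothing here depends on the client's or the destination
   server's notion of time.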

2703	   A successful response will also contain cnr_source_server, a list of
2704	   network locations in netloc4 format at which the source server is
2705	   willing to accept connections from the destination server.  These
2706	   locations might not be reachable from the client and might be located
2707	   on networks to which the client has no connection.

2709	   If the client wishes to perform an inter-server copy, the client MUST
2710	   send a COPY_NOTIFY to the source server.  Therefore, the source
2711	   server MUST support COPY_NOTIFY.

2713	   For a copy only involving one server (the source and destination are
2714	   on the same server), this operation is unnecessary.

2716	   The COPY_NOTIFY operation may fail for the following reasons (this is
2717	   a partial list):

2719	   o  NFS4ERR_MOVED

2721	   o  NFS4ERR_NOTSUPP

2723	   o  NFS4ERR_WRONGSEC

2725	13.4.  Operation 62: OFFLOAD_REVOKE - Revoke a destination server's copy
2726	       privileges

2728	13.4.1.  ARGUMENT

2730	   struct OFFLOAD_REVOKE4args {
2731	           /* CURRENT_FH: source file */
2732	           netloc4         ora_destination_server;
2733	   };

2735	13.4.2.  RESULT

2737	   struct OFFLOAD_REVOKE4res {
2738	           nfsstat4        orr_status;
2739	   };

2741	13.4.3.  DESCRIPTION

2743	   This operation is used for an inter-server copy.  A client sends this
2744	   operation in a COMPOUND request to the source server to revoke the
2745	   authorization of a destination server identified by
2746	   ora_destination_server from reading the file specified by CURRENT_FH
2747	   on behalf of the given user.  If the ora_destination_server has
2748	   already begun copying the file, a successful return from this
2749	   operation indicates that further access will be prevented.

2751	   The ora_destination_server MUST be specified using the netloc4
2752	   network location format.  The server is not required to resolve the
2753	   ora_destination_server address before completing this operation.

2755	   The client uses OFFLOAD_ABORT to inform the destination to stop the
2756	   active transfer and OFFLOAD_REVOKE to inform the source to not allow
2757	   any more copy requests from the destination.  The OFFLOAD_REVOKE
2758	   operation is also useful in situations in which the source server
2759	   granted a very long or infinite lease on the destination server's
2760	   ability to read the source file and all copy operations on the source
2761	   file have been completed.

2763	   For a copy only involving one server (the source and destination are
2764	   on the same server), this operation is unnecessary.

2766	   If the server supports COPY_NOTIFY, the server is REQUIRED to support
2767	   the OFFLOAD_REVOKE operation.

2769	   The OFFLOAD_REVOKE operation may fail for the following reasons (this
2770	   is a partial list):

2772	   o  NFS4ERR_MOVED

2774	   o  NFS4ERR_NOTSUPP

2776	13.5.  Operation 63: OFFLOAD_STATUS - Poll for status of a server-side
2777	       copy

2779	13.5.1.  ARGUMENT

2781	   struct OFFLOAD_STATUS4args {
2782	           /* CURRENT_FH: destination file */
2783	           stateid4        osa_stateid;
2784	   };

2786	13.5.2.  RESULT

2788	   struct OFFLOAD_STATUS4resok {
2789	           length4         osr_bytes_copied;
2790	           nfsstat4        osr_complete<1>;
2791	   };

2793	   union OFFLOAD_STATUS4res switch (nfsstat4 osr_status) {
2794	           case NFS4_OK:
2795	                   OFFLOAD_STATUS4resok    resok4;
2796	           default:
2797	                   void;
2798	   };

2800	13.5.3.  DESCRIPTION

2802	   OFFLOAD_STATUS is used for both intra- and inter-server asynchronous
2803	   copies.  The OFFLOAD_STATUS operation allows the client to poll the
2804	   destination server to determine the status of an asynchronous copy
2805	   operation.

2807	   If this operation is successful, the number of bytes copied is
2808	   returned to the client in the osr_bytes_copied field.  The
2809	   osr_bytes_copied value indicates the number of bytes copied but not
2810	   which specific bytes have been copied.

2812	   If the optional osr_complete field is present, the copy has
2813	   completed.  In this case the status value indicates the result of the
2814	   asynchronous copy operation.  In all cases, the server will also
2815	   deliver the final results of the asynchronous copy in a CB_COPY
2816	   operation.

2818	   The failure of this operation does not indicate the result of the
2819	   asynchronous copy in any way.
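
   A polling client might interpret the reply as sketched below.  The
   optional osr_complete<1> is modeled as a sequence of zero or one
   entries; the function name, string status values, and returned
   tuples are hypothetical.

```python
def interpret_offload_status(osr_status, osr_bytes_copied=0,
                             osr_complete=()):
    """Return (state, bytes_copied, final_status) for one poll."""
    if osr_status != "NFS4_OK":
        # The poll itself failed; this says nothing about the copy.
        return ("unknown", None, None)
    if len(osr_complete) == 1:
        # Copy finished; osr_complete carries the copy's result.
        return ("complete", osr_bytes_copied, osr_complete[0])
    return ("in_progress", osr_bytes_copied, None)
```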

2821	   If the server supports asynchronous copies, the server is REQUIRED to
2822	   support the OFFLOAD_STATUS operation.

2824	   The OFFLOAD_STATUS operation may fail for the following reasons (this
2825	   is a partial list):

2827	   o  NFS4ERR_NOTSUPP

2829	   o  NFS4ERR_BAD_STATEID

2831	   o  NFS4ERR_EXPIRED

2833	13.6.  Modification to Operation 42: EXCHANGE_ID - Instantiate Client ID

2835	13.6.1.  ARGUMENT

2837	      /* new */
2838	      const EXCHGID4_FLAG_SUPP_FENCE_OPS      = 0x00000004;

2840	13.6.2.  RESULT

2842	      Unchanged

2844	13.6.3.  MOTIVATION

2846	   Enterprise applications require guarantees that an operation has
2847	   either aborted or completed.  NFSv4.1 provides this guarantee as long
2848	   as the session is alive: simply send a SEQUENCE operation on the same
2849	   slot with a new sequence number, and the successful return of
2850	   SEQUENCE indicates the previous operation has completed.  However, if
2851	   the session is lost, there is no way to know when any in progress
2852	   operations have aborted or completed.  In hindsight, the NFSv4.1
2853	   specification should have mandated that DESTROY_SESSION either abort
2854	   or complete all outstanding operations.

2856	13.6.4.  DESCRIPTION

2858	   A client SHOULD request the EXCHGID4_FLAG_SUPP_FENCE_OPS capability
2859	   when it sends an EXCHANGE_ID operation.  The server SHOULD set this
2860	   capability in the EXCHANGE_ID reply whether the client requests it or
2861	   not.  It is the server's return that determines whether this
2862	   capability is in effect.  When it is in effect, the following will
2863	   occur:

2865	   o  The server will not reply to any DESTROY_SESSION invoked with the
2866	      client ID until all operations in progress are completed or
2867	      aborted.

2869	   o  The server will not reply to subsequent EXCHANGE_ID invoked on the
2870	      same client owner with a new verifier until all operations in
2871	      progress on the client ID's session are completed or aborted.

2873	   o  The NFS server SHOULD support client ID trunking, and if it does
2874	      and the EXCHGID4_FLAG_SUPP_FENCE_OPS capability is enabled, then a
2875	      session ID created on one node of the storage cluster MUST be
2876	      destroyable via DESTROY_SESSION.  In addition, DESTROY_CLIENTID
2877	      and an EXCHANGE_ID with a new verifier affect all sessions
2878	      regardless of what node the sessions were created on.
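
   The negotiation described above reduces to setting one bit in the
   request and testing it in the reply.  A minimal sketch follows; the
   function names are hypothetical, and the flag value is the one
   given in the ARGUMENT section.

```python
EXCHGID4_FLAG_SUPP_FENCE_OPS = 0x00000004


def build_request_flags(other_flags=0):
    """Client side: request the capability alongside any other flags."""
    return other_flags | EXCHGID4_FLAG_SUPP_FENCE_OPS


def fence_ops_in_effect(reply_flags):
    """The server's reply, not the client's request, decides."""
    return bool(reply_flags & EXCHGID4_FLAG_SUPP_FENCE_OPS)
```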

2880	13.7.  Operation 64: INITIALIZE

2882	   This operation can be used to initialize the structure imposed by an
2883	   application onto a file, i.e., ADHs, and to punch a hole into a file.

2885	13.7.1.  ARGUMENT

2887	   struct data_info4 {
2888	           offset4         di_offset;
2889	           length4         di_length;
2890	           bool            di_allocated;
2891	   };
2892	   /*
2893	    * We use data_content4 in case we wish to
2894	    * extend new types later. Note that we
2895	    * are explicitly disallowing data.
2896	    */
2897	   union initialize_arg4 switch (data_content4 content) {
2898	   case NFS4_CONTENT_APP_DATA_HOLE:
2899	           app_data_hole4  ia_adh;
2900	   case NFS4_CONTENT_HOLE:
2901	           data_info4      ia_hole;
2902	   default:
2903	           void;
2904	   };

2906	   struct INITIALIZE4args {
2907	           /* CURRENT_FH: file */
2908	           stateid4        ia_stateid;
2909	           stable_how4     ia_stable;
2910	           initialize_arg4 ia_data<>;
2911	   };

2913	13.7.2.  RESULT

2915	   struct INITIALIZE4resok {
2916	           count4          ir_count;
2917	           stable_how4     ir_committed;
2918	           verifier4       ir_writeverf;
2919	           data_content4   ir_sparse;
2920	   };

2922	   union INITIALIZE4res switch (nfsstat4 status) {
2923	   case NFS4_OK:
2924	           INITIALIZE4resok        resok4;
2925	   default:
2926	           void;
2927	   };

2929	13.7.3.  DESCRIPTION

2931	   Using the data_content4 (Section 6.1.2), INITIALIZE can be used
2932	   either to punch holes or to impose ADH structure on a file.

2934	13.7.3.1.  Hole punching

2936	   Whenever a client wishes to zero the blocks backing a particular
2937	   region in the file, it calls the INITIALIZE operation with the
2938	   current filehandle set to the filehandle of the file in question, and
2939	   the equivalent of start offset and length in bytes of the region set
2940	   in ia_hole.di_offset and ia_hole.di_length respectively.  If the
2941	   ia_hole.di_allocated is set to TRUE, then the blocks will be zeroed
2942	   and if it is set to FALSE, then they will be deallocated.  All
2943	   further reads to this region MUST return zeroes until overwritten.
2944	   The filehandle specified must be that of a regular file.

2946	   Situations may arise where di_offset and/or di_offset + di_length
2947	   will not be aligned to a boundary that the server does allocations/
2948	   deallocations in.  For most file systems, this is the block size of
2949	   the file system.  In such a case, the server can deallocate as many
2950	   bytes as it can in the region.  The blocks that cannot be deallocated
2951	   MUST be zeroed.  Except for the block deallocation and maximum hole
2952	   punching capability, an INITIALIZE operation is to be treated
2953	   similarly to a write of zeroes.
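
   For a request with di_allocated set to FALSE, the alignment handling
   above can be sketched as splitting the region into a block-aligned
   part that may be deallocated and unaligned edges that are zeroed.
   The function name and return layout are hypothetical, and a single
   fixed block_size is an assumption about the server's allocation
   unit.

```python
def split_hole(di_offset, di_length, block_size):
    """Return (dealloc, zero) for a hole-punch request.

    dealloc is an (offset, length) tuple of whole blocks, or None.
    zero is a list of (offset, length) ranges that must be zeroed.
    """
    end = di_offset + di_length
    first = -(-di_offset // block_size) * block_size  # round up
    last = (end // block_size) * block_size           # round down
    if first >= last:
        # No whole block inside the region: zero all of it.
        return None, ([(di_offset, di_length)] if di_length else [])
    zero = []
    if di_offset < first:
        zero.append((di_offset, first - di_offset))   # leading edge
    if last < end:
        zero.append((last, end - last))               # trailing edge
    return (first, last - first), zero
```

   With a 4096-byte block size, a request for offset 1000 and length
   10000 covers one whole block (4096 to 8191); the 3096 bytes before
   it and the 2808 bytes after it are zeroed instead.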

2955	   The server is not required to complete deallocating the blocks
2956	   specified in the operation before returning.  It is acceptable to
2957	   have the deallocation be deferred.  In fact, INITIALIZE is merely a
2958	   hint; it is valid for a server to return success without ever doing
2959	   anything towards deallocating the blocks backing the region
2960	   specified.  However, any future reads to the region MUST return
2961	   zeroes.

2963	   If used to hole punch, INITIALIZE will result in the space_used
2964	   attribute being decreased by the number of bytes that were
2965	   deallocated.  The space_freed attribute may or may not decrease,
2966	   depending on the support and whether the blocks backing the specified
2967	   range were shared or not.  The size attribute will remain unchanged.

2969	   The INITIALIZE operation MUST NOT change the space reservation
2970	   guarantee of the file.  While the server can deallocate the blocks
2971	   specified by di_offset and di_length, future writes to this region
2972	   MUST NOT fail with NFS4ERR_NOSPC.

2974	   The INITIALIZE operation may fail for the following reasons (this is
2975	   a partial list):

2977	   NFS4ERR_NOTSUPP  Hole punch operations are not supported by the
2978	      NFS server receiving this request.

2980	   NFS4ERR_ISDIR  The current filehandle is of type NF4DIR.

2982	   NFS4ERR_SYMLINK  The current filehandle is of type NF4LNK.

2984	   NFS4ERR_WRONG_TYPE  The current filehandle does not designate an
2985	      ordinary file.

2987	13.7.3.2.  ADHs

2989	   If the server supports ADHs, then it MUST support the
2990	   NFS4_CONTENT_APP_DATA_HOLE arm of the INITIALIZE operation.  The
2991	   server has no concept of the structure imposed by the application.
2992	   Only when the application writes to a section of the file does
2993	   order get imposed.  In order to detect corruption even before the
2994	   application utilizes the file, the application will want to
2995	   initialize a range of ADHs using INITIALIZE.

2997	   For ADHs, when the client invokes the INITIALIZE operation, it has
2998	   two desired results:

3000	   1.  The structure described by the app_data_block4 be imposed on the
3001	       file.

3003	   2.  The contents described by the app_data_block4 be sparse.

3005	   If the server supports the INITIALIZE operation, it still might not
3006	   support sparse files.  So if it receives the INITIALIZE operation,
3007	   then it MUST populate the contents of the file with the initialized
3008	   ADHs.

3010	   If the data was already initialized, there are two interesting
3011	   scenarios:

3013	   1.  The data blocks are allocated.

3015	   2.  Initializing in the middle of an existing ADH.

3017	   If the data blocks were already allocated, then the INITIALIZE is a
3018	   hole punch operation.  If INITIALIZE supports sparse files, then the
3019	   data blocks are to be deallocated.  If not, then the data blocks are
3020	   to be rewritten in the indicated ADH format.

3022	   Since the server has no knowledge of ADHs, it should not report
3023	   misaligned creation of ADHs.  Even while it can detect them, it
3024	   cannot disallow them, as the application might be in the process of
3025	   changing the size of the ADHs.  Thus the server must be prepared to
3026	   handle an INITIALIZE into an existing ADH.

3028	   This document does not mandate the manner in which the server stores
3029	   ADHs sparsely for a file.  However, if an INITIALIZE arrives that
3030	   will force a new ADH to start inside an existing ADH then the server
3031	   will have three ADHs instead of two.  It will have one up to the new
3032	   one for the INITIALIZE, one for the INITIALIZE, and one for after the
3033	   INITIALIZE.  Note that depending on server specific policies for
3034	   block allocation, there may also be some physical blocks allocated to
3035	   align the boundaries.

3037	13.8.  Operation 67: IO_ADVISE - Application I/O access pattern hints

3039	13.8.1.  ARGUMENT

3041	   enum IO_ADVISE_type4 {
3042	           IO_ADVISE4_NORMAL                       = 0,
3043	           IO_ADVISE4_SEQUENTIAL                   = 1,
3044	           IO_ADVISE4_SEQUENTIAL_BACKWARDS         = 2,
3045	           IO_ADVISE4_RANDOM                       = 3,
3046	           IO_ADVISE4_WILLNEED                     = 4,
3047	           IO_ADVISE4_WILLNEED_OPPORTUNISTIC       = 5,
3048	           IO_ADVISE4_DONTNEED                     = 6,
3049	           IO_ADVISE4_NOREUSE                      = 7,
3050	           IO_ADVISE4_READ                         = 8,
3051	           IO_ADVISE4_WRITE                        = 9,
3052	           IO_ADVISE4_INIT_PROXIMITY               = 10
3053	   };

3055	   struct IO_ADVISE4args {
3056	           /* CURRENT_FH: file */
3057	           stateid4        iar_stateid;
3058	           offset4         iar_offset;
3059	           length4         iar_count;
3060	           bitmap4         iar_hints;
3061	   };

3063	13.8.2.  RESULT

3065	   struct IO_ADVISE4resok {
3066	           bitmap4 ior_hints;
3067	   };

3069	   union IO_ADVISE4res switch (nfsstat4 _status) {
3070	   case NFS4_OK:
3071	           IO_ADVISE4resok resok4;
3072	   default:
3073	           void;
3074	   };

3076	13.8.3.  DESCRIPTION

3078	   The IO_ADVISE operation sends an I/O access pattern hint to the
3079	   server for the owner of the stateid for a given byte range specified
3080	   by iar_offset and iar_count.  The byte range specified by iar_offset
3081	   and iar_count need not currently exist in the file, but the iar_hints
3082	   will apply to the byte range when it does exist.  If iar_count is 0,
3083	   all data following iar_offset is specified.  The server MAY ignore
3084	   the advice.

3086	   The following are the allowed hints for a stateid holder:

3088	   IO_ADVISE4_NORMAL  There is no advice to give, this is the default
3089	      behavior.

3091	   IO_ADVISE4_SEQUENTIAL  Expects to access the specified data
3092	      sequentially from lower offsets to higher offsets.

3094	   IO_ADVISE4_SEQUENTIAL_BACKWARDS  Expects to access the specified data
3095	      sequentially from higher offsets to lower offsets.

3097	   IO_ADVISE4_RANDOM  Expects to access the specified data in a random
3098	      order.

3100	   IO_ADVISE4_WILLNEED  Expects to access the specified data in the near
3101	      future.

3103	   IO_ADVISE4_WILLNEED_OPPORTUNISTIC  Expects to possibly access the
3104	      data in the near future.  This is a speculative hint, and
3105	      therefore the server should prefetch data or indirect blocks only
3106	      if it can be done at a marginal cost.

3108	   IO_ADVISE4_DONTNEED  Expects that it will not access the specified
3109	      data in the near future.

3111	   IO_ADVISE4_NOREUSE  Expects to access the specified data once and
3112	      then not reuse it thereafter.

3114	   IO_ADVISE4_READ  Expects to read the specified data in the near
3115	      future.

3117	   IO_ADVISE4_WRITE  Expects to write the specified data in the near
3118	      future.

3120	   IO_ADVISE4_INIT_PROXIMITY  Informs the server that the data in the
3121	      byte range remains important to the client.

3123	   Since IO_ADVISE is a hint, a server SHOULD NOT return an error and
3124	   invalidate an entire Compound request if one of the sent hints in
3125	   iar_hints is not supported by the server.  Also, the server MUST NOT
3126	   return an error if the client sends contradictory hints to the
3127	   server, e.g., IO_ADVISE4_SEQUENTIAL and IO_ADVISE4_RANDOM in a single
3128	   IO_ADVISE operation.  In these cases, the server MUST return success
3129	   and an ior_hints value that indicates the hint it intends to
3130	   implement.  This may mean simply returning IO_ADVISE4_NORMAL.
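
   The handling of unsupported and contradictory hints can be sketched
   as follows.  This is only an illustration under stated assumptions:
   the bitmap is modeled as a single Python integer in which hint
   value n occupies bit n (a real bitmap4 is an array of 32-bit
   words), and the SUPPORTED set and the server_reply function are
   hypothetical server policy, not something this document mandates.

```python
from enum import IntEnum


class IoAdvise(IntEnum):
    """Hint values copied from the IO_ADVISE_type4 enum."""
    NORMAL = 0
    SEQUENTIAL = 1
    SEQUENTIAL_BACKWARDS = 2
    RANDOM = 3
    WILLNEED = 4
    WILLNEED_OPPORTUNISTIC = 5
    DONTNEED = 6
    NOREUSE = 7
    READ = 8
    WRITE = 9
    INIT_PROXIMITY = 10


def to_bitmap(hints):
    """Encode hints as an integer bitmask (assumed: hint n -> bit n)."""
    mask = 0
    for hint in hints:
        mask |= 1 << hint
    return mask


def from_bitmap(mask):
    """Decode an integer bitmask back into a set of hints."""
    return {hint for hint in IoAdvise if mask & (1 << hint)}


# Hypothetical policy: the hints this example server implements.
SUPPORTED = {IoAdvise.NORMAL, IoAdvise.SEQUENTIAL, IoAdvise.WILLNEED}
# Access-pattern hints that contradict one another when combined.
ACCESS_PATTERNS = {IoAdvise.SEQUENTIAL, IoAdvise.SEQUENTIAL_BACKWARDS,
                   IoAdvise.RANDOM}


def server_reply(iar_hints):
    """Return (status, ior_hints); never fails because of the hints."""
    requested = from_bitmap(iar_hints)
    usable = requested & SUPPORTED
    if len(requested & ACCESS_PATTERNS) > 1:
        # Contradictory patterns: honor none of them, but still succeed.
        usable -= ACCESS_PATTERNS
    if not usable:
        usable = {IoAdvise.NORMAL}
    return "NFS4_OK", to_bitmap(usable)
```

   In this sketch, a request carrying both IO_ADVISE4_SEQUENTIAL and
   IO_ADVISE4_RANDOM succeeds and reports only IO_ADVISE4_NORMAL, while
   an unsupported hint such as IO_ADVISE4_NOREUSE is silently dropped
   from the reply.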

3132	   The ior_hints returned by the server is primarily for debugging
3133	   purposes since the server is under no obligation to carry out the
3134	   hints that it describes in the ior_hints result.  In addition, while
3135	   the server may have intended to implement the hints returned in
3136	   ior_hints, as time progresses, the server may need to change its
3137	   handling of a given file due to several reasons including, but not
3138	   limited to, memory pressure, additional IO_ADVISE hints sent by other
3139	   clients, and heuristically detected file access patterns.

3141	   The server MAY return different advice than what the client
3142	   requested.  If it does, then this might be due to one of several
3143	   conditions, including, but not limited to, another client advising
3144	   of a different I/O access pattern; a different I/O access pattern
3145	   from another client that the server has heuristically detected; or
3146	   the server is not able to support the requested I/O access pattern,
3147	   perhaps due to a temporary resource limitation.

3149	   Each issuance of the IO_ADVISE operation overrides all previous
3150	   issuances of IO_ADVISE for a given byte range.  This effectively
3151	   follows a strategy of last hint wins for a given stateid and byte
3152	   range.

3154	   Clients should assume that hints included in an IO_ADVISE operation
3155	   will be forgotten once the file is closed.

3157	13.8.4.  IMPLEMENTATION

3159	   The NFS client may choose to issue an IO_ADVISE operation to the
3160	   server in several different instances.

3162	   The most obvious is in direct response to an application's execution
3163	   of posix_fadvise().  In this case, IO_ADVISE4_WRITE and
3164	   IO_ADVISE4_READ may be set based upon the type of file access
3165	   specified when the file was opened.

3167	13.8.5.  IO_ADVISE4_INIT_PROXIMITY

3169	   The IO_ADVISE4_INIT_PROXIMITY hint is non-POSIX in origin and conveys
3170	   that the client has recently accessed the byte range in its own
3171	   cache.  I.e., it has not accessed it on the server, but it has
3172	   locally.  When the server reaches resource exhaustion, knowing which
3173	   data is more important allows the server to make better choices about
3174	   which data to, for example, purge from a cache, or move to secondary
3175	   storage.  It also informs the server which delegations are more
3176	   important, since if delegations are working correctly, once delegated
3177	   to a client and the client has read the content for that byte range,
3178	   a server might never receive another read request for that byte
3179	   range.

3181	   This hint is also useful in the case of NFS clients which are network
3182	   booting from a server.  If the first client to be booted sends this
3183	   hint, then it keeps the cache warm for the remaining clients.

3185	13.8.6.  pNFS File Layout Data Type Considerations

3187	   The IO_ADVISE considerations for pNFS are very similar to the COMMIT
3188	   considerations for pNFS.  That is, as with COMMIT, some NFS server
3189	   implementations prefer IO_ADVISE be done on the DS, and some prefer
3190	   it be done on the MDS.

3192	   So for the file's layout type, it is proposed that NFSv4.2 include an
3193	   additional hint NFL42_UFLG_IO_ADVISE_THRU_MDS which is valid only on
3194	   NFSv4.2 or higher.  Any file's layout obtained with NFSv4.1 MUST NOT
3195	   have NFL42_UFLG_IO_ADVISE_THRU_MDS set.  Any file's layout obtained
3196	   with NFSv4.2 MAY have NFL42_UFLG_IO_ADVISE_THRU_MDS set.  If the
3197	   client does not implement IO_ADVISE, then it MUST ignore
3198	   NFL42_UFLG_IO_ADVISE_THRU_MDS.

3200	   If NFL42_UFLG_IO_ADVISE_THRU_MDS is set, the client MUST send the
3201	   IO_ADVISE operation to the MDS in order for it to be honored by the
3202	   DS.  Once the MDS receives the IO_ADVISE operation, it will
3203	   communicate the advice to each DS.

3205	   If NFL42_UFLG_IO_ADVISE_THRU_MDS is not set, then the client SHOULD
3206	   send an IO_ADVISE operation to the appropriate DS for the specified
3207	   byte range.  While the client MAY always send IO_ADVISE to the MDS,
3208	   if the server has not set NFL42_UFLG_IO_ADVISE_THRU_MDS, the client
3209	   should expect that such an IO_ADVISE is futile.  Note that a client
3210	   SHOULD use the same set of arguments on each IO_ADVISE sent to a DS
3211	   for the same open file reference.

3213	   The server is not required to support different advice for different
3214	   DS's with the same open file reference.

3216	13.8.6.1.  Dense and Sparse Packing Considerations

3218	   The IO_ADVISE operation MUST use the iar_offset and byte range as
3219	   dictated by the presence or absence of NFL4_UFLG_DENSE.

3221	   E.g., if NFL4_UFLG_DENSE is present, and a READ or WRITE to the DS
3222	   for iar_offset 0 really means iar_offset 10000 in the logical file,
3223	   then an IO_ADVISE for iar_offset 0 means iar_offset 10000.

3225	   E.g., if NFL4_UFLG_DENSE is absent, and a READ or WRITE to the DS
3226	   for iar_offset 0 really means iar_offset 0 in the logical file, then
3227	   an IO_ADVISE for iar_offset 0 means iar_offset 0 in the logical file.

3229	   E.g., suppose NFL4_UFLG_DENSE is present, the stripe unit is 1000
3230	   bytes, the stripe count is 10, and the dense DS file is serving
3231	   iar_offset 0.  If READs or WRITEs to the DS for iar_offsets 0, 1000,
3232	   2000, and 3000 really mean iar_offsets 10000, 20000, 30000, and
3233	   40000 (implying a stripe count of 10 and a stripe unit of 1000), then
3234	   an IO_ADVISE sent to the same DS with an iar_offset of 500 and an
3235	   iar_count of 3000 means that the IO_ADVISE applies to these byte
3236	   ranges of the dense DS file:

3238	     - 500 to 999
3239	     - 1000 to 1999
3240	     - 2000 to 2999
3241	     - 3000 to 3499

3243	   I.e., the contiguous range 500 to 3499 as specified in IO_ADVISE.

3245	   It also applies to these byte ranges of the logical file:

3247	     - 10500 to 10999 (500 bytes)
3248	     - 20000 to 20999 (1000 bytes)
3249	     - 30000 to 30999 (1000 bytes)
3250	     - 40000 to 40499 (500 bytes)
3251	     (total            3000 bytes)

3253	   E.g., suppose NFL4_UFLG_DENSE is absent, the stripe unit is 250
3254	   bytes, the stripe count is 4, and the sparse DS file is serving
3255	   iar_offset 0.  Then READs or WRITEs to the DS for iar_offsets 0,
3256	   1000, 2000, and 3000 really mean iar_offsets 0, 1000, 2000, and 3000
3257	   in the logical file, keeping in mind that on the DS file, byte ranges
3258	   250 to 999, 1250 to 1999, 2250 to 2999, and 3250 to 3999 are not
3259	   accessible.  An IO_ADVISE sent to the same DS with an iar_offset of
3260	   500 and an iar_count of 3000 then applies to these byte ranges of
3261	   the logical file and the sparse DS file:

3263	     - 500 to 999 (500 bytes)   - no effect
3264	     - 1000 to 1249 (250 bytes) - effective
3265	     - 1250 to 1999 (750 bytes) - no effect
3266	     - 2000 to 2249 (250 bytes) - effective
3267	     - 2250 to 2999 (750 bytes) - no effect
3268	     - 3000 to 3249 (250 bytes) - effective
3269	     - 3250 to 3499 (250 bytes) - no effect
3270	     (subtotal      2250 bytes) - no effect
3271	     (subtotal       750 bytes) - effective
3272	     (grand total   3000 bytes) - no effect + effective
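
   The sparse-packing breakdown above can be reproduced with a short
   Python sketch.  It assumes a simple round-robin striping in which
   the DS at position ds_index holds stripe unit ds_index of every
   stripe; the function name served_ranges and its parameter names are
   hypothetical, and the sketch only handles a non-zero iar_count.

```python
def served_ranges(iar_offset, iar_count, stripe_unit, stripe_count,
                  ds_index):
    """Return the parts of an IO_ADVISE byte range that a DS serves.

    Ranges are half-open (start, end) offsets in the logical file,
    which for sparse packing are also the offsets in the DS file.
    """
    req_end = iar_offset + iar_count
    stripe_width = stripe_unit * stripe_count
    ranges = []
    first_stripe = iar_offset // stripe_width
    last_stripe = (req_end - 1) // stripe_width
    for stripe in range(first_stripe, last_stripe + 1):
        # Start of the one stripe unit in this stripe held by the DS.
        unit_start = stripe * stripe_width + ds_index * stripe_unit
        lo = max(unit_start, iar_offset)
        hi = min(unit_start + stripe_unit, req_end)
        if lo < hi:
            ranges.append((lo, hi))
    return ranges


# The example from the text: stripe unit 250, stripe count 4, and a
# DS serving the first stripe unit of each stripe.
effective = served_ranges(500, 3000, 250, 4, 0)
print(effective)                             # the "effective" ranges
print(sum(hi - lo for lo, hi in effective))  # effective byte count
```

   For the values used in the text, the sketch reports the ranges 1000
   to 1249, 2000 to 2249, and 3000 to 3249, for 750 effective bytes;
   the remaining 2250 bytes of the request have no effect on this DS.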

3274	   If neither of the flags NFL42_UFLG_IO_ADVISE_THRU_MDS and
3275	   NFL4_UFLG_DENSE is set in the layout, then any IO_ADVISE request
3276	   sent to the data server with a byte range that overlaps a stripe unit
3277	   that the data server does not serve MUST NOT result in the status
3278	   NFS4ERR_PNFS_IO_HOLE.  Instead, the response SHOULD be successful and
3279	   if the server applies IO_ADVISE hints on any stripe units that
3280	   overlap with the specified range, those hints SHOULD be indicated in
3281	   the response.

3283	13.9.  Changes to Operation 51: LAYOUTRETURN

3285	13.9.1.  Introduction

3287	   In the pNFS description provided in [2], the client is not able to
3288	   relay an error code from the DS to the MDS.  In the specification of
3289	   the Objects-Based Layout protocol [9], use is made of the opaque
3290	   lrf_body field of the LAYOUTRETURN argument to do such a relaying of
3291	   error codes.  In this section, we define a new data structure to
3292	   enable the passing of error codes back to the MDS and provide some
3293	   guidelines on what both the client and MDS should expect in such
3294	   circumstances.

3296	   There are two broad classes of errors, transient and persistent.  The
3297	   client SHOULD strive to only use this new mechanism to report
3298	   persistent errors.  It MUST be able to deal with transient issues by
3299	   itself.  Also, while the client might consider an issue to be
3300	   persistent, it MUST be prepared for the MDS to consider such issues
3301	   to be transient.  A prime example of this is if the MDS fences off a
3302	   client from either a stateid or a filehandle.  The client will get an
3303	   error from the DS and might relay either NFS4ERR_ACCESS or
3304	   NFS4ERR_BAD_STATEID back to the MDS, with the belief that this is a
3305	   hard error.  If the MDS is informed by the client that there is an
3306	   error, it can safely ignore that.  For it, the mission is
3307	   accomplished in that the client has returned a layout that the MDS
3308	   had most likely recalled.

3310	   The client might also need to inform the MDS that it cannot reach one
3311	   or more of the DSes.  While the MDS can detect the connectivity of
3312	   both of these paths:

3314	   o  MDS to DS

3316	   o  MDS to client

3318	   it cannot determine if the client and DS path is working.  As with
3319	   the case of the DS passing errors to the client, the client must be
3320	   prepared for the MDS to consider such outages as being transitory.

3322	   The existing LAYOUTRETURN operation is extended by introducing a new
3323	   data structure to report errors, layoutreturn_device_error4.  Also,
3324	   layoutreturn_error_report4 is introduced to enable an array of errors
3325	   to be reported.

3327	13.9.2.  ARGUMENT

3329	   The ARGUMENT specification of the LAYOUTRETURN operation in section
3330	   18.44.1 of [2] is augmented by the following XDR code [23]:

3332	   struct layoutreturn_device_error4 {
3333	           deviceid4       lrde_deviceid;
3334	           nfsstat4        lrde_status;
3335	           nfs_opnum4      lrde_opnum;
3336	   };

3338	   struct layoutreturn_error_report4 {
3339	           layoutreturn_device_error4      lrer_errors<>;
3340	   };

3342	13.9.3.  RESULT

3344	   The RESULT of the LAYOUTRETURN operation is unchanged; see section
3345	   18.44.2 of [2].

3347	13.9.4.  DESCRIPTION

3349	   The following text is added to the end of the LAYOUTRETURN operation
3350	   DESCRIPTION in section 18.44.3 of [2].

3352	   When a client uses LAYOUTRETURN with a type of LAYOUTRETURN4_FILE,
3353	   then if the lrf_body field is NULL, it indicates to the MDS that the
3354	   client experienced no errors.  If lrf_body is non-NULL, then the
3355	   field references error information which is layout type specific.
3356	   I.e., the Objects-Based Layout protocol can continue to utilize
3357	   lrf_body as specified in [9].  For both Files-Based and Block-Based
3358	   Layouts, the field references a layoutreturn_error_report4, which
3359	   contains an array of layoutreturn_device_error4.

3361	   Each individual layoutreturn_device_error4 describes a single error
3362	   associated with a DS, which is identified via lrde_deviceid.  The
3363	   operation which returned the error is identified via lrde_opnum.
3364	   Finally the NFS error value (nfsstat4) encountered is provided via
3365	   lrde_status and may consist of the following error codes:

3367	   NFS4ERR_NXIO:  The client was unable to establish any communication
3368	      with the DS.

3370	   NFS4ERR_*:  The client was able to establish communication with the
3371	      DS and is returning one of the allowed error codes for the
3372	      operation denoted by lrde_opnum.
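
   As a rough illustration of how a client might assemble such a
   report, the Python sketch below mirrors the two XDR structures as
   dataclasses.  The class names, the build_report helper, and the
   per-failure "persistent" flag are all hypothetical; how a client
   classifies an error as persistent is left to the implementation.
   The numeric constants are the values assigned in [2].

```python
from dataclasses import dataclass, field
from typing import List

NFS4ERR_NXIO = 6   # nfsstat4 value from [2]
OP_READ = 25       # nfs_opnum4 values from [2]
OP_WRITE = 38


@dataclass
class LayoutreturnDeviceError4:
    lrde_deviceid: bytes   # deviceid4: 16 opaque bytes
    lrde_status: int       # nfsstat4 seen by the client
    lrde_opnum: int        # operation that returned the error


@dataclass
class LayoutreturnErrorReport4:
    lrer_errors: List[LayoutreturnDeviceError4] = field(
        default_factory=list)


def build_report(failures):
    """Build a report from (deviceid, status, opnum, persistent) tuples.

    Only failures the client has classified as persistent are kept,
    since transient errors are to be handled by the client itself.
    """
    report = LayoutreturnErrorReport4()
    for deviceid, status, opnum, persistent in failures:
        if persistent:
            report.lrer_errors.append(
                LayoutreturnDeviceError4(deviceid, status, opnum))
    return report
```

   A client that could not reach a DS while attempting a WRITE would,
   in this sketch, add one entry carrying that DS's device ID,
   NFS4ERR_NXIO, and OP_WRITE.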

3374	13.9.5.  IMPLEMENTATION

3376	   The following text is added to the end of the LAYOUTRETURN operation
3377	   IMPLEMENTATION in section 18.44.4 of [2].

3379	   Clients are expected to tolerate transient storage device errors, and
3380	   hence clients SHOULD NOT use the LAYOUTRETURN error handling for
3381	   device access problems that may be transient.  The methods by which a
3382	   client decides whether a device access problem is transient vs.
3383	   persistent are implementation-specific, but may include retrying I/Os
3384	   to a data server under appropriate conditions.

3386	   When an I/O fails to a storage device, the client SHOULD retry the
3387	   failed I/O via the MDS.  In this situation, before retrying the I/O,
3388	   the client SHOULD return the layout, or the affected portion thereof,
3389	   and SHOULD indicate which storage device or devices were problematic.
3390	   The client needs to do this when the DS is being unresponsive in
3391	   order to fence off any failed write attempts, and ensure that they do
3392	   not end up overwriting any later data being written through the MDS.
3393	   If the client does not do this, the MDS MAY issue a layout recall
3394	   callback in order to perform the retried I/O.

3396	   The client needs to be cognizant that since this error handling is
3397	   optional in the MDS, the MDS may silently ignore this functionality.
3398	   Also, as the MDS may consider some issues the client reports to be
3399	   expected (see Section 13.9.1), the client might find it difficult to
3400	   detect an MDS which has not implemented error handling via
3401	   LAYOUTRETURN.

3403	   If an MDS is aware that a storage device is proving problematic to a
3404	   client, the MDS SHOULD NOT include that storage device in any pNFS
3405	   layouts sent to that client.  If the MDS is aware that a storage
3406	   device is affecting many clients, then the MDS SHOULD NOT include
3407	   that storage device in any pNFS layouts sent out.  If a client asks
3408	   for a new layout for the file from the MDS, it MUST be prepared for
3409	   the MDS to return that storage device in the layout.  The MDS might
3410	   not have any choice in using the storage device, i.e., there might
3411	   only be one possible layout for the system.  Also, in the case of
3412	   existing files, the MDS might have no choice in which storage devices
3413	   to hand out to clients.

3415	   The MDS is not required to indefinitely retain per-client storage
3416	   device error information.  An MDS is also not required to
3417	   automatically reinstate use of a previously problematic storage
3418	   device; administrative intervention may be required instead.

3420	13.10.  Operation 65: READ_PLUS

3422	   READ_PLUS is a new variant of the NFSv4.1 READ operation [2].
3423	   Besides being able to support all of the data semantics of READ, it
3424	   can also be used by the server to return either holes or ADHs to the
3425	   client.  For holes, READ_PLUS extends the response to avoid returning
3426	   data for portions of the file which either are initialized and
3427	   contain no backing store or would appear to be so in the result.
3428	   I.e., if the result was a data block composed entirely of zeros, then
3429	   it is easier to return a hole.  Returning data blocks of
3430	   uninitialized data wastes computational and network resources, thus
3431	   reducing performance.  For ADHs, READ_PLUS is used to return the
3432	   metadata describing the portions of the file which are
3433	   initialized and contain no backing store.

3435	   If the client sends a READ operation, it is explicitly stating that
3436	   it is neither supporting sparse files nor ADHs.  So if a READ occurs
3437	   on a sparse ADH or file, then the server must expand such data to be
3438	   raw bytes.  If a READ occurs in the middle of a hole or ADH, the
3439	   server can only send back bytes starting from that offset.  In
3440	   contrast, if a READ_PLUS occurs in the middle of a hole or ADH, the
3441	   server can send back a range which starts before the offset and
3442	   extends past the range.

3444	   READ is inefficient for transfer of sparse sections of the file.  As
3445	   such, READ is marked as OBSOLETE in NFSv4.2.  Instead, a client
3446	   should issue READ_PLUS.  Note that as the client has no a priori
3447	   knowledge of whether either an ADH or a hole is present or not, it
3448	   should always use READ_PLUS.

3450	13.10.1.  ARGUMENT

3452	   struct READ_PLUS4args {
3453	           /* CURRENT_FH: file */
3454	           stateid4        rpa_stateid;
3455	           offset4         rpa_offset;
3456	           count4          rpa_count;
3457	   };

3459	13.10.2.  RESULT

3461	   union read_plus_content switch (data_content4 content) {
3462	   case NFS4_CONTENT_DATA:
3463	           opaque          rpc_data<>;
3464	   case NFS4_CONTENT_APP_DATA_HOLE:
3465	           app_data_hole4  rpc_adh;
3466	   case NFS4_CONTENT_HOLE:
3467	           data_info4      rpc_hole;
3468	   default:
3469	           void;
3470	   };

3472	   /*
3473	    * Allow a return of an array of contents.
3474	    */
3475	   struct read_plus_res4 {
3476	           bool                    rpr_eof;
3477	           read_plus_content       rpr_contents<>;
3478	   };

3480	   union READ_PLUS4res switch (nfsstat4 status) {
3481	   case NFS4_OK:
3482	           read_plus_res4  resok4;
3483	   default:
3484	           void;
3485	   };

3487	13.10.3.  DESCRIPTION

3489	   The READ_PLUS operation is based upon the NFSv4.1 READ operation [2]
3490	   and similarly reads data from the regular file identified by the
3491	   current filehandle.

3493	   The client provides a rpa_offset of where the READ_PLUS is to start
3494	   and a rpa_count of how many bytes are to be read.  A rpa_offset of
3495	   zero means to read data starting at the beginning of the file.  If
3496	   rpa_offset is greater than or equal to the size of the file, the
3497	   status NFS4_OK is returned with di_length (the data length) set to
3498	   zero and eof set to TRUE.

3500	   The READ_PLUS result is comprised of an array of rpr_contents, each
3501	   of which describes a data_content4 type of data (Section 6.1.2).  For
3502	   NFSv4.2, the allowed values are data, ADH, and hole.  A server is
3503	   required to support the data type, but not ADH or hole.  Both an
3504	   ADH and a hole must be returned in their entirety - clients must be
3505	   prepared to get more information than they requested.  Both the start
3506	   and the end of the hole may exceed what was requested.

3508	   READ_PLUS has to support all of the errors which are returned by READ
3509	   plus NFS4ERR_UNION_NOTSUPP.  If the client asks for a hole and the
3510	   server does not support that arm of the discriminated union, but does
3511	   support one or more additional arms, it can signal to the client that
3512	   it supports the operation, but not the arm with
3513	   NFS4ERR_UNION_NOTSUPP.

3515	   If the data to be returned is comprised entirely of zeros, then the
3516	   server may elect to return that data as a hole.  The server
3517	   differentiates this to the client by setting di_allocated to TRUE in
3518	   this case.  Note that in such a scenario, the server is not required
3519	   to determine the full extent of the "hole" - it does not need to
3520	   determine where the zeros start and end.

3522	   The server may elect to return adjacent elements of the same type.
3523	   For example, the guard pattern or block size of an ADH might change,
3524	   which would require adjacent elements of type ADH.  Likewise if the
3525	   server has a range of data comprised entirely of zeros and then a
3526	   hole, it might want to return two adjacent holes to the client.

3528	   If the client specifies a rpa_count value of zero, the READ_PLUS
3529	   succeeds and returns zero bytes of data.  In all situations, the
3530	   server may choose to return fewer bytes than specified by the client.
3531	   The client needs to check for this condition and handle the condition
3532	   appropriately.

3534	   If the client specifies an rpa_offset and rpa_count value that is
3535	   entirely contained within a hole of the file, then the di_offset and
3536	   di_length returned must be for the entire hole.  This result is
3537	   considered valid until the file is changed (detected via the change
3538	   attribute).  The server MUST provide the same semantics for the hole
3539	   as if the client read the region and received zeroes; the implied
3540	   holes contents lifetime MUST be exactly the same as any other read
3541	   data.

3543	   If the client specifies an rpa_offset and rpa_count value that begins
3544	   in a non-hole of the file but extends into a hole, the server should
3545	   return an array comprised of both data and a hole.  The client MUST
3546	   be prepared for the server to return a short read describing just the
3547	   data.  The client will then issue another READ_PLUS for the remaining
3548	   bytes, to which the server will respond with information about the
3549	   hole in the file.

3551	   Except when special stateids are used, the stateid value for a
3552	   READ_PLUS request represents a value returned from a previous byte-
3553	   range lock or share reservation request or the stateid associated
3554	   with a delegation.  The stateid identifies the associated owners if
3555	   any and is used by the server to verify that the associated locks are
3556	   still valid (e.g., have not been revoked).

3558	   If the read ended at the end-of-file (formally, in a correctly formed
3559	   READ_PLUS operation, if rpa_offset + rpa_count is equal to the size
3560	   of the file), or the READ_PLUS operation extends beyond the size of
3561	   the file (if rpa_offset + rpa_count is greater than the size of the
3562	   file), eof is returned as TRUE; otherwise, it is FALSE.  A successful
3563	   READ_PLUS of an empty file will always return eof as TRUE.

3565	   If the current filehandle is not an ordinary file, an error will be
3566	   returned to the client.  In the case that the current filehandle
3567	   represents an object of type NF4DIR, NFS4ERR_ISDIR is returned.  If
3568	   the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is
3569	   returned.  In all other cases, NFS4ERR_WRONG_TYPE is returned.

3571	   For a READ_PLUS with a stateid value of all bits equal to zero, the
3572	   server MAY allow the READ_PLUS to be serviced subject to mandatory
3573	   byte-range locks or the current share deny modes for the file.  For a
3574	   READ_PLUS with a stateid value of all bits equal to one, the server
3575	   MAY allow READ_PLUS operations to bypass locking checks at the
3576	   server.

3578	   On success, the current filehandle retains its value.

3580	13.10.4.  IMPLEMENTATION

3582	   In general, the IMPLEMENTATION notes for READ in Section 18.22.4 of
3583	   [2] also apply to READ_PLUS.  One delta is that when the owner has a
3584	   locked byte range, the server MUST return an array of rpr_contents
3585	   with values inside that range.

3587	13.10.4.1.  Additional pNFS Implementation Information

3589	   With pNFS, the semantics of using READ_PLUS remains the same.  Any
3590	   data server MAY return a hole or ADH result for a READ_PLUS request
3591	   that it receives.  When a data server chooses to return such a
3592	   result, it has the option of returning information for the data
3593	   stored on that data server (as defined by the data layout), but it
3594	   MUST NOT return results for a byte range that includes data managed
3595	   by another data server.

3597	   A data server should do its best to return as much information about
3598	   an ADH as is feasible without having to contact the metadata server.
3599	   If communication with the metadata server is required, then every
3600	   attempt should be taken to minimize the number of requests.

3602	   If mandatory locking is enforced, then the data server must also
3603	   ensure that it returns only information that is within the owner's
3604	   locked byte range.

3606	13.10.5.  READ_PLUS with Sparse Files Example

3608	   The following table describes a sparse file.  For each byte range,
3609	   the file contains either non-zero data or a hole.  In addition, the
3610	   server in this example uses a Hole Threshold of 32K.

3612	                        +-------------+----------+
3613	                        | Byte-Range  | Contents |
3614	                        +-------------+----------+
3615	                        | 0-15999     | Hole     |
3616	                        | 16K-31999   | Non-Zero |
3617	                        | 32K-255999  | Hole     |
3618	                        | 256K-287999 | Non-Zero |
3619	                        | 288K-353999 | Hole     |
3620	                        | 354K-417999 | Non-Zero |
3621	                        +-------------+----------+

3623	                                  Table 5

3625	   Under the given circumstances, if a client were to read from the file
3626	   with a max read size of 64K, the following would be the results for
3627	   the given READ_PLUS calls.  This assumes the client has already
3628	   opened the file, acquired a valid stateid ('s' in the example), and
3629	   just needs to issue READ_PLUS requests.

3631	   1.  READ_PLUS(s, 0, 64K) --> NFS_OK, eof = false, <data[0,32K],
3632	       hole[32K,224K]>.  Since the first hole is less than the server's
3633	       Hole Threshold, the first 32K of the file is returned as data
3634	       and the remaining 32K is returned as a hole which actually
3635	       extends to 256K.

3637	   2.  READ_PLUS(s, 32K, 64K) --> NFS_OK, eof = false, <hole[32K,224K]>.
3638	       The requested range was all zeros, and the current hole begins at
3639	       offset 32K and is 224K in length.  Note that the client should
3640	       not have followed up the previous READ_PLUS request with this one
3641	       as the hole information from the previous call extended past what
3642	       the client was requesting.

3644	   3.  READ_PLUS(s, 256K, 64K) --> NFS_OK, eof = false, <data[256K,32K],
3645	       hole[288K,66K]>.  Returns an array of the 32K data and
3646	       the hole which extends to 354K.

3648	   4.  READ_PLUS(s, 354K, 64K) --> NFS_OK, eof = true, <data[354K,64K]>.
3649	       Returns the final 64K of data and informs the client that
3650	       there is no more data in the file.
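
   The four calls above can be reproduced with a small simulation.  The
   Python sketch below is illustrative only and rests on several
   assumptions: K is taken to be 1000 (the table's ranges end in 999),
   each result entry is an (offset, length) pair tagged with its kind,
   a hole shorter than the Hole Threshold is returned as zero-filled
   data, and a hole at or above the threshold is reported to its full
   extent.  The read_plus function and the SEGMENTS list are
   hypothetical names, not protocol elements.

```python
K = 1000  # assumption: the table's "K" is decimal, since ranges end in ...999

# (kind, start, end) with end exclusive, transcribed from Table 5
SEGMENTS = [
    ("hole", 0, 16 * K),
    ("data", 16 * K, 32 * K),
    ("hole", 32 * K, 256 * K),
    ("data", 256 * K, 288 * K),
    ("hole", 288 * K, 354 * K),
    ("data", 354 * K, 418 * K),
]
HOLE_THRESHOLD = 32 * K
FILE_SIZE = SEGMENTS[-1][2]


def read_plus(offset, count):
    """Return (eof, [(kind, offset, length), ...]) for one simulated call."""
    end = offset + count
    pos = offset
    contents = []
    while pos < end and pos < FILE_SIZE:
        kind, _, seg_end = next(s for s in SEGMENTS if s[1] <= pos < s[2])
        if kind == "hole" and seg_end - pos >= HOLE_THRESHOLD:
            # Large hole: report its full extent, even past the requested range.
            contents.append(("hole", pos, seg_end - pos))
            pos = seg_end
            continue
        # Data, or a hole below the threshold returned as zero-filled data.
        stop = min(seg_end, end)
        if contents and contents[-1][0] == "data":
            _, d_off, d_len = contents[-1]
            contents[-1] = ("data", d_off, d_len + stop - pos)  # merge adjacent data
        else:
            contents.append(("data", pos, stop - pos))
        pos = stop
    return pos >= FILE_SIZE, contents
```

   With these assumptions, read_plus(0, 64 * K) yields eof = False with
   a 32K data entry at offset 0 followed by a 224K hole at offset 32K,
   matching the first call in the list.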

3652	13.11.  Operation 66: SEEK

3654	   SEEK is an operation that allows a client to determine the location
3655	   of the next data_content4 in a file.  It supports implementations of
3656	   the emerging SEEK_HOLE and SEEK_DATA extensions to lseek(2) on the
3657	   client.

3659	13.11.1.  ARGUMENT

3661	   struct SEEK4args {
3662	           /* CURRENT_FH: file */
3663	           stateid4        sa_stateid;
3664	           offset4         sa_offset;
3665	           data_content4   sa_what;
3666	   };

3668	13.11.2.  RESULT

3670	   union seek_content switch (data_content4 content) {
3671	   case NFS4_CONTENT_DATA:
3672	           data_info4      sc_data;
3673	   case NFS4_CONTENT_APP_DATA_HOLE:
3674	           app_data_hole4  sc_adh;
3675	   case NFS4_CONTENT_HOLE:
3676	           data_info4      sc_hole;
3677	   default:
3678	           void;
3679	   };

3681	   struct seek_res4 {
3682	           bool                    sr_eof;
3683	           seek_content            sr_contents;
3684	   };

3686	   union SEEK4res switch (nfsstat4 status) {
3687	   case NFS4_OK:
3688	           seek_res4       resok4;
3689	   default:
3690	           void;
3691	   };

3693	13.11.3.  DESCRIPTION

3695	   From the given sa_offset, find the next data_content4 of type sa_what
3696	   in the file.  For either a hole or ADH, this must return the
3697	   data_content4 in its entirety.  For data, it must not return the
3698	   actual data.

3700	   SEEK must follow the same rules for stateids as READ_PLUS
3701	   (Section 13.10.3).

3703	   If the server cannot find a corresponding sa_what, then the status
3704	   is still NFS4_OK, but sr_eof is TRUE.  In that case, sr_contents
3705	   contains zeroed-out content of the appropriate type.
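
   A minimal Python sketch of this lookup follows.  It is not a
   definitive implementation: the segment list reuses the layout of
   Table 5 with K assumed to be 1000, the result is reduced to an
   (offset, length) pair, and an sa_offset that falls inside a matching
   region is assumed to return that whole region.  The seek function
   and SEGMENTS list are hypothetical names.

```python
K = 1000  # assumption: decimal K, as suggested by Table 5

# (kind, start, end) with end exclusive
SEGMENTS = [
    ("hole", 0, 16 * K),
    ("data", 16 * K, 32 * K),
    ("hole", 32 * K, 256 * K),
    ("data", 256 * K, 288 * K),
    ("hole", 288 * K, 354 * K),
    ("data", 354 * K, 418 * K),
]


def seek(sa_offset, sa_what):
    """Return (sr_eof, (offset, length)) for the next sa_what region."""
    for kind, start, end in SEGMENTS:
        if kind == sa_what and end > sa_offset:
            # Report the whole region, not just the part after sa_offset.
            return False, (start, end - start)
    # Nothing found: still a success, but sr_eof is TRUE and content is zeroed.
    return True, (0, 0)
```

   Only the description of a region is returned; for data, the bytes
   themselves are never part of the result.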

3707	14.  NFSv4.2 Callback Operations

3709	14.1.  Procedure 16: CB_ATTR_CHANGED - Notify Client that the File's
3710	       Attributes Changed

3712	14.1.1.  ARGUMENTS

3714	   struct CB_ATTR_CHANGED4args {
3715	           nfs_fh4         acca_fh;
3716	           bitmap4         acca_critical;
3717	           bitmap4         acca_info;
3718	   };

3720	14.1.2.  RESULTS

3722	   struct CB_ATTR_CHANGED4res {
3723	           nfsstat4        accr_status;
3724	   };

3726	14.1.3.  DESCRIPTION

3728	   The CB_ATTR_CHANGED callback operation is used by the server to
3729	   indicate to the client that the file's attributes have been modified
3730	   on the server.  The server does not convey how the attributes have
3731	   changed, just that they have been modified.  The server can inform
3732	   the client about both critical and informational attribute changes in
3733	   the bitmask arguments.  The client SHOULD query the server about all
3734	   attributes set in acca_critical.  For all changes reflected in
3735	   acca_info, the client can decide whether or not it wants to poll the
3736	   server.

3738	   The CB_ATTR_CHANGED callback operation with FATTR4_SEC_LABEL set
3739	   in acca_critical is the method used by the server to indicate that
3740	   the MAC label for the file referenced by acca_fh has changed.  In
3741	   many ways, the server does not care about the result returned by the
3742	   client.
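
   One way a client might act on the two bitmaps is sketched below in
   Python.  The sketch assumes the usual bitmap4 layout, in which
   attribute n is bit (n mod 32) of word (n / 32), and it models the
   client's interest in informational attributes as a plain set.  The
   function names and the attribute numbers used in the test values
   are hypothetical.

```python
def bitmap4_attrs(bitmap):
    """Expand a bitmap4 (list of 32-bit words) into a set of attribute numbers."""
    return {
        word_index * 32 + bit
        for word_index, word in enumerate(bitmap)
        for bit in range(32)
        if word & (1 << bit)
    }


def attrs_to_query(acca_critical, acca_info, wanted_info):
    """Pick the attributes a client re-fetches after CB_ATTR_CHANGED."""
    critical = bitmap4_attrs(acca_critical)
    # Informational changes are polled only if the client cares about them.
    optional = bitmap4_attrs(acca_info) & set(wanted_info)
    return sorted(critical | optional)
```

   Every attribute in acca_critical is queried, while attributes in
   acca_info are queried only when the client has chosen to track them.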

3744	14.2.  Operation 15: CB_COPY - Report results of a server-side copy
3745	14.2.1.  ARGUMENT

3747	   union copy_info4 switch (nfsstat4 cca_status) {
3748	           case NFS4_OK:
3749	                   void;
3750	           default:
3751	                   length4         cca_bytes_copied;
3752	   };

3754	   struct CB_COPY4args {
3755	           nfs_fh4         cca_fh;
3756	           stateid4        cca_stateid;
3757	           copy_info4      cca_copy_info;
3758	   };

3760	14.2.2.  RESULT

3762	   struct CB_COPY4res {
3763	           nfsstat4        ccr_status;
3764	   };

3766	14.2.3.  DESCRIPTION

3768	   CB_COPY is used for both intra- and inter-server asynchronous copies.
3769	   The CB_COPY callback informs the client of the result of an
3770	   asynchronous server-side copy.  This operation is sent by the
3771	   destination server to the client in a CB_COMPOUND request.  The copy
3772	   is identified by the filehandle and stateid arguments.  The result is
3773	   indicated by the status field.  If the copy failed, cca_bytes_copied
3774	   contains the number of bytes copied before the failure occurred.  The
3775	   cca_bytes_copied value indicates the number of bytes copied but not
3776	   which specific bytes have been copied.

3778	   If the client supports the COPY operation, the client is REQUIRED to
3779	   support the CB_COPY operation.

3781	   There is a potential race between the reply to the original COPY on
3782	   the forechannel and the CB_COPY callback on the backchannel.
3783	   Sections 2.10.6.3 and 20.9.3 in [2] describe how to handle this type
3784	   of issue.

3786	   The CB_COPY operation may fail for the following reasons (this is a
3787	   partial list):

3789	   NFS4ERR_NOTSUPP:  The copy offload operation is not supported by the
3790	      NFS client receiving this request.
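
   The client side of CB_COPY can be pictured with the following Python
   sketch.  It is a simplification under stated assumptions: copies are
   keyed by the (filehandle, stateid) pair, the copy_info4 union is
   flattened into a status and an optional byte count, and a callback
   that arrives before the COPY reply is simply recorded rather than
   handled with the mechanisms described in [2].  The CopyTracker class
   and its methods are hypothetical names.

```python
NFS4_OK = 0
NFS4ERR_NOTSUPP = 10004


class CopyTracker:
    """Client-side record of asynchronous copies awaiting CB_COPY."""

    def __init__(self, supports_copy=True):
        self.supports_copy = supports_copy
        self.pending = {}  # (fh, stateid) -> None, or (status, bytes_copied)

    def start(self, fh, stateid):
        # setdefault does not clobber a result that already arrived.
        self.pending.setdefault((fh, stateid), None)

    def cb_copy(self, cca_fh, cca_stateid, cca_status, cca_bytes_copied=None):
        """Record the outcome and return the ccr_status for the reply."""
        if not self.supports_copy:
            return NFS4ERR_NOTSUPP
        # copy_info4 carries a byte count only when the copy failed.
        partial = None if cca_status == NFS4_OK else cca_bytes_copied
        self.pending[(cca_fh, cca_stateid)] = (cca_status, partial)
        return NFS4_OK
```

   On failure the recorded byte count says how much was copied, but, as
   noted above, not which bytes.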

3792	15.  IANA Considerations

3794	   This section uses terms that are defined in [24].

3796	16.  References

3798	16.1.  Normative References

3800	   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
3801	         Levels", BCP 14, RFC 2119, March 1997.

3803	   [2]   Shepler, S., Eisler, M., and D. Noveck, "Network File System
3804	         (NFS) Version 4 Minor Version 1 Protocol", RFC 5661,
3805	         January 2010.

3807	   [3]   Haynes, T., "Network File System (NFS) Version 4 Minor Version
3808	         2 External Data Representation Standard (XDR) Description",
3809	         March 2011.

3811	   [4]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
3812	         Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986,
3813	         January 2005.

3815	   [5]   Haynes, T. and N. Williams, "Remote Procedure Call (RPC)
3816	         Security Version 3", draft-williams-rpcsecgssv3 (work in
3817	         progress), 2011.

3819	   [6]   The Open Group, "Section 'posix_fadvise()' of System Interfaces
3820	         of The Open Group Base Specifications Issue 6, IEEE Std 1003.1,
3821	         2004 Edition", 2004.

3823	   [7]   Haynes, T., "Requirements for Labeled NFS",
3824	         draft-ietf-nfsv4-labreqs-00 (work in progress).

3826	   [8]   Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol
3827	         Specification", RFC 2203, September 1997.

3829	   [9]   Halevy, B., Welch, B., and J. Zelenka, "Object-Based Parallel
3830	         NFS (pNFS) Operations", RFC 5664, January 2010.

3832	16.2.  Informative References

3834	   [10]  Haynes, T. and D. Noveck, "Network File System (NFS) version 4
3835	         Protocol", draft-ietf-nfsv4-rfc3530bis-09 (Work In Progress),
3836	         March 2011.

3838	   [11]  Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik,
3839	         "NSDB Protocol for Federated Filesystems",
3840	         draft-ietf-nfsv4-federated-fs-protocol (Work In Progress),
3841	         2010.

3843	   [12]  Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik,
3844	         "Administration Protocol for Federated Filesystems",
3845	         draft-ietf-nfsv4-federated-fs-admin (Work In Progress), 2010.

3847	   [13]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
3848	         Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol --
3849	         HTTP/1.1", RFC 2616, June 1999.

3851	   [14]  Postel, J. and J. Reynolds, "File Transfer Protocol", STD 9,
3852	         RFC 959, October 1985.

3854	   [15]  Simpson, W., "PPP Challenge Handshake Authentication Protocol
3855	         (CHAP)", RFC 1994, August 1996.

3857	   [16]  Strohm, R., "Chapter 2, Data Blocks, Extents, and Segments, of
3858	         Oracle Database Concepts 11g Release 1 (11.1)", January 2011.

3860	   [17]  Ashdown, L., "Chapter 15, Validating Database Files and
3861	         Backups, of Oracle Database Backup and Recovery User's Guide
3862	         11g Release 1 (11.1)", August 2008.

3864	   [18]  McDougall, R. and J. Mauro, "Section 11.4.3, Detecting Memory
3865	         Corruption of Solaris Internals", 2007.

3867	   [19]  Bairavasundaram, L., Goodson, G., Schroeder, B., Arpaci-
3868	         Dusseau, A., and R. Arpaci-Dusseau, "An Analysis of Data
3869	         Corruption in the Storage Stack", Proceedings of the 6th USENIX
3870	         Symposium on File and Storage Technologies (FAST '08) , 2008.

3872	   [20]  "Section 46.6. Multi-Level Security (MLS) of Deployment Guide:
3873	         Deployment, configuration and administration of Red Hat
3874	         Enterprise Linux 5, Edition 6", 2011.

3876	   [21]  Quigley, D. and J. Lu, "Registry Specification for MAC Security
3877	         Label Formats", draft-quigley-label-format-registry (work in
3878	         progress), 2011.

3880	   [22]  IESG, "IESG Processing of RFC Errata for the IETF Stream",
3881	         2008.

3883	   [23]  Eisler, M., "XDR: External Data Representation Standard",
3884	         RFC 4506, May 2006.

3886	   [24]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
3887	         Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.

3889	   [25]  VanDeBogart, S., Frost, C., and E. Kohler, "Reducing Seek
3890	         Overhead with Application-Directed Prefetching", Proceedings of
3891	         USENIX Annual Technical Conference , June 2009.

3893	Appendix A.  Acknowledgments

3895	   For the pNFS Access Permissions Check, the original draft was by
3896	   Sorin Faibish, David Black, Mike Eisler, and Jason Glasgow.  The work
3897	   was influenced by discussions with Benny Halevy and Bruce Fields.  A
3898	   review was done by Tom Haynes.

3900	   For the Sharing change attribute implementation details with NFSv4
3901	   clients, the original draft was by Trond Myklebust.

3903	   For the NFS Server-side Copy, the original draft was by James
3904	   Lentini, Mike Eisler, Deepak Kenchammana, Anshul Madan, and Rahul
3905	   Iyer.  Tom Talpey co-authored an unpublished version of that
3906	   document.  It was also reviewed by a number of individuals:
3907	   Pranoop Erasani, Tom Haynes, Arthur Lent, Trond Myklebust, Dave
3908	   Noveck, Theresa Lingutla-Raj, Manjunath Shankararao, Satyam Vaghani,
3909	   and Nico Williams.

3911	   For the NFS space reservation operations, the original draft was by
3912	   Mike Eisler, James Lentini, Manjunath Shankararao, and Rahul Iyer.

3914	   For the sparse file support, the original draft was by Dean
3915	   Hildebrand and Marc Eshel.  Valuable input and advice were received
3916	   from Sorin Faibish, Bruce Fields, Benny Halevy, Trond Myklebust, and
3917	   Richard Scheffenegger.

3919	   For the Application IO Hints, the original draft was by Dean
3920	   Hildebrand, Mike Eisler, Trond Myklebust, and Sam Falkner.  Some
3921	   early reviewers included Benny Halevy and Pranoop Erasani.

3923	   For Labeled NFS, the original draft was by David Quigley, James
3924	   Morris, Jarret Lu, and Tom Haynes.  Peter Staubach, Trond Myklebust,
3925	   Stephen Smalley, Sorin Faibish, Nico Williams, and David Black also
3926	   contributed in the final push to get this accepted.

3928	   During the review process, Talia Reyes-Ortiz helped the sessions run
3929	   smoothly.  While many people contributed here and there, the core
3930	   reviewers were Andy Adamson, Pranoop Erasani, Bruce Fields, Chuck
3931	   Lever, Trond Myklebust, David Noveck, and Peter Staubach.

3933	Appendix B.  RFC Editor Notes

3935	   [RFC Editor: please remove this section prior to publishing this
3936	   document as an RFC]

3938	   [RFC Editor: prior to publishing this document as an RFC, please
3939	   replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the
3940	   RFC number of this document]

3942	Author's Address

3944	   Thomas Haynes
3945	   NetApp
3946	   9110 E 66th St
3947	   Tulsa, OK  74133
3948	   USA

3950	   Phone: +1 918 307 1415
3951	   Email: thomas@netapp.com
3952	   URI:   http://www.tulsalabs.com