NFSv4                                                          T. Haynes
Internet-Draft                                                    Editor
Intended status: Standards Track                       November 14, 2011
Expires: May 17, 2012

                     NFS Version 4 Minor Version 2
                 draft-ietf-nfsv4-minorversion2-06.txt

Abstract

This Internet-Draft describes NFS version 4 minor version two, focusing mainly on the protocol extensions made from NFS version 4 minor version 0 and NFS version 4 minor version 1.  Major extensions introduced in NFS version 4 minor version two include: Server-side Copy, Space Reservations, and Support for Sparse Files.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [1].
Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 17, 2012.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008.  The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process.  Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

Table of Contents

   1.  Introduction
     1.1.  The NFS Version 4 Minor Version 2 Protocol
     1.2.  Scope of This Document
     1.3.  NFSv4.2 Goals
     1.4.  Overview of NFSv4.2 Features
       1.4.1.  Application I/O Advise
     1.5.  Differences from NFSv4.1
   2.  NFS Server-side Copy
     2.1.  Introduction
     2.2.  Protocol Overview
       2.2.1.  Intra-Server Copy
       2.2.2.  Inter-Server Copy
       2.2.3.  Server-to-Server Copy Protocol
     2.3.  Operations
       2.3.1.  netloc4 - Network Locations
       2.3.2.  Copy Offload Stateids
     2.4.  Security Considerations
       2.4.1.  Inter-Server Copy Security
   3.  Sparse Files
     3.1.  Introduction
     3.2.  Terminology
     3.3.  Overview of Sparse Files and NFSv4
     3.4.  Operation 65: READ_PLUS
       3.4.1.  ARGUMENT
       3.4.2.  RESULT
       3.4.3.  DESCRIPTION
       3.4.4.  IMPLEMENTATION
       3.4.5.  READ_PLUS with Sparse Files Example
     3.5.  Related Work
     3.6.  Other Proposed Designs
       3.6.1.  Multi-Data Server Hole Information
       3.6.2.  Data Result Array
       3.6.3.  User-Defined Sparse Mask
       3.6.4.  Allocated flag
       3.6.5.  Dense and Sparse pNFS File Layouts
   4.  Space Reservation
     4.1.  Introduction
     4.2.  Operations and attributes
     4.3.  Attribute 77: space_reserved
     4.4.  Attribute 78: space_freed
   5.  Support for Application IO Hints
     5.1.  Introduction
     5.2.  POSIX Requirements
     5.3.  Additional Requirements
     5.4.  Security Considerations
     5.5.  IANA Considerations
   6.  Application Data Block Support
     6.1.  Generic Framework
       6.1.1.  Data Block Representation
       6.1.2.  Data Content
     6.2.  pNFS Considerations
     6.3.  An Example of Detecting Corruption
     6.4.  Example of READ_PLUS
     6.5.  Zero Filled Holes
   7.  Labeled NFS
     7.1.  Introduction
     7.2.  Definitions
     7.3.  MAC Security Attribute
       7.3.1.  Interpreting FATTR4_SEC_LABEL
       7.3.2.  Delegations
       7.3.3.  Permission Checking
       7.3.4.  Object Creation
       7.3.5.  Existing Objects
       7.3.6.  Label Changes
     7.4.  pNFS Considerations
     7.5.  Discovery of Server LNFS Support
     7.6.  MAC Security NFS Modes of Operation
       7.6.1.  Full Mode
       7.6.2.  Smart Client Mode
       7.6.3.  Smart Server Mode
     7.7.  Security Considerations
   8.  Sharing change attribute implementation details with NFSv4 clients
. . . . . . . . . . . . . . . . . . . . . . . 53 144 8.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 53 145 8.2. Definition of the 'change_attr_type' per-file system 146 attribute . . . . . . . . . . . . . . . . . . . . . . . . 54 147 9. Security Considerations . . . . . . . . . . . . . . . . . . . 55 148 10. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . . 55 149 11. NFSv4.2 Operations . . . . . . . . . . . . . . . . . . . . . . 59 150 11.1. Operation 59: COPY - Initiate a server-side copy . . . . 59 151 11.2. Operation 60: COPY_ABORT - Cancel a server-side copy . . 66 152 11.3. Operation 61: COPY_NOTIFY - Notify a source server of 153 a future copy . . . . . . . . . . . . . . . . . . . . . . 67 154 11.4. Operation 62: COPY_REVOKE - Revoke a destination 155 server's copy privileges . . . . . . . . . . . . . . . . 70 156 11.5. Operation 63: COPY_STATUS - Poll for status of a 157 server-side copy . . . . . . . . . . . . . . . . . . . . 71 158 11.6. Modification to Operation 42: EXCHANGE_ID - 159 Instantiate Client ID . . . . . . . . . . . . . . . . . . 72 160 11.7. Operation 64: INITIALIZE . . . . . . . . . . . . . . . . 73 161 11.8. Operation 67: IO_ADVISE - Application I/O access 162 pattern hints . . . . . . . . . . . . . . . . . . . . . . 76 163 11.9. Changes to Operation 51: LAYOUTRETURN . . . . . . . . . . 83 164 11.9.1. Introduction . . . . . . . . . . . . . . . . . . . . . 83 165 11.9.2. ARGUMENT . . . . . . . . . . . . . . . . . . . . . . . 84 166 11.9.3. RESULT . . . . . . . . . . . . . . . . . . . . . . . . 84 167 11.9.4. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . 84 168 11.9.5. IMPLEMENTATION . . . . . . . . . . . . . . . . . . . . 85 169 11.10. Operation 65: READ_PLUS . . . . . . . . . . . . . . . . . 86 170 11.11. Operation 66: SEEK . . . . . . . . . . . . . . . . . . . 88 171 12. NFSv4.2 Callback Operations . . . . . . . . . . . . . . . . . 89 172 12.1. Procedure 16: CB_ATTR_CHANGED - Notify Client that 173 the File's Attributes Changed . . . . . . . . . . . . . . 89 174 12.2. Operation 15: CB_COPY - Report results of a 175 server-side copy . . . . . . . . . . . . . . . . . . . . 90 176 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 91 177 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 91 178 14.1. Normative References . . . . . . . . . . . . . . . . . . 91 179 14.2. Informative References . . . . . . . . . . . . . . . . . 92 180 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . . 94 181 Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 95 182 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 95 184 1. Introduction 186 1.1. The NFS Version 4 Minor Version 2 Protocol 188 The NFS version 4 minor version 2 (NFSv4.2) protocol is the third 189 minor version of the NFS version 4 (NFSv4) protocol. The first minor 190 version, NFSv4.0, is described in [11] and the second minor version, 191 NFSv4.1, is described in [2]. It follows the guidelines for minor 192 versioning that are listed in Section 11 of [11]. 194 As a minor version, NFSv4.2 is consistent with the overall goals for 195 NFSv4, but extends the protocol so as to better meet those goals, 196 based on experiences with NFSv4.1. In addition, NFSv4.2 has adopted 197 some additional goals, which motivate some of the major extensions in 198 NFSv4.2. 200 1.2. Scope of This Document 202 This document describes the NFSv4.2 protocol. 
With respect to NFSv4.0 and NFSv4.1, this document does not:

o  describe the NFSv4.0 or NFSv4.1 protocols, except where needed to contrast with NFSv4.2.

o  modify the specification of the NFSv4.0 or NFSv4.1 protocols.

o  clarify the NFSv4.0 or NFSv4.1 protocols; i.e., any clarifications made here apply to NFSv4.2 and not to either of the prior protocols.

The full XDR for NFSv4.2 is presented in [3].

1.3.  NFSv4.2 Goals

[[Comment.1: This needs fleshing out! --TH]]

1.4.  Overview of NFSv4.2 Features

[[Comment.2: This needs fleshing out! --TH]]

1.4.1.  Application I/O Advise

We propose a new IO_ADVISE operation for NFSv4.2 that clients can use to communicate expected I/O behavior to the server.  By communicating future I/O behavior such as whether a file will be accessed sequentially or randomly, and whether a file will or will not be accessed in the near future, servers can optimize future I/O requests for a file by, for example, prefetching or evicting data.  This operation can be used to support the posix_fadvise function as well as other applications such as databases and video editors.
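As a non-normative illustration of the application-side interface that IO_ADVISE is intended to carry to the server, the following sketch expresses such hints through the POSIX posix_fadvise function, shown here via Python's os module on a Unix system; the file name and region sizes are illustrative:

   import os

   # Open a large file whose access pattern is known in advance.
   fd = os.open("vm-disk.img", os.O_RDONLY)

   # The whole file will be read sequentially: prefetching may help.
   os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)

   # A specific 1 MiB region will be needed soon.
   os.posix_fadvise(fd, 4 * 1024 * 1024, 1024 * 1024,
                    os.POSIX_FADV_WILLNEED)

   # The cached data will not be reused and may be evicted.
   os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)

   os.close(fd)

Without a protocol-level operation such as IO_ADVISE, these hints stop at the NFS client and cannot influence the server's caching decisions.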
1.5.  Differences from NFSv4.1

[[Comment.3: This needs fleshing out! --TH]]

2.  NFS Server-side Copy

2.1.  Introduction

This section describes a server-side copy feature for the NFS protocol.

The server-side copy feature provides a mechanism for the NFS client to perform a file copy on the server without the data being transmitted back and forth over the network.

Without this feature, an NFS client copies data from one location to another by reading the data from the server over the network, and then writing the data back over the network to the server.  Using this server-side copy operation, the client is able to instruct the server to copy the data locally without the data being sent back and forth over the network unnecessarily.

In general, this feature is useful whenever data is copied from one location to another on the server.  It is particularly useful when copying the contents of a file from a backup.  Backup versions of a file are copied for a number of reasons, including restoring and cloning data.

If the source object and destination object are on different file servers, the file servers will communicate with one another to perform the copy operation.  The server-to-server protocol by which this is accomplished is not defined in this document.

2.2.  Protocol Overview

The server-side copy offload operations support both intra-server and inter-server file copies.  An intra-server copy is a copy in which the source file and destination file reside on the same server.  In an inter-server copy, the source file and destination file are on different servers.  In both cases, the copy may be performed synchronously or asynchronously.

Throughout the rest of this document, we refer to the NFS server containing the source file as the "source server" and the NFS server to which the file is transferred as the "destination server".  In the case of an intra-server copy, the source server and destination server are the same server.  Therefore, in the context of an intra-server copy, the terms source server and destination server refer to the single server performing the copy.

The operations described below are designed to copy files.  Other file system objects can be copied by building on these operations or using other techniques.  For example, if the user wishes to copy a directory, the client can synthesize a directory copy by first creating the destination directory and then copying the source directory's files to the new destination directory.  If the user wishes to copy a namespace junction [12] [13], the client can use the ONC RPC Federated Filesystem protocol [13] to perform the copy.  Specifically, the client can determine the source junction's attributes using the FEDFS_LOOKUP_FSN procedure and create a duplicate junction using the FEDFS_CREATE_JUNCTION procedure.

For the inter-server copy protocol, the operations are defined to be compatible with a server-to-server copy protocol in which the destination server reads the file data from the source server.  This model in which the file data is pulled from the source by the destination has a number of advantages over a model in which the source pushes the file data to the destination.  The advantages of the pull model include:

o  The pull model only requires a remote server (i.e., the destination server) to be granted read access.  A push model requires a remote server (i.e., the source server) to be granted write access, which is more privileged.

o  The pull model allows the destination server to stop reading if it has run out of space.  In a push model, the destination server must flow control the source server in this situation.

o  The pull model allows the destination server to easily flow control the data stream by adjusting the size of its read operations.  In a push model, the destination server does not have this ability.  The source server in a push model is capable of writing chunks larger than the destination server has requested in attributes and session parameters.  In theory, the destination server could perform a "short" write in this situation, but this approach is known to behave poorly in practice.

The following operations are provided to support server-side copy:

COPY_NOTIFY:  For inter-server copies, the client sends this operation to the source server to notify it of a future file copy from a given destination server for the given user.

COPY_REVOKE:  Also for inter-server copies, the client sends this operation to the source server to revoke permission to copy a file for the given user.

COPY:  Used by the client to request a file copy.

COPY_ABORT:  Used by the client to abort an asynchronous file copy.

COPY_STATUS:  Used by the client to poll the status of an asynchronous file copy.

CB_COPY:  Used by the destination server to report the results of an asynchronous file copy to the client.

These operations are described in detail in Section 2.3.  This section provides an overview of how these operations are used to perform server-side copies.

2.2.1.  Intra-Server Copy

To copy a file on a single server, the client uses a COPY operation.  The server may respond to the copy operation with the final results of the copy or it may perform the copy asynchronously and deliver the results using a CB_COPY operation callback.  If the copy is performed asynchronously, the client may poll the status of the copy using COPY_STATUS or cancel the copy using COPY_ABORT.
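The following non-normative sketch shows the client-side logic implied by this sequence.  The client object and its methods are hypothetical stand-ins for an NFSv4.2 client implementation; they are not part of this specification:

   import time

   def intra_server_copy(client, src_fh, dst_fh, timeout=300.0):
       reply = client.copy(src_fh, dst_fh)          # COPY
       if reply.synchronous:
           return reply.result                      # final result in reply
       stateid = reply.copy_stateid                 # copy runs asynchronously
       deadline = time.monotonic() + timeout
       while not client.cb_copy_received(stateid):  # CB_COPY ends the wait
           if time.monotonic() > deadline:
               client.copy_abort(stateid)           # COPY_ABORT cancels
               raise TimeoutError("server-side copy timed out")
           status = client.copy_status(stateid)     # COPY_STATUS polls
           print(status.bytes_copied, "bytes copied so far")
           time.sleep(5.0)
       return client.cb_copy_result(stateid)        # result from CB_COPY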
A synchronous intra-server copy is shown in Figure 1.  In this example, the NFS server chooses to perform the copy synchronously.  The copy operation is completed, either successfully or unsuccessfully, before the server replies to the client's request.  The server's reply contains the final result of the operation.

     Client                                  Server
        +                                      +
        |                                      |
        |--- COPY ---------------------------->| Client requests
        |<------------------------------------/| a file copy
        |                                      |
        |                                      |

            Figure 1: A synchronous intra-server copy.

An asynchronous intra-server copy is shown in Figure 2.  In this example, the NFS server performs the copy asynchronously.  The server's reply to the copy request indicates that the copy operation was initiated and the final result will be delivered at a later time.  The server's reply also contains a copy stateid.  The client may use this copy stateid to poll for status information (as shown) or to cancel the copy using a COPY_ABORT.  When the server completes the copy, the server performs a callback to the client and reports the results.

     Client                                  Server
        +                                      +
        |                                      |
        |--- COPY ---------------------------->| Client requests
        |<------------------------------------/| a file copy
        |                                      |
        |                                      |
        |--- COPY_STATUS --------------------->| Client may poll
        |<------------------------------------/| for status
        |                                      |
        |                  .                   | Multiple COPY_STATUS
        |                  .                   | operations may be sent.
        |                  .                   |
        |                                      |
        |<-- CB_COPY --------------------------| Server reports results
        |\------------------------------------>|
        |                                      |

           Figure 2: An asynchronous intra-server copy.

2.2.2.  Inter-Server Copy

A copy may also be performed between two servers.  The copy protocol is designed to accommodate a variety of network topologies.  As shown in Figure 3, the client and servers may be connected by multiple networks.  In particular, the servers may be connected by a specialized, high speed network (network 192.168.33.0/24 in the diagram) that does not include the client.  The protocol allows the client to set up the copy between the servers (over network 10.11.78.0/24 in the diagram) and for the servers to communicate on the high speed network if they choose to do so.

                       192.168.33.0/24
           +-------------------------------------+
           |                                     |
           |                                     |
           | 192.168.33.18                       | 192.168.33.56
   +-------+------+                       +------+------+
   |    Source    |                       | Destination |
   +-------+------+                       +------+------+
           | 10.11.78.18                         | 10.11.78.56
           |                                     |
           |                                     |
           |             10.11.78.0/24           |
           +------------------+------------------+
                              |
                              |
                              | 10.11.78.243
                        +-----+-----+
                        |  Client   |
                        +-----------+

        Figure 3: An example inter-server network topology.

For an inter-server copy, the client notifies the source server that a file will be copied by the destination server using a COPY_NOTIFY operation.  The client then initiates the copy by sending the COPY operation to the destination server.  The destination server may perform the copy synchronously or asynchronously.

A synchronous inter-server copy is shown in Figure 4.  In this case, the destination server chooses to perform the copy before responding to the client's COPY request.

An asynchronous copy is shown in Figure 5.  In this case, the destination server chooses to respond to the client's COPY request immediately and then perform the copy asynchronously.
     Client                Source         Destination
        +                    +                  +
        |                    |                  |
        |--- COPY_NOTIFY --->|                  |
        |<------------------/|                  |
        |                    |                  |
        |                    |                  |
        |--- COPY ---------------------------->|
        |                    |                  |
        |                    |                  |
        |                    |<----- read -----|
        |                    |\--------------->|
        |                    |                  |
        |                    |        .         | Multiple reads may
        |                    |        .         | be necessary
        |                    |        .         |
        |                    |                  |
        |                    |                  |
        |<------------------------------------/| Destination replies
        |                    |                  | to COPY

          Figure 4: A synchronous inter-server copy.

     Client                Source         Destination
        +                    +                  +
        |                    |                  |
        |--- COPY_NOTIFY --->|                  |
        |<------------------/|                  |
        |                    |                  |
        |                    |                  |
        |--- COPY ---------------------------->|
        |<------------------------------------/|
        |                    |                  |
        |                    |                  |
        |                    |<----- read -----|
        |                    |\--------------->|
        |                    |                  |
        |                    |        .         | Multiple reads may
        |                    |        .         | be necessary
        |                    |        .         |
        |                    |                  |
        |                    |                  |
        |--- COPY_STATUS --------------------->| Client may poll
        |<------------------------------------/| for status
        |                    |                  |
        |                    |        .         | Multiple COPY_STATUS
        |                    |        .         | operations may be sent
        |                    |        .         |
        |                    |                  |
        |                    |                  |
        |                    |                  |
        |<-- CB_COPY --------------------------| Destination reports
        |\------------------------------------>| results
        |                    |                  |

         Figure 5: An asynchronous inter-server copy.

2.2.3.  Server-to-Server Copy Protocol

During an inter-server copy, the destination server reads the file data from the source server.  The source server and destination server are not required to use a specific protocol to transfer the file data.  The choice of what protocol to use is ultimately the destination server's decision.

2.2.3.1.  Using NFSv4.x as a Server-to-Server Copy Protocol

The destination server MAY use standard NFSv4.x (where x >= 1) to read the data from the source server.  If NFSv4.x is used for the server-to-server copy protocol, the destination server can use the filehandle contained in the COPY request with standard NFSv4.x operations to read data from the source server.  Specifically, the destination server may use the NFSv4.x OPEN operation's CLAIM_FH facility to open the file being copied and obtain an open stateid.  Using the stateid, the destination server may then use NFSv4.x READ operations to read the file.

2.2.3.2.  Using an alternative Server-to-Server Copy Protocol

In a homogeneous environment, the source and destination servers might be able to perform the file copy extremely efficiently using specialized protocols.  For example, the source and destination servers might be two nodes sharing a common file system format for the source and destination file systems.  Thus the source and destination are in an ideal position to efficiently render the image of the source file to the destination file by replicating the file system formats at the block level.  Another possibility is that the source and destination might be two nodes sharing a common storage area network, and thus there is no need to copy any data at all; instead, ownership of the file and its contents might simply be reassigned to the destination.  To allow for these possibilities, the destination server is allowed to use a server-to-server copy protocol of its choice.

In a heterogeneous environment, using a protocol other than NFSv4.x (e.g., HTTP [14] or FTP [15]) presents some challenges.
In particular, the destination server is presented with the challenge of accessing the source file given only an NFSv4.x filehandle.

One option for protocols that identify source files with path names is to use an ASCII hexadecimal representation of the source filehandle as the file name.

Another option for the source server is to use URLs to direct the destination server to a specialized service.  For example, the response to COPY_NOTIFY could include the URL ftp://s1.example.com:9999/_FH/0x12345, where 0x12345 is the ASCII hexadecimal representation of the source filehandle.  When the destination server receives the source server's URL, it would use "_FH/0x12345" as the file name to pass to the FTP server listening on port 9999 of s1.example.com.  On port 9999 there would be a special instance of the FTP service that understands how to convert an NFS filehandle to an open file descriptor (in many operating systems, this would require a new system call, one which is the inverse of the makefh() function that the pre-NFSv4 MOUNT service needs).
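As a non-normative sketch, the URL from the example above can be derived mechanically from the raw filehandle bytes; only the host and port, taken from the example, are configuration:

   def fh_to_ftp_url(filehandle: bytes,
                     host: str = "s1.example.com",
                     port: int = 9999) -> str:
       # ASCII hexadecimal representation of the source filehandle
       name = "0x" + filehandle.hex()
       return "ftp://%s:%d/_FH/%s" % (host, port, name)

   # b'\x01\x23\x45' yields ftp://s1.example.com:9999/_FH/0x012345
   print(fh_to_ftp_url(b"\x01\x23\x45"))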
Authenticating and identifying the destination server to the source server is also a challenge.  Recommendations for how to accomplish this are given in Section 2.4.1.2.4 and Section 2.4.1.4.

2.3.  Operations

In the sections that follow, several operations are defined that together provide the server-side copy feature.  These operations are intended to be OPTIONAL operations as defined in Section 17 of [2].  The COPY_NOTIFY, COPY_REVOKE, COPY, COPY_ABORT, and COPY_STATUS operations are designed to be sent within an NFSv4 COMPOUND procedure.  The CB_COPY operation is designed to be sent within an NFSv4 CB_COMPOUND procedure.

Each operation is performed in the context of the user identified by the ONC RPC credential of its containing COMPOUND or CB_COMPOUND request.  For example, a COPY_ABORT operation issued by a given user requests that a specified COPY operation initiated by the same user be canceled.  Therefore a COPY_ABORT MUST NOT interfere with a copy of the same file initiated by another user.

An NFS server MAY allow an administrative user to monitor or cancel copy operations using an implementation-specific interface.

2.3.1.  netloc4 - Network Locations

The server-side copy operations specify network locations using the netloc4 data type shown below:

   enum netloc_type4 {
           NL4_NAME    = 0,
           NL4_URL     = 1,
           NL4_NETADDR = 2
   };

   union netloc4 switch (netloc_type4 nl_type) {
           case NL4_NAME:    utf8str_cis nl_name;
           case NL4_URL:     utf8str_cis nl_url;
           case NL4_NETADDR: netaddr4    nl_addr;
   };

If the netloc4 is of type NL4_NAME, the nl_name field MUST be specified as a UTF-8 string.  The nl_name is expected to be resolved to a network address via DNS, LDAP, NIS, /etc/hosts, or some other means.  If the netloc4 is of type NL4_URL, a server URL [4] appropriate for the server-to-server copy operation is specified as a UTF-8 string.  If the netloc4 is of type NL4_NETADDR, the nl_addr field MUST contain a valid netaddr4 as defined in Section 3.3.9 of [2].

When netloc4 values are used for an inter-server copy as shown in Figure 3, their values may be evaluated on the source server, destination server, and client.  The network environment in which these systems operate should be configured so that the netloc4 values are interpreted as intended on each system.

2.3.2.  Copy Offload Stateids

A server may perform a copy offload operation asynchronously.  An asynchronous copy is tracked using a copy offload stateid.  Copy offload stateids are included in the COPY, COPY_ABORT, COPY_STATUS, and CB_COPY operations.

Section 8.2.4 of [2] specifies that stateids are valid until either (A) the client or server restarts or (B) the client returns the resource.

A copy offload stateid will be valid until either (A) the client or server restarts or (B) the client returns the resource by issuing a COPY_ABORT operation or the client replies to a CB_COPY operation.

A copy offload stateid's seqid MUST NOT be 0 (zero).  In the context of a copy offload operation, it is ambiguous to indicate the most recent copy offload operation using a stateid with seqid of 0 (zero).  Therefore a copy offload stateid with seqid of 0 (zero) MUST be considered invalid.

2.4.  Security Considerations

The security considerations pertaining to NFSv4 [11] apply to this document.

The standard security mechanisms provided by NFSv4 [11] may be used to secure the protocol described in this document.

NFSv4 clients and servers supporting the inter-server copy operations described in this document are REQUIRED to implement [5], including the RPCSEC_GSSv3 privileges copy_from_auth and copy_to_auth.  If the server-to-server copy protocol is ONC RPC based, the servers are also REQUIRED to implement the RPCSEC_GSSv3 privilege copy_confirm_auth.  These requirements to implement are not requirements to use.  NFSv4 clients and servers are RECOMMENDED to use [5] to secure server-side copy operations.

2.4.1.  Inter-Server Copy Security

2.4.1.1.  Requirements for Secure Inter-Server Copy

Inter-server copy is driven by several requirements:

o  The specification MUST NOT mandate an inter-server copy protocol.  There are many ways to copy data.  Some will be more optimal than others depending on the identities of the source server and destination server.  For example, the source and destination servers might be two nodes sharing a common file system format for the source and destination file systems.  Thus the source and destination are in an ideal position to efficiently render the image of the source file to the destination file by replicating the file system formats at the block level.  In other cases, the source and destination might be two nodes sharing a common storage area network, and thus there is no need to copy any data at all; instead, ownership of the file and its contents simply gets reassigned to the destination.

o  The specification MUST provide guidance for using NFSv4.x as a copy protocol.  For those source and destination servers willing to use NFSv4.x there are specific security considerations that this specification can and does address.

o  The specification MUST NOT mandate pre-configuration between the source and destination server.  Requiring that the source and destination first have a "copying relationship" increases the administrative burden.  However, the specification MUST NOT preclude implementations that require pre-configuration.
o  The specification MUST NOT mandate a trust relationship between the source and destination server.  The NFSv4 security model requires mutual authentication between a principal on an NFS client and a principal on an NFS server.  This model MUST continue with the introduction of COPY.

2.4.1.2.  Inter-Server Copy with RPCSEC_GSSv3

When the client sends a COPY_NOTIFY to the source server, telling it to expect the destination to attempt to copy data from it, the copy is expected to be done on behalf of the principal (called the "user principal") that sent the RPC request enclosing the COMPOUND procedure that contains the COPY_NOTIFY operation.  The user principal is identified by the RPC credentials.  A mechanism is necessary that allows the user principal to authorize the destination server to perform the copy, that lets the source server properly authenticate the destination's copy requests, and that does not allow the destination to exceed its authorization.

An approach that sends delegated credentials of the client's user principal to the destination server is not used for the following reasons.  If the client's user principal delegated its credentials, the destination would authenticate as the user principal.  If the destination were using the NFSv4 protocol to perform the copy, then the source server would authenticate the destination server as the user principal, and the file copy would securely proceed.  However, this approach would allow the destination server to copy other files.  The user principal would have to trust the destination server to not do so.  This is counter to the requirements, and therefore is not considered.  Instead, an approach using RPCSEC_GSSv3 [5] privileges is proposed.

One of the stated applications of the proposed RPCSEC_GSSv3 protocol is compound client host and user authentication [+ privilege assertion].  For inter-server file copy, we require compound NFS server host and user authentication [+ privilege assertion].  The distinction between the two is one without meaning.

RPCSEC_GSSv3 introduces the notion of privileges.  We define three privileges:

copy_from_auth:  A user principal is authorizing a source principal ("nfs@<source>") to allow a destination principal ("nfs@<destination>") to copy a file from the source to the destination.  This privilege is established on the source server before the user principal sends a COPY_NOTIFY operation to the source server.

   struct copy_from_auth_priv {
           secret4             cfap_shared_secret;
           netloc4             cfap_destination;
           /* the NFSv4 user name that the user principal maps to */
           utf8str_mixed       cfap_username;
           /* equal to seq_num of rpc_gss_cred_vers_3_t */
           unsigned int        cfap_seq_num;
   };

cfap_shared_secret is a secret value the user principal generates.

copy_to_auth:  A user principal is authorizing a destination principal ("nfs@<destination>") to allow it to copy a file from the source to the destination.  This privilege is established on the destination server before the user principal sends a COPY operation to the destination server.
   struct copy_to_auth_priv {
           /* equal to cfap_shared_secret */
           secret4             ctap_shared_secret;
           netloc4             ctap_source;
           /* the NFSv4 user name that the user principal maps to */
           utf8str_mixed       ctap_username;
           /* equal to seq_num of rpc_gss_cred_vers_3_t */
           unsigned int        ctap_seq_num;
   };

ctap_shared_secret is the secret value the user principal generated, which was used to establish the copy_from_auth privilege with the source principal.

copy_confirm_auth:  A destination principal is confirming with the source principal that it is authorized to copy data from the source on behalf of the user principal.  When the inter-server copy protocol is NFSv4, or for that matter, any protocol capable of being secured via RPCSEC_GSSv3 (i.e., any ONC RPC protocol), this privilege is established before the file is copied from the source to the destination.

   struct copy_confirm_auth_priv {
           /* equal to GSS_GetMIC() of cfap_shared_secret */
           opaque              ccap_shared_secret_mic<>;
           /* the NFSv4 user name that the user principal maps to */
           utf8str_mixed       ccap_username;
           /* equal to seq_num of rpc_gss_cred_vers_3_t */
           unsigned int        ccap_seq_num;
   };
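The relationship between the three payloads can be sketched as follows (non-normative Python; HMAC-SHA-256 stands in for the GSS-API per-message checksum, and the netloc and user name values are illustrative):

   import hashlib
   import hmac
   import secrets

   # Generated by the user principal and shared with both servers.
   shared_secret = secrets.token_bytes(32)

   copy_from_auth = {                        # established on the source
       "cfap_shared_secret": shared_secret,
       "cfap_destination":   "dest.example.com",
       "cfap_username":      "alice@example.com",
       "cfap_seq_num":       1,
   }

   copy_to_auth = {                          # established on the destination
       "ctap_shared_secret": shared_secret,  # MUST equal cfap_shared_secret
       "ctap_source":        "source.example.com",
       "ctap_username":      "alice@example.com",
       "ctap_seq_num":       1,
   }

   copy_confirm_auth = {                     # destination confirms to source
       # stand-in for a GSS MIC computed over the shared secret
       "ccap_shared_secret_mic":
           hmac.new(shared_secret, b"copy_confirm_auth",
                    hashlib.sha256).digest(),
       "ccap_username":      "alice@example.com",
       "ccap_seq_num":       2,
   }

Note that all three payloads carry the same user name and are tied together by the single shared secret, which is what lets the source server correlate a later copy_confirm_auth with the user principal's earlier authorization.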
2.4.1.2.1.  Establishing a Security Context

When the user principal wants to COPY a file between two servers, if it has not established copy_from_auth and copy_to_auth privileges on the servers, it establishes them:

o  The user principal generates a secret it will share with the two servers.  This shared secret will be placed in the cfap_shared_secret and ctap_shared_secret fields of the appropriate privilege data types, copy_from_auth_priv and copy_to_auth_priv.

o  An instance of copy_from_auth_priv is filled in with the shared secret, the destination server, and the NFSv4 user id of the user principal.  It will be sent with an RPCSEC_GSS3_CREATE procedure, and so cfap_seq_num is set to the seq_num of the credential of the RPCSEC_GSS3_CREATE procedure.  Because cfap_shared_secret is a secret, after XDR encoding copy_from_auth_priv, GSS_Wrap() (with privacy) is invoked on copy_from_auth_priv.  The RPCSEC_GSS3_CREATE procedure's arguments are:

      struct {
              rpc_gss3_gss_binding    *compound_binding;
              rpc_gss3_chan_binding   *chan_binding_mic;
              rpc_gss3_assertion      assertions<>;
              rpc_gss3_extension      extensions<>;
      } rpc_gss3_create_args;

   The string "copy_from_auth" is placed in assertions[0].privs.  The output of GSS_Wrap() is placed in extensions[0].data.  The field extensions[0].critical is set to TRUE.  The source server calls GSS_Unwrap() on the privilege, and verifies that the seq_num matches the credential.  It then verifies that the NFSv4 user id being asserted matches the source server's mapping of the user principal.  If it does, the privilege is established on the source server as: <"copy_from_auth", user id, destination>.  The successful reply to RPCSEC_GSS3_CREATE has:

      struct {
              opaque                  handle<>;
              rpc_gss3_chan_binding   *chan_binding_mic;
              rpc_gss3_assertion      granted_assertions<>;
              rpc_gss3_assertion      server_assertions<>;
              rpc_gss3_extension      extensions<>;
      } rpc_gss3_create_res;

   The field "handle" is the RPCSEC_GSSv3 handle that the client will use on COPY_NOTIFY requests involving the source and destination server.  granted_assertions[0].privs will be equal to "copy_from_auth".  The server will return a GSS_Wrap() of copy_to_auth_priv.

o  An instance of copy_to_auth_priv is filled in with the shared secret, the source server, and the NFSv4 user id.  It will be sent with an RPCSEC_GSS3_CREATE procedure, and so ctap_seq_num is set to the seq_num of the credential of the RPCSEC_GSS3_CREATE procedure.  Because ctap_shared_secret is a secret, after XDR encoding copy_to_auth_priv, GSS_Wrap() is invoked on copy_to_auth_priv.  The RPCSEC_GSS3_CREATE procedure's arguments are:

      struct {
              rpc_gss3_gss_binding    *compound_binding;
              rpc_gss3_chan_binding   *chan_binding_mic;
              rpc_gss3_assertion      assertions<>;
              rpc_gss3_extension      extensions<>;
      } rpc_gss3_create_args;

   The string "copy_to_auth" is placed in assertions[0].privs.  The output of GSS_Wrap() is placed in extensions[0].data.  The field extensions[0].critical is set to TRUE.  After unwrapping, verifying the seq_num, and verifying the user principal to NFSv4 user ID mapping, the destination establishes a privilege of <"copy_to_auth", user id, source>.  The successful reply to RPCSEC_GSS3_CREATE has:

      struct {
              opaque                  handle<>;
              rpc_gss3_chan_binding   *chan_binding_mic;
              rpc_gss3_assertion      granted_assertions<>;
              rpc_gss3_assertion      server_assertions<>;
              rpc_gss3_extension      extensions<>;
      } rpc_gss3_create_res;

   The field "handle" is the RPCSEC_GSSv3 handle that the client will use on COPY requests involving the source and destination server.  The field granted_assertions[0].privs will be equal to "copy_to_auth".  The server will return a GSS_Wrap() of copy_to_auth_priv.

2.4.1.2.2.  Starting a Secure Inter-Server Copy

When the client sends a COPY_NOTIFY request to the source server, it uses the privileged "copy_from_auth" RPCSEC_GSSv3 handle.  cna_destination_server in COPY_NOTIFY MUST be the same as the name of the destination server specified in copy_from_auth_priv.  Otherwise, COPY_NOTIFY will fail with NFS4ERR_ACCESS.  The source server verifies that the privilege <"copy_from_auth", user id, destination> exists, and annotates it with the source filehandle, if the user principal has read access to the source file, and if administrative policies give the user principal and the NFS client read access to the source file (i.e., if the ACCESS operation would grant read access).  Otherwise, COPY_NOTIFY will fail with NFS4ERR_ACCESS.

When the client sends a COPY request to the destination server, it uses the privileged "copy_to_auth" RPCSEC_GSSv3 handle.  ca_source_server in COPY MUST be the same as the name of the source server specified in copy_to_auth_priv.  Otherwise, COPY will fail with NFS4ERR_ACCESS.  The destination server verifies that the privilege <"copy_to_auth", user id, source> exists, and annotates it with the source and destination filehandles.  If the client has failed to establish the "copy_to_auth" privilege, the destination server will reject the request with NFS4ERR_PARTNER_NO_AUTH.

If the client sends a COPY_REVOKE to the source server to rescind the destination server's copy privilege, it uses the privileged "copy_from_auth" RPCSEC_GSSv3 handle and the cra_destination_server in COPY_REVOKE MUST be the same as the name of the destination server specified in copy_from_auth_priv.  The source server will then delete the <"copy_from_auth", user id, destination> privilege and fail any subsequent copy requests sent under the auspices of this privilege from the destination server.
2.4.1.2.3.  Securing ONC RPC Server-to-Server Copy Protocols

After a destination server has a "copy_to_auth" privilege established on it, and it receives a COPY request, if it knows it will use an ONC RPC protocol to copy data, it will establish a "copy_confirm_auth" privilege on the source server, using nfs@<destination> as the initiator principal, and nfs@<source> as the target principal.

The value of the field ccap_shared_secret_mic is a GSS_GetMIC() of the shared secret passed in the copy_to_auth privilege.  The field ccap_username is the mapping of the user principal to an NFSv4 user name ("user"@"domain" form), and MUST be the same as ctap_username and cfap_username.  The field ccap_seq_num is the seq_num of the RPCSEC_GSSv3 credential used for the RPCSEC_GSS3_CREATE procedure the destination will send to the source server to establish the privilege.

The source server verifies the privilege, and establishes a <"copy_confirm_auth", user id, destination> privilege.  If the source server fails to verify the privilege, the COPY operation will be rejected with NFS4ERR_PARTNER_NO_AUTH.  All subsequent ONC RPC requests sent from the destination to copy data from the source to the destination will use the RPCSEC_GSSv3 handle returned by the source's RPCSEC_GSS3_CREATE response.

Note that the use of the "copy_confirm_auth" privilege accomplishes the following:

o  If a protocol like NFS is being used with export policies, the export policies can be overridden in case the destination server as-an-NFS-client is not authorized.

o  Manual configuration to allow a copy relationship between the source and destination is not needed.

If the attempt to establish a "copy_confirm_auth" privilege fails, then when the user principal sends a COPY request to the destination, the destination server will reject it with NFS4ERR_PARTNER_NO_AUTH.

2.4.1.2.4.  Securing Non ONC RPC Server-to-Server Copy Protocols

If the destination will not be using ONC RPC to copy the data, then the source and destination are using an unspecified copy protocol.  The destination could use the shared secret and the NFSv4 user id to prove to the source server that the user principal has authorized the copy.

For protocols that authenticate user names with passwords (e.g., HTTP [14] and FTP [15]), the NFSv4 user id could be used as the user name, and an ASCII hexadecimal representation of the RPCSEC_GSSv3 shared secret could be used as the user password or as input into non-password authentication methods like CHAP [16].
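A non-normative sketch of that suggestion, using Python's ftplib; the host and path are illustrative, and the shared secret is the one established via RPCSEC_GSSv3:

   from ftplib import FTP

   def fetch_source_file(host, nfsv4_user_id, shared_secret, path):
       # User name: the NFSv4 user id.
       # Password: ASCII hexadecimal form of the shared secret.
       ftp = FTP(host)
       ftp.login(user=nfsv4_user_id, passwd=shared_secret.hex())
       chunks = []
       ftp.retrbinary("RETR " + path, chunks.append)
       ftp.quit()
       return b"".join(chunks)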
2.4.1.3.  Inter-Server Copy via ONC RPC but without RPCSEC_GSSv3

ONC RPC security flavors other than RPCSEC_GSSv3 MAY be used with the server-side copy offload operations described in this document.  In particular, host-based ONC RPC security flavors such as AUTH_NONE and AUTH_SYS MAY be used.  If a host-based security flavor is used, a minimal level of protection for the server-to-server copy protocol is possible.

In the absence of strong security mechanisms such as RPCSEC_GSSv3, the challenge is how the source server and destination server identify themselves to each other, especially in the presence of multi-homed source and destination servers.  In a multi-homed environment, the destination server might not contact the source server from the same network address specified by the client in the COPY_NOTIFY.  This can be overcome using the procedure described below.

When the client sends the source server the COPY_NOTIFY operation, the source server may reply to the client with a list of target addresses, names, and/or URLs and assign them to the unique triple: <source fh, user ID, destination address>.  If the destination uses one of these target netlocs to contact the source server, the source server will be able to uniquely identify the destination server, even if the destination server does not connect from the address specified by the client in COPY_NOTIFY.

For example, suppose the network topology is as shown in Figure 3.  If the source filehandle is 0x12345, the source server may respond to a COPY_NOTIFY for destination 10.11.78.56 with the URLs:

   nfs://10.11.78.18//_COPY/10.11.78.56/_FH/0x12345

   nfs://192.168.33.18//_COPY/10.11.78.56/_FH/0x12345

The client will then send these URLs to the destination server in the COPY operation.  Suppose that the 192.168.33.0/24 network is a high speed network and the destination server decides to transfer the file over this network.  If the destination contacts the source server from 192.168.33.56 over this network using NFSv4.1, it does the following:

   COMPOUND { PUTROOTFH, LOOKUP "_COPY" ; LOOKUP "10.11.78.56"; LOOKUP
      "_FH" ; OPEN "0x12345" ; GETFH }

The source server will therefore know that these NFSv4.1 operations are being issued by the destination server identified in the COPY_NOTIFY.
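A non-normative sketch of how a source server might generate such a URL list, one per local address, mirroring the values above:

   def copy_urls(source_addrs, dest_addr, fh_hex):
       # One URL per source-server address; each encodes the destination
       # address and the source filehandle, so a connection to any of
       # them uniquely identifies the destination server.
       return ["nfs://%s//_COPY/%s/_FH/%s" % (addr, dest_addr, fh_hex)
               for addr in source_addrs]

   for url in copy_urls(["10.11.78.18", "192.168.33.18"],
                        "10.11.78.56", "0x12345"):
       print(url)
   # nfs://10.11.78.18//_COPY/10.11.78.56/_FH/0x12345
   # nfs://192.168.33.18//_COPY/10.11.78.56/_FH/0x12345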
2.4.1.4.  Inter-Server Copy without ONC RPC and RPCSEC_GSSv3

The same techniques as in Section 2.4.1.3, using unique URLs for each destination server, can be used for other protocols (e.g., HTTP [14] and FTP [15]) as well.

3.  Sparse Files

3.1.  Introduction

A sparse file is a common way of representing a large file without having to utilize all of the disk space for it.  Consequently, a sparse file uses less physical space than its size indicates.  This means the file contains 'holes', byte ranges within the file that contain no data.  Most modern file systems support sparse files, including most UNIX file systems and NTFS, but notably not Apple's HFS+.  Common examples of sparse files include Virtual Machine (VM) OS/disk images, database files, log files, and even checkpoint recovery files most commonly used by the HPC community.

If an application reads a hole in a sparse file, the file system must return all zeros to the application.  For local data access there is little penalty, but with NFS these zeroes must be transferred back to the client.  If an application uses the NFS client to read data into memory, this wastes time and bandwidth as the application waits for the zeroes to be transferred.

A sparse file is typically created by initializing the file to be all zeros; nothing is written to the data in the file, instead the hole is recorded in the metadata for the file.  So an 8G disk image might be represented initially by a couple hundred bits in the inode and nothing on the disk.  If the VM then writes 100M of data to the middle of the image, there would now be two holes represented in the metadata and 100M in the data.
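This behavior can be observed directly; the following non-normative Python sketch creates such an image on a file system with sparse-file support (the file name is illustrative and the sizes mirror the example above):

   import os

   path = "vm-disk.img"
   with open(path, "wb") as f:
       f.truncate(8 * 1024**3)             # 8G logical size, no data blocks
       f.seek(4 * 1024**3)                 # seek to the middle of the image
       f.write(b"\xff" * (100 * 1024**2))  # 100M of actual data

   st = os.stat(path)
   print("logical size:", st.st_size)      # 8589934592 bytes
   print("allocated:", st.st_blocks * 512) # roughly 100M on most systems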
The client provides an offset of where the READ_PLUS is to start and a count of how many bytes are to be read.  An offset of zero means to read data starting at the beginning of the file.  If offset is greater than or equal to the size of the file, the status NFS4_OK is returned with nfs_readplusrestype4 set to READ_OK, data length set to zero, and eof set to TRUE.  The READ_PLUS is subject to access permissions checking.

If the client specifies a count value of zero, the READ_PLUS succeeds and returns zero bytes of data, again subject to access permissions checking.  In all situations, the server may choose to return fewer bytes than specified by the client.  The client needs to check for this condition and handle the condition appropriately.

If the client specifies an offset and count value that is entirely contained within a hole of the file, the status NFS4_OK is returned with nfs_readplusresok4 set to READ_HOLE, and if information is available regarding the hole, a nfs_readplusreshole structure containing the offset and range of the entire hole.  The nfs_readplusreshole structure is considered valid until the file is changed (detected via the change attribute).  The server MUST provide the same semantics for nfs_readplusreshole as if the client read the region and received zeroes; the lifetime of the implied hole's contents MUST be exactly the same as that of any other read data.

If the client specifies an offset and count value that begins in a non-hole of the file but extends into a hole, the server should return a short read with status NFS4_OK, nfs_readplusresok4 set to READ_OK, and data length set to the number of bytes returned.  The client will then issue another READ_PLUS for the remaining bytes, to which the server will respond with information about the hole in the file.

If the server knows that the requested byte range falls within a hole of the file, but has no further information regarding the hole, it returns a nfs_readplusreshole structure with holeres4 set to HOLE_NOINFO.

If hole information is available and can be returned to the client, the server returns a nfs_readplusreshole structure with the value of holeres4 set to HOLE_INFO.  The values of hole_offset and hole_length define the byte-range for the current hole in the file.  These values represent the information known to the server and may describe a byte-range smaller than the true size of the hole.

Except when special stateids are used, the stateid value for a READ_PLUS request represents a value returned from a previous byte-range lock or share reservation request or the stateid associated with a delegation.  The stateid identifies the associated owners if any and is used by the server to verify that the associated locks are still valid (e.g., have not been revoked).

If the read ended at the end-of-file (formally, in a correctly formed READ_PLUS operation, if offset + count is equal to the size of the file), or the READ_PLUS operation extends beyond the size of the file (if offset + count is greater than the size of the file), eof is returned as TRUE; otherwise, it is FALSE.  A successful READ_PLUS of an empty file will always return eof as TRUE.

If the current filehandle is not an ordinary file, an error will be returned to the client.
In the case that the current filehandle represents an object of type NF4DIR, NFS4ERR_ISDIR is returned.  If the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is returned.  In all other cases, NFS4ERR_WRONG_TYPE is returned.

For a READ_PLUS with a stateid value of all bits equal to zero, the server MAY allow the READ_PLUS to be serviced subject to mandatory byte-range locks or the current share deny modes for the file.  For a READ_PLUS with a stateid value of all bits equal to one, the server MAY allow READ_PLUS operations to bypass locking checks at the server.

On success, the current filehandle retains its value.

3.4.4.  IMPLEMENTATION

If the server returns a "short read" (i.e., less data than requested and eof is set to FALSE), the client should send another READ_PLUS to get the remaining data.  A server may return less data than requested under several circumstances.  The file may have been truncated by another client or perhaps on the server itself, changing the file size from what the requesting client believes to be the case.  This would reduce the actual amount of data available to the client.  It is possible that the server may reduce the transfer size and so return a short read result.  Server resource exhaustion may also result in a short read.

If mandatory byte-range locking is in effect for the file, and if the byte-range corresponding to the data to be read from the file is WRITE_LT locked by an owner not associated with the stateid, the server will return the NFS4ERR_LOCKED error.  The client should try to get the appropriate READ_LT via the LOCK operation before re-attempting the READ_PLUS.  When the READ_PLUS completes, the client should release the byte-range lock via LOCKU.  In addition, the server MUST return a nfs_readplusreshole structure with values of hole_offset and hole_length that are within the owner's locked byte range.

If another client has an OPEN_DELEGATE_WRITE delegation for the file being read, the delegation must be recalled, and the operation cannot proceed until that delegation is returned or revoked.  Except where this happens very quickly, one or more NFS4ERR_DELAY errors will be returned to requests made while the delegation remains outstanding. Normally, delegations will not be recalled as a result of a READ_PLUS operation since the recall will occur as a result of an earlier OPEN. However, since it is possible for a READ_PLUS to be done with a special stateid, the server needs to check for this case even though the client should have done an OPEN previously.

3.4.4.1.  Additional pNFS Implementation Information

With pNFS, the semantics of using READ_PLUS remains the same.  Any data server MAY return a READ_HOLE result for a READ_PLUS request that it receives.

When a data server chooses to return a READ_HOLE result, it has the option of returning hole information for the data stored on that data server (as defined by the data layout), but it MUST NOT return a nfs_readplusreshole structure with a byte range that includes data managed by another data server.

1.  Data servers that cannot determine hole information SHOULD return HOLE_NOINFO.
2.  Data servers that can obtain hole information for the parts of the file stored on them SHOULD return HOLE_INFO and the byte range of the hole stored on that data server.

A data server should do its best to return as much information about a hole as is feasible without having to contact the metadata server. If communication with the metadata server is required, then every attempt should be made to minimize the number of requests.

If mandatory locking is enforced, then the data server must also ensure that it returns only information for a Hole that is within the owner's locked byte range.

3.4.5.  READ_PLUS with Sparse Files Example

To see how the return value READ_HOLE will work, the following table describes a sparse file.  For each byte range, the file contains either non-zero data or a hole.  In addition, the server in this example uses a Hole Threshold of 32K.

   +-------------+----------+
   | Byte-Range  | Contents |
   +-------------+----------+
   | 0-15999     | Hole     |
   | 16K-31999   | Non-Zero |
   | 32K-255999  | Hole     |
   | 256K-287999 | Non-Zero |
   | 288K-353999 | Hole     |
   | 354K-417999 | Non-Zero |
   +-------------+----------+

                Table 1

Under the given circumstances, if a client were to read the file from beginning to end with a max read size of 64K, the following would be the result.  This assumes the client has already opened the file and acquired a valid stateid and just needs to issue READ_PLUS requests.

1.  READ_PLUS(s, 0, 64K) --> NFS_OK, readplusrestype4 = READ_OK, eof = false, data<>[32K].  Return a short read, as the last half of the request was all zeroes.  Note that the first hole is read back as all zeros as it is below the Hole Threshold.

2.  READ_PLUS(s, 32K, 64K) --> NFS_OK, readplusrestype4 = READ_HOLE, nfs_readplusreshole(HOLE_INFO)(32K, 224K).  The requested range was all zeros, and the current hole begins at offset 32K and is 224K in length.

3.  READ_PLUS(s, 256K, 64K) --> NFS_OK, readplusrestype4 = READ_OK, eof = false, data<>[32K].  Return a short read, as the last half of the request was all zeroes.

4.  READ_PLUS(s, 288K, 64K) --> NFS_OK, readplusrestype4 = READ_HOLE, nfs_readplusreshole(HOLE_INFO)(288K, 66K).

5.  READ_PLUS(s, 354K, 64K) --> NFS_OK, readplusrestype4 = READ_OK, eof = true, data<>[64K].

3.5.  Related Work

Solaris and ZFS support an extension to lseek(2) that allows applications to discover holes in a file.  The values, SEEK_HOLE and SEEK_DATA, allow clients to seek to the next hole or beginning of data, respectively.

XFS supports the XFS_IOC_GETBMAP extended attribute, which returns the Data Region Map for a file.  Clients can then use this information to avoid reading holes in a file.

NTFS and CIFS support the FSCTL_SET_SPARSE attribute, which allows applications to control whether empty regions of the file are preallocated and filled in with zeros or simply left unallocated.

3.6.  Other Proposed Designs

3.6.1.  Multi-Data Server Hole Information

The current design prohibits pNFS data servers from returning hole information for regions of a file that are not stored on that data server.  Having data servers return information regarding other data servers changes the fundamental principle that all metadata information comes from the metadata server.
Here is a brief description of how multi-data server hole information could be supported:

For a data server that can obtain hole information for the entire file without severe performance impact, it MAY return HOLE_INFO and the byte range of the entire file hole.  When a pNFS client receives a READ_HOLE result and a non-empty nfs_readplusreshole structure, it MAY use this information in conjunction with a valid layout for the file to determine the next data server for the next region of data that is not in a hole.

3.6.2.  Data Result Array

If a single read request contains one or more Holes with a length greater than the Sparse Threshold, the current design would return results indicating a short read to the client.  A client would then send a series of read requests to the server to retrieve information for the Holes and the remaining data.  To avoid turning a single read request into several exchanges between the client and server, the server may need to choose a relatively large Sparse Threshold in order to decrease the number of short reads it creates.  A large Sparse Threshold may miss many smaller holes, which in turn may negate the benefits of sparse read support.

To avoid this situation, one option is to have the READ_PLUS operation return information for multiple holes in a single return value.  This would allow several small holes to be described in a single read response without requiring multiple exchanges between the client and server.

One important item to consider with returning an array of data chunks is its impact on RDMA, which may use different block sizes on the client and server (among other things).

3.6.3.  User-Defined Sparse Mask

Add mask (instead of just zeroes).  Specified by server or client?

3.6.4.  Allocated flag

A Hole on the server may be an allocated byte-range consisting of all zeroes or may not be allocated at all.  To ensure this information is properly communicated to the client, it may be beneficial to add an 'alloc' flag to the HOLE_INFO section of nfs_readplusreshole.  This would allow an NFS client to copy a file from one file system to another and have it more closely resemble the original.

3.6.5.  Dense and Sparse pNFS File Layouts

The hole information returned from a data server must be understood by pNFS clients using both Dense and Sparse file layout types.  Does the current READ_PLUS return value work for both layout types?  Does the data server know if it is using dense or sparse so that it can return the correct hole_offset and hole_length values?

4.  Space Reservation

4.1.  Introduction

This section describes a set of operations that allow applications such as hypervisors to reserve space for a file, report the amount of actual disk space a file occupies, and free up the backing space of a file when it is not required.  In virtualized environments, virtual disk files are often stored on NFS mounted volumes.  Since virtual disk files represent the hard disks of virtual machines, hypervisors often have to guarantee certain properties for the file.

One such example is space reservation.  When a hypervisor creates a virtual disk file, it often tries to preallocate the space for the file so that there are no future allocation related errors during the operation of the virtual machine.
Such errors prevent a virtual machine from continuing execution and result in downtime.

Currently, in order to achieve such a guarantee, applications zero the entire file.  The initial zeroing allocates the backing blocks and all subsequent writes are overwrites of already allocated blocks. This approach is not only inefficient in terms of the amount of I/O done, it is also not guaranteed to work on filesystems that are log structured or deduplicated.  An efficient way of guaranteeing space reservation would be beneficial to such applications.

If the space_reserved attribute is set on a file, it is guaranteed that writes that do not grow the file will not fail with NFS4ERR_NOSPC.

Another useful feature would be the ability to report the number of blocks that would be freed when a file is deleted.  Currently, NFS reports two size attributes:

size  The logical file size of the file.

space_used  The size in bytes that the file occupies on disk.

While these attributes are sufficient for space accounting in traditional filesystems, they prove to be inadequate in modern filesystems that support block sharing.  In such filesystems, multiple inodes can point to a single block with a block reference count to guard against premature freeing.  Having a way to tell the number of blocks that would be freed if the file was deleted would be useful to applications that wish to migrate files when a volume is low on space.

Since virtual disks represent a hard drive in a virtual machine, a virtual disk can be viewed as a filesystem within a file.  Since not all blocks within a filesystem are in use, there is an opportunity to reclaim blocks that are no longer in use.  A call to deallocate blocks could result in better space efficiency.  Less space MAY be consumed for backups after block deallocation.

The following operations and attributes can be used to resolve these issues:

space_reserved  This attribute specifies whether the blocks backing the file have been preallocated.

space_freed  This attribute specifies the space freed when a file is deleted, taking block sharing into consideration.

INITIALIZE  This operation zeroes and/or deallocates the blocks backing a region of the file.

If space_used of a file is interpreted to mean the size in bytes of all disk blocks pointed to by the inode of the file, then shared blocks get double counted, over-reporting the space utilization. This also has the adverse effect that the deletion of a file with shared blocks frees up less than space_used bytes.

On the other hand, if space_used is interpreted to mean the size in bytes of those disk blocks unique to the inode of the file, then shared blocks are not counted in any file, resulting in under-reporting of the space utilization.

For example, two files A and B have 10 blocks each.  Let 6 of these blocks be shared between them.  Thus, the combined space utilized by the two files is 14 * BLOCK_SIZE bytes.  In the former case, the combined space utilization of the two files would be reported as 20 * BLOCK_SIZE.  However, deleting either would only result in 4 * BLOCK_SIZE being freed.  Conversely, the latter interpretation would report that the space utilization is only 8 * BLOCK_SIZE.

Adding another size attribute, space_freed, is helpful in solving this problem.
space_freed is the number of blocks that are allocated to the given file that would be freed on its deletion.  In the example, both A and B would report space_freed as 4 * BLOCK_SIZE and space_used as 10 * BLOCK_SIZE.  If A is deleted, B will report space_freed as 10 * BLOCK_SIZE, as the deletion of B would then result in the deallocation of all 10 blocks.

The addition of this attribute does not solve the problem of space being over-reported.  However, over-reporting is better than under-reporting.

4.2.  Operations and attributes

In the sections that follow, one operation and three attributes are defined that together provide the space management facilities outlined earlier in the document.  The operation is intended to be OPTIONAL and the attributes RECOMMENDED as defined in section 17 of [2].

4.3.  Attribute 77: space_reserved

The space_reserved attribute is a read/write attribute of type boolean.  It is a per file attribute.  When the space_reserved attribute is set via SETATTR, the server must ensure that there is disk space to accommodate every byte in the file before it can return success.  If the server cannot guarantee this, it must return NFS4ERR_NOSPC.

If the client tries to grow a file which has the space_reserved attribute set, the server must guarantee that there is disk space to accommodate every byte in the file with the new size before it can return success.  If the server cannot guarantee this, it must return NFS4ERR_NOSPC.

It is not required that the server allocate the space to the file before returning success.  The allocation can be deferred; however, it must be guaranteed that it will not fail for lack of space.

The value of space_reserved can be obtained at any time through GETATTR.

In order to avoid ambiguity, the space_reserved bit cannot be set along with the size bit in SETATTR.  Increasing the size of a file with space_reserved set will fail if space reservation cannot be guaranteed for the new size.  If the file size is decreased, space reservation is only guaranteed for the new size and the extra blocks backing the file can be released.

4.4.  Attribute 78: space_freed

space_freed gives the number of bytes freed if the file is deleted. This attribute is read only and is of type length4.  It is a per file attribute.

5.  Support for Application IO Hints

5.1.  Introduction

Applications currently have several options for communicating I/O access patterns to the NFS client.  While this can help the NFS client optimize I/O and caching for a file, it does not allow the NFS server and its exported file system to do likewise.  Therefore, here we put forth a proposal for the NFSv4.2 protocol to allow applications to communicate their expected behavior to the server.

By communicating expected access pattern, e.g., sequential or random, and data re-use behavior, e.g., data range will be read multiple times and should be cached, the server will be able to better understand what optimizations it should implement for access to a file.  For example, if an application indicates it will never read the data more than once, then the file system can avoid polluting the data cache by not caching the data.

The first interface that applications can use to issue such I/O hints is the posix_fadvise operation.
For example, on Linux, when an application uses posix_fadvise to specify that a file will be read sequentially, Linux doubles the readahead buffer size.

Another instance where applications provide an indication of their desired I/O behavior is the use of direct I/O.  By specifying direct I/O, clients will no longer cache data, but this information is not passed to the server, which will continue caching data.

Application specific NFS clients such as those used by hypervisors and databases can also leverage application hints to communicate their specialized requirements.

This section adds a new IO_ADVISE operation to communicate the client file access patterns to the NFS server.  The NFS server upon receiving an IO_ADVISE operation MAY choose to alter its I/O and caching behavior, but is under no obligation to do so.

5.2.  POSIX Requirements

The first key requirement of the IO_ADVISE operation is to support the posix_fadvise function [6], which is supported in Linux and many other operating systems.  Examples and guidance on how to use posix_fadvise to improve performance can be found in [17]. posix_fadvise is defined as follows:

   int posix_fadvise(int fd, off_t offset, off_t len, int advice);

The posix_fadvise() function shall advise the implementation on the expected behavior of the application with respect to the data in the file associated with the open file descriptor, fd, starting at offset and continuing for len bytes.  The specified range need not currently exist in the file.  If len is zero, all data following offset is specified.  The implementation may use this information to optimize handling of the specified data.  The posix_fadvise() function shall have no effect on the semantics of other operations on the specified data, although it may affect the performance of other operations.

The advice to be applied to the data is specified by the advice parameter and may be one of the following values:

POSIX_FADV_NORMAL - Specifies that the application has no advice to give on its behavior with respect to the specified data.  It is the default characteristic if no advice is given for an open file.

POSIX_FADV_SEQUENTIAL - Specifies that the application expects to access the specified data sequentially from lower offsets to higher offsets.

POSIX_FADV_RANDOM - Specifies that the application expects to access the specified data in a random order.

POSIX_FADV_WILLNEED - Specifies that the application expects to access the specified data in the near future.

POSIX_FADV_DONTNEED - Specifies that the application expects that it will not access the specified data in the near future.

POSIX_FADV_NOREUSE - Specifies that the application expects to access the specified data once and then not reuse it thereafter.

Upon successful completion, posix_fadvise() shall return zero; otherwise, an error number shall be returned to indicate the error.

5.3.  Additional Requirements

Many use cases exist for sending application I/O hints to the server that cannot utilize the POSIX supported interface.  This is because some applications may benefit from additional hints not specified by posix_fadvise, and some applications may not use POSIX altogether.
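Before turning to those use cases, the following non-normative sketch shows how an application might use the POSIX interface above to describe a one-pass sequential read.  The file name is illustrative only, and the propagation of these hints to the server via IO_ADVISE is the proposal of this section, not current NFS client behavior.

   #include <fcntl.h>
   #include <stdio.h>
   #include <unistd.h>

   int
   main(void)
   {
           int fd = open("bigfile.dat", O_RDONLY);

           if (fd < 0) {
                   perror("open");
                   return 1;
           }

           /* A len of zero covers all data following offset, i.e.,
            * the whole file: read front to back, exactly once. */
           posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
           posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);

           /* ... sequential read loop elided ... */

           close(fd);
           return 0;
   }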
One use case is "Opportunistic Prefetch", which allows a stateid holder to tell the server that it is possible that it will access the specified data in the near future.  This is similar to POSIX_FADV_WILLNEED, but the client is unsure it will in fact read the specified data, so the server should only prefetch the data if it can be done at a marginal cost.  For example, when a server receives this hint, it could prefetch only the indirect blocks for a file instead of all the data.  This would still improve performance if the client does read the data, but with less pressure on server memory.

An example use case for this hint is a database that reads in a single record that points to additional records in either other areas of the same file or different files located on the same or different server.  While it is likely that the application may access the additional records, it is far from guaranteed.  Therefore, the database may issue an opportunistic prefetch (instead of POSIX_FADV_WILLNEED) for the data in the other files pointed to by the record.

Another use case is "Direct I/O", which allows a stateid holder to inform the server that it does not wish to cache data.  Today, for applications that only intend to read data once, the use of direct I/O disables client caching, but does not affect server caching.  By caching data that will not be re-read, the server is polluting its cache and possibly causing useful cached data to be evicted.  By informing the server of its expected I/O access, this situation can be avoided.  Direct I/O can be used in Linux and AIX via the open() O_DIRECT parameter, in Solaris via the directio() function, and in Windows via the CreateFile() FILE_FLAG_NO_BUFFERING flag.

Another use case is "Backward Sequential Read", which allows a stateid holder to inform the server that it intends to read the specified data backwards, i.e., from the end to the beginning.  This is different from POSIX_FADV_SEQUENTIAL, whose implied intention is that data will be read from beginning to end.  This hint allows servers to prefetch data at the end of the range first, and then prefetch data sequentially in a backwards manner to the start of the data range.  One example of an application that can make use of this hint is video editing.

5.4.  Security Considerations

None.

5.5.  IANA Considerations

The IO_ADVISE_type4 will be extended through an IANA registry.

6.  Application Data Block Support

At the OS level, files are stored on disk blocks.  Applications are also free to impose structure on the data contained in a file, and we can define an Application Data Block (ADB) to be such a structure. From the application's viewpoint, it only wants to handle ADBs and not raw bytes (see [18]).  An ADB is typically composed of two sections: a header and data.  The header describes the characteristics of the block and can provide a means to detect corruption in the data payload.  The data section is typically initialized to all zeros.

The format of the header is application specific, but there are two main components typically encountered:

1.  An ADB Number (ADBN), which allows the application to determine which data block is being referenced.
The ADBN is a logical block number and is useful when the client is not storing the blocks in contiguous memory.

2.  Fields to describe the state of the ADB and a means to detect block corruption.  For both pieces of data, a useful property is that the allowed values be unique in that, if passed across the network, corruption due to translation between big and little endian architectures is detectable.  For example, 0xF0DEDEF0 has the same bit pattern in both architectures.

Applications already impose structures on files [18] and detect corruption in data blocks [19].  What they are not able to do is efficiently transfer and store ADBs.  To initialize a file with ADBs, the client must send the full ADB to the server and that must be stored on the server.  When the application is initializing a file to have the ADB structure, it could compress the ADBs to just the information necessary to later reconstruct the header portion of the ADB when the contents are read back.  Using sparse file techniques, the disk blocks described by the ADBs would not be allocated. Unlike sparse file techniques, there would be a small cost to store the compressed header data.

In this section, we are going to define a generic framework for an ADB, present one approach to detecting corruption in a given ADB implementation, and describe the model for how the client and server can support efficient initialization of ADBs, reading of ADB holes, punching holes in ADBs, and space reservation.  Further, we need to be able to extend this model to applications which do not support ADBs, but wish to be able to handle sparse files, hole punching, and space reservation.

6.1.  Generic Framework

We want the representation of the ADB to be flexible enough to support many different applications.  The most basic approach is no imposition of a block at all, which means we are working with the raw bytes.  Such an approach would be useful for storing holes, punching holes, etc.  In more complex deployments, a server might be supporting multiple applications, each with their own definition of the ADB.  One might store the ADBN at the start of the block and then have a guard pattern to detect corruption [20].  The next might store the ADBN at an offset of 100 bytes within the block and have no guard pattern at all.  The point is that existing applications might already have well defined formats for their data blocks.

The guard pattern can be used to represent the state of the block, to protect against corruption, or both.  Again, it needs to be able to be placed anywhere within the ADB.

We need to be able to represent the starting offset of the block and the size of the block.  Note that nothing prevents the application from defining different sized blocks in a file.

6.1.1.  Data Block Representation

   struct app_data_block4 {
           offset4         adb_offset;
           length4         adb_block_size;
           length4         adb_block_count;
           length4         adb_reloff_blocknum;
           count4          adb_block_num;
           length4         adb_reloff_pattern;
           opaque          adb_pattern<>;
   };

The app_data_block4 structure captures the abstraction presented for the ADB.  The additional fields present are to allow the transmission of adb_block_count ADBs at one time.  We also use adb_block_num to convey the ADBN of the first block in the sequence.
Each ADB will contain the same adb_pattern string.

As both adb_block_num and adb_pattern are optional, if either adb_reloff_pattern or adb_reloff_blocknum is set to NFS4_UINT64_MAX, then the corresponding field is not set in any of the ADBs.

6.1.2.  Data Content

   /*
    * Use an enum such that we can extend new types.
    */
   enum data_content4 {
           NFS4_CONTENT_DATA = 0,
           NFS4_CONTENT_APP_BLOCK = 1,
           NFS4_CONTENT_HOLE = 2
   };

New operations might need to differentiate between wanting to access data versus an ADB.  Also, future minor versions might want to introduce new data formats.  This enumeration allows that to occur.

6.2.  pNFS Considerations

While this document does not mandate how sparse ADBs are recorded on the server, it does make the assumption that such information is not in the file.  I.e., the information is metadata.  As such, the INITIALIZE operation is defined to be not supported by the DS - it must be issued to the MDS.  But since the client cannot assume a priori whether a read is sparse or not, the READ_PLUS operation MUST be supported by both the DS and the MDS.  I.e., the client might impose on the MDS to asynchronously read the data from the DS.

Furthermore, each DS MUST NOT report to a client either a sparse ADB or data which belongs to another DS.  One implication of this requirement is that the app_data_block4's adb_block_size MUST either be the stripe width or the stripe width MUST be an even multiple of it.

The second implication here is that the DS must be able to use the Control Protocol to determine from the MDS where the sparse ADBs occur.  [[Comment.4: Need to discuss what happens if after the file is being written to and an INITIALIZE occurs? --TH]] Perhaps instead of the DS pulling from the MDS, the MDS pushes to the DS?  Thus an INITIALIZE causes a new push?  [[Comment.5: Still need to consider race cases of the DS getting a WRITE and the MDS getting an INITIALIZE. --TH]]

6.3.  An Example of Detecting Corruption

In this section, we define an ADB format in which corruption can be detected.  Note that this is just one possible format and means to detect corruption.

Consider a very basic implementation of an operating system's disk blocks.  A block is either data or it is an indirect block which allows for files to be larger than one block.  It is desired to be able to initialize a block.  Lastly, to quickly unlink a file, a block can be marked invalid.  The contents remain intact - which would enable this OS application to undelete a file.

The application defines 4k sized data blocks, with an 8 byte block counter occurring at offset 0 in the block, and with the guard pattern occurring at offset 8 inside the block.  Furthermore, the guard pattern can take one of four states:

0xfeedface -  This is the FREE state and indicates that the ADB format has been applied.

0xcafedead -  This is the DATA state and indicates that real data has been written to this block.

0xe4e5c001 -  This is the INDIRECT state and indicates that the block contains block counter numbers that are chained off of this block.

0xba1ed4a3 -  This is the INVALID state and indicates that the block contains data whose contents are garbage.
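To make the layout concrete, here is a non-normative C rendering of this example block format.  The names are hypothetical, and treating bytes 12 through 15 as padding between the 4 byte guard pattern and the fields that follow is an assumption, as the text only fixes the offsets.

   #include <stdint.h>

   #define EXAMPLE_ADB_SIZE 4096

   #define GUARD_FREE      0xfeedface   /* ADB format applied    */
   #define GUARD_DATA      0xcafedead   /* real data written     */
   #define GUARD_INDIRECT  0xe4e5c001   /* chained block numbers */
   #define GUARD_INVALID   0xba1ed4a3   /* contents are garbage  */

   struct example_adb {
           uint64_t adbn;     /* 8 byte block counter, offset 0 */
           uint32_t guard;    /* guard pattern, offset 8        */
           uint32_t pad;      /* offsets 12-15, assumed unused  */
           uint8_t  rest[EXAMPLE_ADB_SIZE - 16];  /* remainder  */
   };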
Finally, it also defines an 8 byte checksum [21] starting at byte 16 which applies to the remaining contents of the block.  If the state is FREE, then that checksum is trivially zero.  As such, the application has no need to transfer the checksum implicitly inside the ADB - it need not make the transfer layer aware of the fact that there is a checksum (see [19] for an example of checksums used to detect corruption in application data blocks).

Corruption in each ADB can be detected thusly:

o  If the guard pattern is anything other than one of the allowed values, including all zeros.

o  If the guard pattern is FREE and any other byte in the remainder of the ADB is anything other than zero.

o  If the guard pattern is anything other than FREE, then if the stored checksum does not match the computed checksum.

o  If the guard pattern is INDIRECT and one of the stored indirect block numbers has a value greater than the number of ADBs in the file.

o  If the guard pattern is INDIRECT and one of the stored indirect block numbers is a duplicate of another stored indirect block number.

As can be seen, the application can detect errors based on the combination of the guard pattern state and the checksum.  But also, the application can detect corruption based on the state and the contents of the ADB.  This last point is important in validating the minimum amount of data we incorporated into our generic framework. I.e., the guard pattern is sufficient in allowing applications to design their own corruption detection.

Finally, it is important to note that none of these corruption checks occur in the transport layer.  The server and client components are totally unaware of the file format and might report everything as being transferred correctly even in the case where the application detects corruption.

6.4.  Example of READ_PLUS

The hypothetical application presented in Section 6.3 can be used to illustrate how READ_PLUS would return an array of results.  A file is created and initialized with 100 4k ADBs in the FREE state:

   INITIALIZE {0, 4k, 100, 0, 0, 8, 0xfeedface}

Further, assume the application writes a single ADB at 16k, changing the guard pattern to 0xcafedead.  We would then have in memory:

   0   -> (16k - 1)  : 4k, 4, 0, 0, 8, 0xfeedface
   16k -> (20k - 1)  : 00 00 00 00 00 00 00 04 ca fe de ad XX XX ... XX XX
   20k -> 400k       : 4k, 95, 0, 5, 8, 0xfeedface

And when the client did a READ_PLUS of 256k at the start of the file, it would get back a result of an ADB, some data, and a final ADB:

   ADB {0, 4k, 4, 0, 0, 8, 0xfeedface}
   data 4k
   ADB {20k, 4k, 59, 0, 5, 8, 0xfeedface}

6.5.  Zero Filled Holes

As applications are free to define the structure of an ADB, it is trivial to define an ADB which supports zero filled holes.  Such a case would encompass the traditional definitions of a sparse file and hole punching.  For example, to punch a 64k hole, starting at 100M, into an existing file which has no ADB structure:

   INITIALIZE {100M, 64k, 1, NFS4_UINT64_MAX,
               0, NFS4_UINT64_MAX, 0x0}

7.  Labeled NFS

7.1.  Introduction

Access control models such as Unix permissions or Access Control Lists are commonly referred to as Discretionary Access Control (DAC) models.  These systems base their access decisions on user identity and resource ownership.
In contrast, Mandatory Access Control (MAC) models base their access control decisions on the label on the subject (usually a process) and the object it wishes to access. These labels may contain user identity information but usually contain additional information.  In DAC systems users are free to specify the access rules for resources that they own.  MAC models base their security decisions on a system wide policy established by an administrator or organization which the users do not have the ability to override.  In this section, we add a MAC model to NFSv4.

The first change necessary is to devise a method for transporting and storing security label data on NFSv4 file objects.  Security labels have several semantics that are met by NFSv4 recommended attributes such as the ability to set the label value upon object creation. Access control on these attributes is done through a combination of two mechanisms.  As with other recommended attributes on file objects the usual DAC checks (ACLs and permission bits) will be performed to ensure that proper file ownership is enforced.  In addition, a MAC system MAY be employed on the client, server, or both to enforce additional policy on what subjects may modify security label information.

The second change is to provide a method for the server to notify the client that the attribute changed on an open file on the server.  If the file is closed, then during the open attempt, the client will gather the new attribute value.  The server MUST NOT communicate the new value of the attribute, the client MUST query it.  This requirement stems from the need for the client to provide sufficient access rights to the attribute.

The final change necessary is a modification to the RPC layer used in NFSv4 in the form of a new version of the RPCSEC_GSS [7] framework. In order for an NFSv4 server to apply MAC checks it must obtain additional information from the client.  Several methods were explored for performing this and it was decided that the best approach was to incorporate the ability to make security attribute assertions through the RPC mechanism.  RPCSEC_GSSv3 [5] outlines a method to assert additional security information such as security labels on gss context creation and have that data bound to all RPC requests that make use of that context.

7.2.  Definitions

Label Format Specifier (LFS):  is an identifier used by the client to establish the syntactic format of the security label and the semantic meaning of its components.  These specifiers exist in a registry associated with documents describing the format and semantics of the label.

Label Format Registry:  is the IANA registry containing all registered LFS along with references to the documents that describe the syntactic format and semantics of the security label.

Policy Identifier (PI):  is an optional part of the definition of a Label Format Specifier which allows for clients and servers to identify specific security policies.

Domain of Interpretation (DOI):  represents an administrative security boundary, where all systems within the DOI have semantically coherent labeling.  That is, a security attribute must always mean exactly the same thing anywhere within the DOI.

Object:  is a passive resource within the system that we wish to be protected.
Objects can be entities such as files, directories, pipes, sockets, and many other system resources relevant to the protection of the system state.

Subject:  A subject is an active entity, usually a process, which is requesting access to an object.

Multi-Level Security (MLS):  is a traditional model where objects are given a sensitivity level (Unclassified, Secret, Top Secret, etc) and a category set [22].

7.3.  MAC Security Attribute

MAC models base access decisions on security attributes bound to subjects and objects.  This information can be a user identity for an identity based MAC model, a sensitivity level for Multi-Level Security, or a type for Type Enforcement.  These models base their decisions on different criteria but the semantics of the security attribute remain the same.  The semantics required by the security attributes are listed below:

o  Must provide flexibility with respect to MAC model.

o  Must provide the ability to atomically set security information upon object creation.

o  Must provide the ability to enforce access control decisions both on the client and the server.

o  Must not expose an object to either the client or server name space before its security information has been bound to it.

NFSv4 implements the security attribute as a recommended attribute. These attributes have a fixed format and semantics, which conflicts with the flexible nature of the security attribute.  To resolve this, the security attribute consists of two components.  The first component is a LFS as defined in [23] to allow for interoperability between MAC mechanisms.  The second component is an opaque field which is the actual security attribute data.  To allow for various MAC models, NFSv4 should be used solely as a transport mechanism for the security attribute.  It is the responsibility of the endpoints to consume the security attribute and make access decisions based on their respective models.  In addition, creation of objects through OPEN and CREATE allows for the security attribute to be specified upon creation.  By providing an atomic create and set operation for the security attribute it is possible to enforce the second and fourth requirements.  The recommended attribute FATTR4_SEC_LABEL will be used to satisfy this requirement.

7.3.1.  Interpreting FATTR4_SEC_LABEL

The XDR [24] necessary to implement Labeled NFSv4 is presented below:

   const FATTR4_SEC_LABEL = 81;

   typedef uint32_t policy4;

                        Figure 6

   struct labelformat_spec4 {
           policy4 lfs_lfs;
           policy4 lfs_pi;
   };

   struct sec_label_attr_info {
           labelformat_spec4 slai_lfs;
           opaque            slai_data<>;
   };

The FATTR4_SEC_LABEL consists of two components, with the first component being an LFS.  It serves to provide the receiving end with the information necessary to translate the security attribute into a form that is usable by the endpoint.  Label Formats assigned an LFS may optionally choose to include a Policy Identifier field to allow for complex policy deployments.  The LFS and Label Format Registry are described in detail in [23].  The translation used to interpret the security attribute is not specified as part of the protocol as it may depend on various factors.  The second component is an opaque section which contains the data of the attribute.
This component is dependent on the MAC model to interpret and enforce.

In particular, it is the responsibility of the LFS specification to define a maximum size for the opaque section, slai_data<>.  When creating or modifying a label for an object, the client needs to be guaranteed that the server will accept a label that is sized correctly.  By both client and server being part of a specific MAC model, the client will be aware of the size.

7.3.2.  Delegations

In the event that a security attribute is changed on the server while a client holds a delegation on the file, the client should follow the existing protocol with respect to attribute changes.  It should flush all changes back to the server and relinquish the delegation.

7.3.3.  Permission Checking

It is not feasible to enumerate all possible MAC models and even levels of protection within a subset of these models.  This means that the NFSv4 clients and servers cannot be expected to directly make access control decisions based on the security attribute.  Instead NFSv4 should defer permission checking on this attribute to the host system.  These checks are performed in addition to existing DAC and ACL checks outlined in the NFSv4 protocol.  Section 7.6 gives a specific example of how the security attribute is handled under a particular MAC model.

7.3.4.  Object Creation

When creating files in NFSv4 the OPEN and CREATE operations are used. One of the parameters to these operations is an fattr4 structure containing the attributes the file is to be created with.  This allows NFSv4 to atomically set the security attribute of files upon creation.  When a client is MAC aware it must always provide the initial security attribute upon file creation.  In the event that the server is the only MAC aware entity in the system it should ignore the security attribute specified by the client and instead make the determination itself.  A more in depth explanation can be found in Section 7.6.

7.3.5.  Existing Objects

Note that under the MAC model, all objects must have labels. Therefore, if an existing server is upgraded to include LNFS support, then it is the responsibility of the security system to define the behavior for existing objects.  For example, if the security system is LFS 0, which means the server just stores and returns labels, then existing files should return labels which are set to an empty value.

7.3.6.  Label Changes

As per the requirements, when a file's security label is modified, the server must notify all clients which have the file opened of the change in label.  It does so with CB_ATTR_CHANGED.  There are preconditions to making an attribute change imposed by NFSv4 and the security system might want to impose others.  In the process of meeting these preconditions, the server may choose to either serve the request in whole or return NFS4ERR_DELAY to the SETATTR operation.

If there are open delegations on the file belonging to clients other than the one making the label change, then the process described in Section 7.3.2 must be followed.

As the server is always presented with the subject label from the client, it does not necessarily need to communicate the fact that the label has changed to the client.
In the cases where the change outright denies the client access, the client will be able to quickly determine that there is a new label in effect.  It is in cases where the client may share the same object between multiple subjects, or where the security system is not strictly hierarchical, that the CB_ATTR_CHANGED callback is very useful.  It allows the server to inform the clients that the cached security attribute is now stale.

Consider a system in which the clients enforce MAC checks and the server has a very simple security system which just stores the labels.  In this system, the MAC label check always allows access, regardless of the subject label.

The way in which MAC labels are enforced is by the smart client.  So if client A changes a security label on a file, then the server MUST inform all clients that have the file opened that the label has changed via CB_ATTR_CHANGED.  Then the clients MUST retrieve the new label and MUST enforce access via the new attribute values.

[[Comment.6: Describe a LFS of 0, which will be the means to indicate such a deployment.  In the current LFR, 0 is marked as reserved.  If we use it, then we define the default LFS to be used by a LNFS aware server.  I.e., it lets smart clients work together in the face of a dumb server.  Note that while supporting this system is optional, it will make for a very good debugging mode during development.  I.e., even if a server does not deploy with another security system, this mode gets your foot in the door. --TH]]

7.4.  pNFS Considerations

This section examines the issues in deploying LNFS in a pNFS community of servers.

7.4.1.  MAC Label Checks

The new FATTR4_SEC_LABEL attribute is metadata information and as such the DS is not aware of the value contained on the MDS. Fortunately, the NFSv4.1 protocol [2] already has provisions for doing access level checks from the DS to the MDS.  In order for the DS to validate the subject label presented by the client, it SHOULD utilize this mechanism.

If a file's FATTR4_SEC_LABEL is changed, then the MDS should utilize CB_ATTR_CHANGED to inform the client of that fact.  If the MDS is maintaining

7.5.  Discovery of Server LNFS Support

The server can easily determine that a client supports LNFS when it queries for the FATTR4_SEC_LABEL attribute for an object.  Note that it cannot assume that the presence of RPCSEC_GSSv3 indicates LNFS support.  The client might need to discover which LFS the server supports.

A server which supports LNFS MUST allow a client with any subject label to retrieve the FATTR4_SEC_LABEL attribute for the root filehandle, ROOTFH.  The following compound must always succeed as far as a MAC label check is concerned:

   PUTROOTFH, GETATTR {FATTR4_SEC_LABEL}

Note that the server might have imposed a security flavor on the root that precludes such access.  I.e., if the server requires kerberized access and the client presents a compound with AUTH_SYS, then the server is allowed to return NFS4ERR_WRONGSEC in this case.  But if the client presents a correct security flavor, then the server MUST return the FATTR4_SEC_LABEL attribute with the supported LFS filled in.

7.6.  MAC Security NFS Modes of Operation

A system using Labeled NFS may operate in three modes.
The first mode provides the most protection and is called "full mode".  In this mode both the client and server implement a MAC model allowing each end to make an access control decision.  The remaining two modes are variations on each other and are called "smart client" and "smart server" modes.  In these modes one end of the connection is not implementing a MAC model and because of this these operating modes offer less protection than full mode.

7.6.1.  Full Mode

Full mode environments consist of MAC aware NFSv4 servers and clients and may be composed of mixed MAC models and policies.  The system requires that both the client and server have an opportunity to perform an access control check based on all relevant information within the network.  The file object security attribute is provided using the mechanism described in Section 7.3.  The security attribute of the subject making the request is transported at the RPC layer using the mechanism described in RPCSEC_GSSv3 [5].

7.6.1.1.  Initial Labeling and Translation

The ability to create a file is an action that a MAC model may wish to mediate.  The client is given the responsibility to determine the initial security attribute to be placed on a file.  This allows the client to make a decision as to the acceptable security attributes to create a file with before sending the request to the server.  Once the server receives the creation request from the client it may choose to evaluate if the security attribute is acceptable.

Security attributes on the client and server may vary based on MAC model and policy.  To handle this, the security attribute field has an LFS component.  This component is a mechanism for the host to identify the format and meaning of the opaque portion of the security attribute.  A full mode environment may contain hosts operating in several different LFSs and DOIs.  In this case a mechanism for translating the opaque portion of the security attribute is needed. The actual translation function will vary based on MAC model and policy and is out of the scope of this document.  If a translation is unavailable for a given LFS and DOI then the request SHOULD be denied.  Another recourse is to allow the host to provide a fallback mapping for unknown security attributes.

7.6.1.2.  Policy Enforcement

In full mode access control decisions are made by both the clients and servers.  When a client makes a request it takes the security attribute from the requesting process and makes an access control decision based on that attribute and the security attribute of the object it is trying to access.  If the client denies that access, an RPC call to the server is never made.  If, however, the access is allowed, the client will make a call to the NFS server.

When the server receives the request from the client it extracts the security attribute conveyed in the RPC request.  The server then uses this security attribute and the attribute of the object the client is trying to access to make an access control decision.  If the server's policy allows this access it will fulfill the client's request, otherwise it will return NFS4ERR_ACCESS.
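The shape of the server-side half of this check is sketched below, non-normatively.  mac_allows() stands in for whatever MAC model the server implements and is purely hypothetical, while the label type is that of Section 7.3.1 and nfsstat4 is assumed to come from the protocol's generated bindings.

   /*
    * Server-side full mode check.  The client has already made its
    * own decision before issuing the RPC; the server decides again,
    * independently, using the subject label conveyed by
    * RPCSEC_GSSv3 and the object's FATTR4_SEC_LABEL.
    */
   static nfsstat4
   server_mac_check(const struct sec_label_attr_info *subject,
                    const struct sec_label_attr_info *object)
   {
           if (!mac_allows(subject, object))
                   return NFS4ERR_ACCESS;
           return NFS4_OK;
   }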
Implementations MAY validate security attributes supplied over the network to ensure that they are within a set of attributes permitted from a specific peer, and if not, reject them.  Note that a system may permit a different set of attributes to be accepted from each peer.

7.6.2.  Smart Client Mode

Smart client environments consist of NFSv4 servers that are not MAC aware but NFSv4 clients that are.  Clients in this environment may consist of groups implementing different MAC models and policies. The system requires that all clients in the environment be responsible for access control checks.  Due to the amount of trust placed in the clients this mode is only to be used in a trusted environment.

7.6.2.1.  Initial Labeling and Translation

Just as in full mode, the client is responsible for determining the initial label upon object creation.  The server in smart client mode does not implement a MAC model, however, it may provide the ability to restrict the creation and labeling of objects with certain labels based on different criteria as described in Section 7.6.1.2.

In a smart client environment a group of clients operate in a single DOI.  This removes the need for the clients to maintain a set of DOI translations.  Servers should provide a method to allow different groups of clients to access the server at the same time.  However, they should not allow two groups of clients operating in different DOIs to access the same files.

7.6.2.2.  Policy Enforcement

In smart client mode access control decisions are made by the clients.  When a client accesses an object it obtains the security attribute of the object from the server and combines it with the security attribute of the process making the request to make an access control decision.  This check is in addition to the DAC checks provided by NFSv4 so this may fail based on the DAC criteria even if the MAC policy grants access.  As the policy check is located on the client an access control denial should take the form that is native to the platform.

7.6.3.  Smart Server Mode

Smart server environments consist of NFSv4 servers that are MAC aware and one or more MAC unaware clients.  The server is the only entity enforcing policy, and may selectively provide standard NFS services to clients based on their authentication credentials and/or associated network attributes (e.g., IP address, network interface). The level of trust and access extended to a client in this mode is configuration-specific.

7.6.3.1.  Initial Labeling and Translation

In smart server mode all labeling and access control decisions are performed by the NFSv4 server.  In this environment the NFSv4 clients are not MAC aware so they cannot provide input into the access control decision.  This requires the server to determine the initial labeling of objects.  Normally the subject to use in this calculation would originate from the client.  Instead the NFSv4 server may choose to assign the subject security attribute based on the client's authentication credentials and/or associated network attributes (e.g., IP address, network interface).

In smart server mode security attributes are contained solely within the NFSv4 server.  This means that all security attributes used in the system remain within a single LFS and DOI.
Since security
2338 attributes will not cross DOIs or change format, there is no need to
2339 provide any translation functionality above that which is needed
2340 internally by the MAC model.

2342 7.6.3.2. Policy Enforcement

2344 All access control decisions in smart server mode are made by the
2345 server. The server will assign the subject a security attribute
2346 based on some criteria (e.g., IP address, network interface). Using
2347 the newly calculated security attribute and the security attribute of
2348 the object being requested, the MAC model makes the access control
2349 check and returns NFS4ERR_ACCESS on a denial and NFS4_OK on success.
2350 This check is done transparently to the client, so if the MAC
2351 permission check fails, the client may be unaware of the reason for
2352 the permission failure. When operating in this mode, administrators
2353 attempting to debug permission failures should remember to check the
2354 MAC policy running on the server in addition to the DAC settings.

2356 7.7. Security Considerations

2358 This entire document deals with security issues.

2360 Depending on the level of protection the MAC system offers, there may
2361 be a requirement to tightly bind the security attribute to the data.

2363 When only one of the client or server enforces labels, it is
2364 important to realize that the other side is not enforcing MAC
2365 protections. Alternate methods might be in use to handle the lack of
2366 MAC support, and care should be taken to identify and mitigate threats
2367 from possible tampering outside of these methods.

2369 An example of this is that a server that modifies READDIR or LOOKUP
2370 results based on the client's subject label might want to always
2371 construct the same subject label for a client that does not present
2372 one. This will prevent a non-LNFS client from mixing entries in the
2373 directory cache.

2375 8. Sharing change attribute implementation details with NFSv4 clients

2377 8.1. Introduction

2379 Although both the NFSv4 [11] and NFSv4.1 [2] protocols define the
2380 change attribute as mandatory to implement, they offer little in
2381 the way of guidance. The only feature mandated is
2382 that the value must change whenever the file data or metadata change.

2384 While this allows for a wide range of implementations, it also leaves
2385 the client with a conundrum: how does it determine which is the most
2386 recent value for the change attribute in a case where several RPC
2387 calls have been issued in parallel? In other words, if two COMPOUNDs,
2388 both containing WRITE and GETATTR requests for the same file, have
2389 been issued in parallel, how does the client determine which of the
2390 two change attribute values returned in the replies to the GETATTR
2391 requests corresponds to the most recent state of the file? In some
2392 cases, the only recourse may be to send another COMPOUND containing a
2393 third GETATTR that is fully serialised with the first two.

2395 NFSv4.2 avoids this kind of inefficiency by allowing the server to
2396 share details about how the change attribute is expected to evolve,
2397 so that the client may immediately determine which, out of the
2398 several change attribute values returned by the server, is the most
2399 recent.

2401 8.2.
Definition of the 'change_attr_type' per-file system attribute

2403 enum change_attr_typeinfo {
2404 NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR = 0,
2405 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER = 1,
2406 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2,
2407 NFS4_CHANGE_TYPE_IS_TIME_METADATA = 3,
2408 NFS4_CHANGE_TYPE_IS_UNDEFINED = 4
2409 };

2411 +------------------+----+---------------------------+-----+
2412 | Name | Id | Data Type | Acc |
2413 +------------------+----+---------------------------+-----+
2414 | change_attr_type | XX | enum change_attr_typeinfo | R |
2415 +------------------+----+---------------------------+-----+

2417 The solution enables the NFS server to provide additional information
2418 about how it expects the change attribute value to evolve after the
2419 file data or metadata has changed. 'change_attr_type' is defined as a
2420 new recommended attribute, and takes values from enum
2421 change_attr_typeinfo as follows:

2423 NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR: The change attribute value MUST
2424 monotonically increase for every atomic change to the file
2425 attributes, data or directory contents.

2427 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER: The change attribute value MUST
2428 be incremented by one unit for every atomic change to the file
2429 attributes, data or directory contents. This property is
2430 preserved when writing to pNFS data servers.

2432 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS: The change attribute
2433 value MUST be incremented by one unit for every atomic change to
2434 the file attributes, data or directory contents. In the case
2435 where the client is writing to pNFS data servers, the number of
2436 increments is not guaranteed to exactly match the number of
2437 writes.

2439 NFS4_CHANGE_TYPE_IS_TIME_METADATA: The change attribute is
2440 implemented as suggested in the NFSv4 spec [11] in terms of the
2441 time_metadata attribute.

2443 NFS4_CHANGE_TYPE_IS_UNDEFINED: The change attribute does not take
2444 values that fit into any of these categories.

2446 If any of NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR,
2447 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, or
2448 NFS4_CHANGE_TYPE_IS_TIME_METADATA is set, then the client knows at
2449 the very least that the change attribute is monotonically increasing,
2450 which is sufficient to resolve the question of which value is the
2451 most recent.

2453 If the client sees the value NFS4_CHANGE_TYPE_IS_TIME_METADATA, then
2454 by inspecting the value of the 'time_delta' attribute it additionally
2455 has the option of detecting rogue server implementations that use
2456 time_metadata in violation of the spec.

2458 Finally, if the client sees NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, it
2459 has the ability to predict what the resulting change attribute value
2460 should be after a COMPOUND containing a SETATTR, WRITE, or CREATE.
2461 This again allows it to detect changes made in parallel by another
2462 client. The value NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS permits
2463 the same, but only if the client is not doing pNFS WRITEs.

2465 9. Security Considerations

2467 10. Operations: REQUIRED, RECOMMENDED, or OPTIONAL

2469 The following tables summarize the operations of the NFSv4.2 protocol
2470 and the corresponding designation of REQUIRED, RECOMMENDED, and
2471 OPTIONAL to implement or MUST NOT implement. The designation of MUST
2472 NOT implement is reserved for those operations that were defined in
2473 either NFSv4.0 or NFSv4.1 and MUST NOT be implemented in NFSv4.2.
2475 For the most part, the REQUIRED, RECOMMENDED, or OPTIONAL designation
2476 for operations sent by the client is for the server implementation.
2477 The client is generally required to implement the operations needed
2478 for the operating environment it serves. For example, a
2479 read-only NFSv4.2 client would have no need to implement the WRITE
2480 operation and is not required to do so.

2482 The REQUIRED or OPTIONAL designation for callback operations sent by
2483 the server is for both the client and server. Generally, the client
2484 has the option of creating the backchannel and sending the operations
2485 on the fore channel that will be a catalyst for the server sending
2486 callback operations. A partial exception is CB_RECALL_SLOT; the only
2487 way the client can avoid supporting this operation is by not creating
2488 a backchannel.

2490 Since this is a summary of the operations and their designation,
2491 there are subtleties that are not presented here. Therefore, if
2492 there is a question of the requirements of implementation, the
2493 operation descriptions themselves must be consulted along with other
2494 relevant explanatory text within either this specification or that of
2495 NFSv4.1 [2].

2497 The abbreviations used in the second and third columns of the table
2498 are defined as follows.

2500 REQ REQUIRED to implement

2502 REC RECOMMENDED to implement

2504 OPT OPTIONAL to implement

2506 MNI MUST NOT implement

2508 For the NFSv4.2 features that are OPTIONAL, the operations that
2509 support those features are OPTIONAL, and the server would return
2510 NFS4ERR_NOTSUPP in response to the client's use of those operations.
2511 If an OPTIONAL feature is supported, it is possible that a set of
2512 operations related to the feature become REQUIRED to implement. The
2513 third column of the table designates the feature(s) and whether the
2514 operation is REQUIRED or OPTIONAL in the presence of support for the
2515 feature.
2517 The OPTIONAL features identified and their abbreviations are as 2518 follows: 2520 pNFS Parallel NFS 2522 FDELG File Delegations 2524 DDELG Directory Delegations 2526 COPY Server Side Copy 2527 ADB Application Data Blocks 2529 Operations 2531 +----------------------+--------------------+-----------------------+ 2532 | Operation | REQ, REC, OPT, or | Feature (REQ, REC, or | 2533 | | MNI | OPT) | 2534 +----------------------+--------------------+-----------------------+ 2535 | ACCESS | REQ | | 2536 | BACKCHANNEL_CTL | REQ | | 2537 | BIND_CONN_TO_SESSION | REQ | | 2538 | CLOSE | REQ | | 2539 | COMMIT | REQ | | 2540 | COPY | OPT | COPY (REQ) | 2541 | COPY_ABORT | OPT | COPY (REQ) | 2542 | COPY_NOTIFY | OPT | COPY (REQ) | 2543 | COPY_REVOKE | OPT | COPY (REQ) | 2544 | COPY_STATUS | OPT | COPY (REQ) | 2545 | CREATE | REQ | | 2546 | CREATE_SESSION | REQ | | 2547 | DELEGPURGE | OPT | FDELG (REQ) | 2548 | DELEGRETURN | OPT | FDELG, DDELG, pNFS | 2549 | | | (REQ) | 2550 | DESTROY_CLIENTID | REQ | | 2551 | DESTROY_SESSION | REQ | | 2552 | EXCHANGE_ID | REQ | | 2553 | FREE_STATEID | REQ | | 2554 | GETATTR | REQ | | 2555 | GETDEVICEINFO | OPT | pNFS (REQ) | 2556 | GETDEVICELIST | OPT | pNFS (OPT) | 2557 | GETFH | REQ | | 2558 | INITIALIZE | OPT | ADB (REQ) | 2559 | GET_DIR_DELEGATION | OPT | DDELG (REQ) | 2560 | LAYOUTCOMMIT | OPT | pNFS (REQ) | 2561 | LAYOUTGET | OPT | pNFS (REQ) | 2562 | LAYOUTRETURN | OPT | pNFS (REQ) | 2563 | LINK | OPT | | 2564 | LOCK | REQ | | 2565 | LOCKT | REQ | | 2566 | LOCKU | REQ | | 2567 | LOOKUP | REQ | | 2568 | LOOKUPP | REQ | | 2569 | NVERIFY | REQ | | 2570 | OPEN | REQ | | 2571 | OPENATTR | OPT | | 2572 | OPEN_CONFIRM | MNI | | 2573 | OPEN_DOWNGRADE | REQ | | 2574 | PUTFH | REQ | | 2575 | PUTPUBFH | REQ | | 2576 | PUTROOTFH | REQ | | 2577 | READ | OPT | | 2578 | READDIR | REQ | | 2579 | READLINK | OPT | | 2580 | READ_PLUS | OPT | ADB (REQ) | 2581 | RECLAIM_COMPLETE | REQ | | 2582 | RELEASE_LOCKOWNER | MNI | | 2583 | REMOVE | REQ | | 2584 | RENAME | REQ | | 2585 | RENEW | MNI | | 2586 | RESTOREFH | REQ | | 2587 | SAVEFH | REQ | | 2588 | SECINFO | REQ | | 2589 | SECINFO_NO_NAME | REC | pNFS file layout | 2590 | | | (REQ) | 2591 | SEQUENCE | REQ | | 2592 | SETATTR | REQ | | 2593 | SETCLIENTID | MNI | | 2594 | SETCLIENTID_CONFIRM | MNI | | 2595 | SET_SSV | REQ | | 2596 | TEST_STATEID | REQ | | 2597 | VERIFY | REQ | | 2598 | WANT_DELEGATION | OPT | FDELG (OPT) | 2599 | WRITE | REQ | | 2600 +----------------------+--------------------+-----------------------+ 2602 Callback Operations 2604 +-------------------------+-------------------+---------------------+ 2605 | Operation | REQ, REC, OPT, or | Feature (REQ, REC, | 2606 | | MNI | or OPT) | 2607 +-------------------------+-------------------+---------------------+ 2608 | CB_COPY | OPT | COPY (REQ) | 2609 | CB_GETATTR | OPT | FDELG (REQ) | 2610 | CB_LAYOUTRECALL | OPT | pNFS (REQ) | 2611 | CB_NOTIFY | OPT | DDELG (REQ) | 2612 | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | 2613 | CB_NOTIFY_LOCK | OPT | | 2614 | CB_PUSH_DELEG | OPT | FDELG (OPT) | 2615 | CB_RECALL | OPT | FDELG, DDELG, pNFS | 2616 | | | (REQ) | 2617 | CB_RECALL_ANY | OPT | FDELG, DDELG, pNFS | 2618 | | | (REQ) | 2619 | CB_RECALL_SLOT | REQ | | 2620 | CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS (REQ) | 2621 | CB_SEQUENCE | OPT | FDELG, DDELG, pNFS | 2622 | | | (REQ) | 2623 | CB_WANTS_CANCELLED | OPT | FDELG, DDELG, pNFS | 2624 | | | (REQ) | 2625 +-------------------------+-------------------+---------------------+ 2627 11. NFSv4.2 Operations 2629 11.1. 
Operation 59: COPY - Initiate a server-side copy 2631 11.1.1. ARGUMENT 2633 const COPY4_GUARDED = 0x00000001; 2634 const COPY4_METADATA = 0x00000002; 2636 struct COPY4args { 2637 /* SAVED_FH: source file */ 2638 /* CURRENT_FH: destination file or */ 2639 /* directory */ 2640 offset4 ca_src_offset; 2641 offset4 ca_dst_offset; 2642 length4 ca_count; 2643 uint32_t ca_flags; 2644 component4 ca_destination; 2645 netloc4 ca_source_server<>; 2646 }; 2648 11.1.2. RESULT 2650 union COPY4res switch (nfsstat4 cr_status) { 2651 case NFS4_OK: 2652 stateid4 cr_callback_id<1>; 2653 default: 2654 length4 cr_bytes_copied; 2655 }; 2657 11.1.3. DESCRIPTION 2659 The COPY operation is used for both intra-server and inter-server 2660 copies. In both cases, the COPY is always sent from the client to 2661 the destination server of the file copy. The COPY operation requests 2662 that a file be copied from the location specified by the SAVED_FH 2663 value to the location specified by the combination of CURRENT_FH and 2664 ca_destination. 2666 The SAVED_FH must be a regular file. If SAVED_FH is not a regular 2667 file, the operation MUST fail and return NFS4ERR_WRONG_TYPE. 2669 In order to set SAVED_FH to the source file handle, the compound 2670 procedure requesting the COPY will include a sub-sequence of 2671 operations such as 2673 PUTFH source-fh 2674 SAVEFH 2676 If the request is for a server-to-server copy, the source-fh is a 2677 filehandle from the source server and the compound procedure is being 2678 executed on the destination server. In this case, the source-fh is a 2679 foreign filehandle on the server receiving the COPY request. If 2680 either PUTFH or SAVEFH checked the validity of the filehandle, the 2681 operation would likely fail and return NFS4ERR_STALE. 2683 In order to avoid this problem, the minor version incorporating the 2684 COPY operations will need to make a few small changes in the handling 2685 of existing operations. If a server supports the server-to-server 2686 COPY feature, a PUTFH followed by a SAVEFH MUST NOT return 2687 NFS4ERR_STALE for either operation. These restrictions do not pose 2688 substantial difficulties for servers. The CURRENT_FH and SAVED_FH 2689 may be validated in the context of the operation referencing them and 2690 an NFS4ERR_STALE error returned for an invalid file handle at that 2691 point. 2693 The CURRENT_FH and ca_destination together specify the destination of 2694 the copy operation. If ca_destination is of 0 (zero) length, then 2695 CURRENT_FH specifies the target file. In this case, CURRENT_FH MUST 2696 be a regular file and not a directory. If ca_destination is not of 0 2697 (zero) length, the ca_destination argument specifies the file name to 2698 which the data will be copied within the directory identified by 2699 CURRENT_FH. In this case, CURRENT_FH MUST be a directory and not a 2700 regular file. 2702 If the file named by ca_destination does not exist and the operation 2703 completes successfully, the file will be visible in the file system 2704 namespace. If the file does not exist and the operation fails, the 2705 file MAY be visible in the file system namespace depending on when 2706 the failure occurs and on the implementation of the NFS server 2707 receiving the COPY operation. If the ca_destination name cannot be 2708 created in the destination file system (due to file name 2709 restrictions, such as case or length), the operation MUST fail. 
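The filehandle plumbing above can be made concrete with a short C
sketch that assembles the COMPOUND for a COPY to a named destination
file. The compound_*() builder helpers and the opaque nfs_fh4 type
are hypothetical; only the operation sequence (PUTFH source, SAVEFH,
PUTFH destination directory, COPY with a non-empty ca_destination) is
taken from the description above.

   #include <stdint.h>

   typedef struct nfs_fh4 nfs_fh4;   /* opaque filehandle */
   struct compound;                  /* hypothetical COMPOUND builder */

   extern struct compound *compound_new(void);
   extern void compound_add_putfh(struct compound *c, const nfs_fh4 *fh);
   extern void compound_add_savefh(struct compound *c);
   extern void compound_add_copy(struct compound *c,
                                 uint64_t ca_src_offset,
                                 uint64_t ca_dst_offset,
                                 uint64_t ca_count,
                                 uint32_t ca_flags,
                                 const char *ca_destination);
   extern int compound_send(struct compound *c);

   /* Copy an entire source file into a new file 'name' under 'dstdir'. */
   int copy_to_named_file(const nfs_fh4 *src, const nfs_fh4 *dstdir,
                          const char *name)
   {
       struct compound *c = compound_new();

       compound_add_putfh(c, src);     /* CURRENT_FH := source file     */
       compound_add_savefh(c);         /* SAVED_FH   := source file     */
       compound_add_putfh(c, dstdir);  /* CURRENT_FH := destination dir */

       /* ca_destination is non-empty, so CURRENT_FH MUST be a
        * directory.  A zero-length ca_destination would instead
        * require CURRENT_FH to be the destination regular file
        * itself.  A ca_count of zero copies from ca_src_offset
        * through EOF. */
       compound_add_copy(c, 0, 0, 0, 0, name);

       return compound_send(c);
   }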
2711 The ca_src_offset is the offset within the source file from which the
2712 data will be read, the ca_dst_offset is the offset within the
2713 destination file to which the data will be written, and the ca_count
2714 is the number of bytes that will be copied. An offset of 0 (zero)
2715 specifies the start of the file. A count of 0 (zero) requests that
2716 all bytes from ca_src_offset through EOF be copied to the
2717 destination. If concurrent modifications to the source file overlap
2718 with the source file region being copied, the data copied may include
2719 all, some, or none of the modifications. The client can use standard
2720 NFS operations (e.g., OPEN with OPEN4_SHARE_DENY_WRITE or mandatory
2721 byte range locks) to protect against concurrent modifications if the
2722 client is concerned about this. If the source file's end of file is
2723 being modified in parallel with a copy that specifies a count of 0
2724 (zero) bytes, the amount of data copied is implementation dependent
2725 (clients may guard against this case by specifying a non-zero count
2726 value or preventing modification of the source file as mentioned
2727 above).

2729 If the source offset or the source offset plus count is greater than
2730 or equal to the size of the source file, the operation will fail with
2731 NFS4ERR_INVAL. The destination offset or destination offset plus
2732 count may be greater than the size of the destination file. This
2733 allows the client to issue parallel copies to implement
2734 operations such as "cat file1 file2 file3 file4 > dest".

2736 If the destination file is created as a result of this command, the
2737 destination file's size will be equal to the number of bytes
2738 successfully copied. If the destination file already existed, the
2739 destination file's size may increase as a result of this operation
2740 (e.g., if ca_dst_offset plus ca_count is greater than the
2741 destination's initial size).

2743 If the ca_source_server list is specified, then this is an inter-
2744 server copy operation and the source file is on a remote server. The
2745 client is expected to have previously issued a successful COPY_NOTIFY
2746 request to the remote source server. The ca_source_server list
2747 SHOULD be the same as the COPY_NOTIFY response's cnr_source_server
2748 list. If the client includes the entries from the COPY_NOTIFY
2749 response's cnr_source_server list in the ca_source_server list, the
2750 source server can indicate a specific copy protocol for the
2751 destination server to use by returning a URL, which specifies both a
2752 protocol service and server name. Server-to-server copy protocol
2753 considerations are described in Section 2.2.3 and Section 2.4.1.

2755 The ca_flags argument allows the copy operation to be customized in
2756 the following ways using the guarded flag (COPY4_GUARDED) and the
2757 metadata flag (COPY4_METADATA).

2759 If the guarded flag is set and the destination exists on the server,
2760 this operation will fail with NFS4ERR_EXIST.

2762 If the guarded flag is not set and the destination exists on the
2763 server, the behavior is implementation dependent.

2765 If the metadata flag is set and the client is requesting a whole file
2766 copy (i.e., ca_count is 0 (zero)), a subset of the destination file's
2767 attributes MUST be the same as the source file's corresponding
2768 attributes and a subset of the destination file's attributes SHOULD
2769 be the same as the source file's corresponding attributes.
The 2770 attributes in the MUST and SHOULD copy subsets will be defined for 2771 each NFS version. 2773 For NFSv4.1, Table 2 and Table 3 list the REQUIRED and RECOMMENDED 2774 attributes respectively. A "MUST" in the "Copy to destination file?" 2775 column indicates that the attribute is part of the MUST copy set. A 2776 "SHOULD" in the "Copy to destination file?" column indicates that the 2777 attribute is part of the SHOULD copy set. 2779 +--------------------+----+---------------------------+ 2780 | Name | Id | Copy to destination file? | 2781 +--------------------+----+---------------------------+ 2782 | supported_attrs | 0 | no | 2783 | type | 1 | MUST | 2784 | fh_expire_type | 2 | no | 2785 | change | 3 | SHOULD | 2786 | size | 4 | MUST | 2787 | link_support | 5 | no | 2788 | symlink_support | 6 | no | 2789 | named_attr | 7 | no | 2790 | fsid | 8 | no | 2791 | unique_handles | 9 | no | 2792 | lease_time | 10 | no | 2793 | rdattr_error | 11 | no | 2794 | filehandle | 19 | no | 2795 | suppattr_exclcreat | 75 | no | 2796 +--------------------+----+---------------------------+ 2798 Table 2 2800 +--------------------+----+---------------------------+ 2801 | Name | Id | Copy to destination file? | 2802 +--------------------+----+---------------------------+ 2803 | acl | 12 | MUST | 2804 | aclsupport | 13 | no | 2805 | archive | 14 | no | 2806 | cansettime | 15 | no | 2807 | case_insensitive | 16 | no | 2808 | case_preserving | 17 | no | 2809 | change_policy | 60 | no | 2810 | chown_restricted | 18 | MUST | 2811 | dacl | 58 | MUST | 2812 | dir_notif_delay | 56 | no | 2813 | dirent_notif_delay | 57 | no | 2814 | fileid | 20 | no | 2815 | files_avail | 21 | no | 2816 | files_free | 22 | no | 2817 | files_total | 23 | no | 2818 | fs_charset_cap | 76 | no | 2819 | fs_layout_type | 62 | no | 2820 | fs_locations | 24 | no | 2821 | fs_locations_info | 67 | no | 2822 | fs_status | 61 | no | 2823 | hidden | 25 | MUST | 2824 | homogeneous | 26 | no | 2825 | layout_alignment | 66 | no | 2826 | layout_blksize | 65 | no | 2827 | layout_hint | 63 | no | 2828 | layout_type | 64 | no | 2829 | maxfilesize | 27 | no | 2830 | maxlink | 28 | no | 2831 | maxname | 29 | no | 2832 | maxread | 30 | no | 2833 | maxwrite | 31 | no | 2834 | mdsthreshold | 68 | no | 2835 | mimetype | 32 | MUST | 2836 | mode | 33 | MUST | 2837 | mode_set_masked | 74 | no | 2838 | mounted_on_fileid | 55 | no | 2839 | no_trunc | 34 | no | 2840 | numlinks | 35 | no | 2841 | owner | 36 | MUST | 2842 | owner_group | 37 | MUST | 2843 | quota_avail_hard | 38 | no | 2844 | quota_avail_soft | 39 | no | 2845 | quota_used | 40 | no | 2846 | rawdev | 41 | no | 2847 | retentevt_get | 71 | MUST | 2848 | retentevt_set | 72 | no | 2849 | retention_get | 69 | MUST | 2850 | retention_hold | 73 | MUST | 2851 | retention_set | 70 | no | 2852 | sacl | 59 | MUST | 2853 | space_avail | 42 | no | 2854 | space_free | 43 | no | 2855 | space_freed | 78 | no | 2856 | space_reserved | 77 | MUST | 2857 | space_total | 44 | no | 2858 | space_used | 45 | no | 2859 | system | 46 | MUST | 2860 | time_access | 47 | MUST | 2861 | time_access_set | 48 | no | 2862 | time_backup | 49 | no | 2863 | time_create | 50 | MUST | 2864 | time_delta | 51 | no | 2865 | time_metadata | 52 | SHOULD | 2866 | time_modify | 53 | MUST | 2867 | time_modify_set | 54 | no | 2868 +--------------------+----+---------------------------+ 2870 Table 3 2872 [NOTE: The source file's attribute values will take precedence over 2873 any attribute values inherited by the destination file.] 
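As a rough, non-normative sketch of how a server might drive the
COPY4_METADATA attribute copy from Tables 2 and 3, the following C
fragment is table-driven over the MUST and SHOULD copy sets. The
copy_attr() helper and the nfs_file type are hypothetical; the
attribute numbers come directly from the tables above.

   #include <stddef.h>

   struct nfs_file;                       /* hypothetical file object */
   extern int copy_attr(int attr_id, const struct nfs_file *src,
                        struct nfs_file *dst); /* hypothetical helper */

   /* Attribute ids from Tables 2 and 3 that are in the MUST copy set. */
   static const int must_copy_attrs[] = {
       1,  /* type */             4,  /* size */
       12, /* acl */              18, /* chown_restricted */
       25, /* hidden */           32, /* mimetype */
       33, /* mode */             36, /* owner */
       37, /* owner_group */      46, /* system */
       47, /* time_access */      50, /* time_create */
       53, /* time_modify */      58, /* dacl */
       59, /* sacl */             69, /* retention_get */
       71, /* retentevt_get */    73, /* retention_hold */
       77, /* space_reserved */
   };

   /* Attribute ids in the SHOULD copy set. */
   static const int should_copy_attrs[] = {
       3,  /* change */           52, /* time_metadata */
   };

   /* Whole-file copy with COPY4_METADATA set: the MUST set must be
    * copied successfully, while the SHOULD set is copied on a
    * best-effort basis. */
   int copy_metadata(const struct nfs_file *src, struct nfs_file *dst)
   {
       size_t i;

       for (i = 0; i < sizeof(must_copy_attrs) / sizeof(must_copy_attrs[0]); i++)
           if (copy_attr(must_copy_attrs[i], src, dst) != 0)
               return -1;   /* surfaces as, e.g., NFS4ERR_ATTRNOTSUPP */

       for (i = 0; i < sizeof(should_copy_attrs) / sizeof(should_copy_attrs[0]); i++)
           (void)copy_attr(should_copy_attrs[i], src, dst);

       return 0;
   }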
2875 In the case of an inter-server copy or an intra-server copy between
2876 file systems, the attributes supported for the source file and
2877 destination file could be different. By definition, the REQUIRED
2878 attributes will be supported in all cases. If the metadata flag is
2879 set and the source file has a RECOMMENDED attribute that is not
2880 supported for the destination file, the copy MUST fail with
2881 NFS4ERR_ATTRNOTSUPP.

2883 Any attribute supported by the destination server that is not set on
2884 the source file SHOULD be left unset.

2886 Metadata attributes not exposed via the NFS protocol SHOULD be copied
2887 to the destination file where appropriate.

2889 The destination file's named attributes are not duplicated from the
2890 source file. After the copy process completes, the client MAY
2891 attempt to duplicate named attributes using standard NFSv4
2892 operations. However, the destination file's named attribute
2893 capabilities MAY be different from the source file's named attribute
2894 capabilities.

2896 If the metadata flag is not set and the client is requesting a whole
2897 file copy (i.e., ca_count is 0 (zero)), the destination file's
2898 metadata is implementation dependent.

2900 If the client is requesting a partial file copy (i.e., ca_count is
2901 not 0 (zero)), the client SHOULD NOT set the metadata flag and the
2902 server MUST ignore the metadata flag.

2904 If the operation does not result in an immediate failure, the server
2905 will return NFS4_OK, and the CURRENT_FH will remain the destination's
2906 filehandle.

2908 If an immediate failure does occur, cr_bytes_copied will be set to
2909 the number of bytes copied to the destination file before the error
2910 occurred. The cr_bytes_copied value indicates the number of bytes
2911 copied but not which specific bytes have been copied.

2913 A return of NFS4_OK indicates that either the operation is complete
2914 or the operation was initiated and a callback will be used to deliver
2915 the final status of the operation.

2917 If the cr_callback_id is returned, this indicates that the operation
2918 was initiated and a CB_COPY callback will deliver the final results
2919 of the operation. The cr_callback_id stateid is termed a copy
2920 stateid in this context. The server is given the option of returning
2921 the results in a callback because the data may require a relatively
2922 long period of time to copy.

2924 If no cr_callback_id is returned, the operation completed
2925 synchronously and no callback will be issued by the server. The
2926 completion status of the operation is indicated by cr_status.

2928 If the copy completes successfully, either synchronously or
2929 asynchronously, the data copied from the source file to the
2930 destination file MUST appear identical to the NFS client. However,
2931 the NFS server's on-disk representation of the data in the source
2932 file and destination file MAY differ. For example, the NFS server
2933 might encrypt, compress, deduplicate, or otherwise represent the on-
2934 disk data in the source and destination file differently.

2936 In the event of a failure, the state of the destination file is
2937 implementation dependent. The COPY operation may fail for the
2938 following reasons (this is a partial list):

2940 NFS4ERR_MOVED: The file system containing the source file, or
2941 the destination file or directory, is not present. The client can
2942 determine the correct location and reissue the operation with the
2943 correct location.
2945 NFS4ERR_NOTSUPP: The copy offload operation is not supported by the
2946 NFS server receiving this request.

2948 NFS4ERR_PARTNER_NOTSUPP: The remote server does not support the
2949 server-to-server copy offload protocol.

2951 NFS4ERR_OFFLOAD_DENIED: The copy offload operation is supported by
2952 both the source and the destination, but the destination is not
2953 allowing it for this file. If the client sees this error, it
2954 should fall back to the normal copy semantics.

2956 NFS4ERR_PARTNER_NO_AUTH: The remote server does not authorize a
2957 server-to-server copy offload operation. This may be due to the
2958 client's failure to send the COPY_NOTIFY operation to the remote
2959 server, the remote server receiving a server-to-server copy
2960 offload request after the copy lease time expired, or some
2961 other permission problem.

2963 NFS4ERR_FBIG: The copy operation would have caused the file to grow
2964 beyond the server's limit.

2966 NFS4ERR_NOTDIR: The CURRENT_FH is a file and ca_destination has non-
2967 zero length.

2969 NFS4ERR_WRONG_TYPE: The SAVED_FH is not a regular file.

2971 NFS4ERR_ISDIR: The CURRENT_FH is a directory and ca_destination has
2972 zero length.

2974 NFS4ERR_INVAL: The source offset or offset plus count are greater
2975 than or equal to the size of the source file.

2977 NFS4ERR_DELAY: The server does not have the resources to perform the
2978 copy operation at the current time. The client should retry the
2979 operation sometime in the future.

2981 NFS4ERR_METADATA_NOTSUPP: The destination file cannot support the
2982 same metadata as the source file.

2984 NFS4ERR_WRONGSEC: The security mechanism being used by the client
2985 does not match the server's security policy.

2987 11.2. Operation 60: COPY_ABORT - Cancel a server-side copy

2989 11.2.1. ARGUMENT

2991 struct COPY_ABORT4args {
2992 /* CURRENT_FH: destination file */
2993 stateid4 caa_stateid;
2994 };

2996 11.2.2. RESULT

2998 struct COPY_ABORT4res {
2999 nfsstat4 car_status;
3000 };

3002 11.2.3. DESCRIPTION

3004 COPY_ABORT is used for both intra- and inter-server asynchronous
3005 copies. The COPY_ABORT operation allows the client to cancel a
3006 server-side copy operation that it initiated. This operation is sent
3007 in a COMPOUND request from the client to the destination server.
3008 This operation may be used to cancel a copy when the application that
3009 requested the copy exits before the operation is completed or for
3010 some other reason.

3012 The request contains the filehandle and copy stateid cookies that act
3013 as the context for the previously initiated copy operation.

3015 The result's car_status field indicates whether the cancel was
3016 successful or not. A value of NFS4_OK indicates that the copy
3017 operation was canceled and no callback will be issued by the server.
3018 A copy operation that is successfully canceled may result in none,
3019 some, or all of the data having been copied.

3021 If the server supports asynchronous copies, the server is REQUIRED to
3022 support the COPY_ABORT operation.

3024 The COPY_ABORT operation may fail for the following reasons (this is
3025 a partial list):

3027 NFS4ERR_NOTSUPP: The abort operation is not supported by the NFS
3028 server receiving this request.

3030 NFS4ERR_RETRY: The abort failed, but a retry at some time in the
3031 future MAY succeed.

3033 NFS4ERR_COMPLETE_ALREADY: The abort failed, and a callback will
3034 deliver the results of the copy operation.
3036 NFS4ERR_SERVERFAULT: An error occurred on the server that does not 3037 map to a specific error code. 3039 11.3. Operation 61: COPY_NOTIFY - Notify a source server of a future 3040 copy 3042 11.3.1. ARGUMENT 3044 struct COPY_NOTIFY4args { 3045 /* CURRENT_FH: source file */ 3046 netloc4 cna_destination_server; 3047 }; 3049 11.3.2. RESULT 3051 struct COPY_NOTIFY4resok { 3052 nfstime4 cnr_lease_time; 3053 netloc4 cnr_source_server<>; 3054 }; 3056 union COPY_NOTIFY4res switch (nfsstat4 cnr_status) { 3057 case NFS4_OK: 3058 COPY_NOTIFY4resok resok4; 3059 default: 3060 void; 3061 }; 3063 11.3.3. DESCRIPTION 3065 This operation is used for an inter-server copy. A client sends this 3066 operation in a COMPOUND request to the source server to authorize a 3067 destination server identified by cna_destination_server to read the 3068 file specified by CURRENT_FH on behalf of the given user. 3070 The cna_destination_server MUST be specified using the netloc4 3071 network location format. The server is not required to resolve the 3072 cna_destination_server address before completing this operation. 3074 If this operation succeeds, the source server will allow the 3075 cna_destination_server to copy the specified file on behalf of the 3076 given user. If COPY_NOTIFY succeeds, the destination server is 3077 granted permission to read the file as long as both of the following 3078 conditions are met: 3080 o The destination server begins reading the source file before the 3081 cnr_lease_time expires. If the cnr_lease_time expires while the 3082 destination server is still reading the source file, the 3083 destination server is allowed to finish reading the file. 3085 o The client has not issued a COPY_REVOKE for the same combination 3086 of user, filehandle, and destination server. 3088 The cnr_lease_time is chosen by the source server. A cnr_lease_time 3089 of 0 (zero) indicates an infinite lease. To renew the copy lease 3090 time the client should resend the same copy notification request to 3091 the source server. 3093 To avoid the need for synchronized clocks, copy lease times are 3094 granted by the server as a time delta. However, there is a 3095 requirement that the client and server clocks do not drift 3096 excessively over the duration of the lease. There is also the issue 3097 of propagation delay across the network which could easily be several 3098 hundred milliseconds as well as the possibility that requests will be 3099 lost and need to be retransmitted. 3101 To take propagation delay into account, the client should subtract it 3102 from copy lease times (e.g., if the client estimates the one-way 3103 propagation delay as 200 milliseconds, then it can assume that the 3104 lease is already 200 milliseconds old when it gets it). In addition, 3105 it will take another 200 milliseconds to get a response back to the 3106 server. So the client must send a lease renewal or send the copy 3107 offload request to the cna_destination_server at least 400 3108 milliseconds before the copy lease would expire. If the propagation 3109 delay varies over the life of the lease (e.g., the client is on a 3110 mobile host), the client will need to continuously subtract the 3111 increase in propagation delay from the copy lease times. 3113 The server's copy lease period configuration should take into account 3114 the network distance of the clients that will be accessing the 3115 server's resources. 
It is expected that the lease period will take
3116 into account the network propagation delays and other network delay
3117 factors for the client population. Since the protocol does not allow
3118 for an automatic method to determine an appropriate copy lease
3119 period, the server's administrator may have to tune the copy lease
3120 period.

3122 A successful response will also contain a list of names, addresses,
3123 and URLs called cnr_source_server, on which the source is willing to
3124 accept connections from the destination. These might not be
3125 reachable from the client and might be located on networks to which
3126 the client has no connection.

3128 If the client wishes to perform an inter-server copy, the client MUST
3129 send a COPY_NOTIFY to the source server. Therefore, the source
3130 server MUST support COPY_NOTIFY.

3132 For a copy only involving one server (the source and destination are
3133 on the same server), this operation is unnecessary.

3135 The COPY_NOTIFY operation may fail for the following reasons (this is
3136 a partial list):

3138 NFS4ERR_MOVED: The file system which contains the source file is not
3139 present on the source server. The client can determine the
3140 correct location and reissue the operation with the correct
3141 location.

3143 NFS4ERR_NOTSUPP: The copy offload operation is not supported by the
3144 NFS server receiving this request.

3146 NFS4ERR_WRONGSEC: The security mechanism being used by the client
3147 does not match the server's security policy.

3149 11.4. Operation 62: COPY_REVOKE - Revoke a destination server's copy
3150 privileges

3152 11.4.1. ARGUMENT

3154 struct COPY_REVOKE4args {
3155 /* CURRENT_FH: source file */
3156 netloc4 cra_destination_server;
3157 };

3159 11.4.2. RESULT

3161 struct COPY_REVOKE4res {
3162 nfsstat4 crr_status;
3163 };

3165 11.4.3. DESCRIPTION

3167 This operation is used for an inter-server copy. A client sends this
3168 operation in a COMPOUND request to the source server to revoke the
3169 authorization of a destination server identified by
3170 cra_destination_server from reading the file specified by CURRENT_FH
3171 on behalf of a given user. If the cra_destination_server has already
3172 begun copying the file, a successful return from this operation
3173 indicates that further access will be prevented.

3175 The cra_destination_server MUST be specified using the netloc4
3176 network location format. The server is not required to resolve the
3177 cra_destination_server address before completing this operation.

3179 The COPY_REVOKE operation is useful in situations in which the source
3180 server granted a very long or infinite lease on the destination
3181 server's ability to read the source file and all copy operations on
3182 the source file have been completed.

3184 For a copy only involving one server (the source and destination are
3185 on the same server), this operation is unnecessary.

3187 If the server supports COPY_NOTIFY, the server is REQUIRED to support
3188 the COPY_REVOKE operation.

3190 The COPY_REVOKE operation may fail for the following reasons (this is
3191 a partial list):

3193 NFS4ERR_MOVED: The file system which contains the source file is not
3194 present on the source server. The client can determine the
3195 correct location and reissue the operation with the correct
3196 location.

3198 NFS4ERR_NOTSUPP: The copy offload operation is not supported by the
3199 NFS server receiving this request.

3201 11.5.
Operation 63: COPY_STATUS - Poll for status of a server-side copy 3203 11.5.1. ARGUMENT 3205 struct COPY_STATUS4args { 3206 /* CURRENT_FH: destination file */ 3207 stateid4 csa_stateid; 3208 }; 3210 11.5.2. RESULT 3212 struct COPY_STATUS4resok { 3213 length4 csr_bytes_copied; 3214 nfsstat4 csr_complete<1>; 3215 }; 3217 union COPY_STATUS4res switch (nfsstat4 csr_status) { 3218 case NFS4_OK: 3219 COPY_STATUS4resok resok4; 3220 default: 3221 void; 3222 }; 3224 11.5.3. DESCRIPTION 3226 COPY_STATUS is used for both intra- and inter-server asynchronous 3227 copies. The COPY_STATUS operation allows the client to poll the 3228 server to determine the status of an asynchronous copy operation. 3229 This operation is sent by the client to the destination server. 3231 If this operation is successful, the number of bytes copied are 3232 returned to the client in the csr_bytes_copied field. The 3233 csr_bytes_copied value indicates the number of bytes copied but not 3234 which specific bytes have been copied. 3236 If the optional csr_complete field is present, the copy has 3237 completed. In this case the status value indicates the result of the 3238 asynchronous copy operation. In all cases, the server will also 3239 deliver the final results of the asynchronous copy in a CB_COPY 3240 operation. 3242 The failure of this operation does not indicate the result of the 3243 asynchronous copy in any way. 3245 If the server supports asynchronous copies, the server is REQUIRED to 3246 support the COPY_STATUS operation. 3248 The COPY_STATUS operation may fail for the following reasons (this is 3249 a partial list): 3251 NFS4ERR_NOTSUPP: The copy status operation is not supported by the 3252 NFS server receiving this request. 3254 NFS4ERR_BAD_STATEID: The stateid is not valid (see Section 2.3.2 3255 below). 3257 NFS4ERR_EXPIRED: The stateid has expired (see Copy Offload Stateid 3258 section below). 3260 11.6. Modification to Operation 42: EXCHANGE_ID - Instantiate Client ID 3262 11.6.1. ARGUMENT 3264 /* new */ 3265 const EXCHGID4_FLAG_SUPP_FENCE_OPS = 0x00000004; 3267 11.6.2. RESULT 3269 Unchanged 3271 11.6.3. MOTIVATION 3273 Enterprise applications require guarantees that an operation has 3274 either aborted or completed. NFSv4.1 provides this guarantee as long 3275 as the session is alive: simply send a SEQUENCE operation on the same 3276 slot with a new sequence number, and the successful return of 3277 SEQUENCE indicates the previous operation has completed. However, if 3278 the session is lost, there is no way to know when any in progress 3279 operations have aborted or completed. In hindsight, the NFSv4.1 3280 specification should have mandated that DESTROY_SESSION abort/ 3281 complete all outstanding operations. 3283 11.6.4. DESCRIPTION 3285 A client SHOULD request the EXCHGID4_FLAG_SUPP_FENCE_OPS capability 3286 when it sends an EXCHANGE_ID operation. The server SHOULD set this 3287 capability in the EXCHANGE_ID reply whether the client requests it or 3288 not. If the client ID is created with this capability then the 3289 following will occur: 3291 o The server will not reply to DESTROY_SESSION until all operations 3292 in progress are completed or aborted. 3294 o The server will not reply to subsequent EXCHANGE_ID invoked on the 3295 same Client Owner with a new verifier until all operations in 3296 progress on the Client ID's session are completed or aborted. 
3298 o When DESTROY_CLIENTID is invoked, all sessions (both idle
3299 and non-idle), opens, locks, delegations, layouts, and/or wants
3300 (Section 18.49) associated with the client ID are removed.
3301 Pending operations will be completed or aborted before the
3302 sessions, opens, locks, delegations, layouts, and/or wants are
3303 deleted.

3305 o The NFS server SHOULD support client ID trunking, and if it does
3306 and the EXCHGID4_FLAG_SUPP_FENCE_OPS capability is enabled, then a
3307 session ID created on one node of the storage cluster MUST be
3308 destroyable via DESTROY_SESSION. In addition, DESTROY_CLIENTID
3309 and an EXCHANGE_ID with a new verifier affect all sessions
3310 regardless of the node on which the sessions were created.

3312 11.7. Operation 64: INITIALIZE

3314 This operation can be used to initialize the structure imposed by an
3315 application onto a file and to punch a hole into a file.

3317 The server has no concept of the structure imposed by the
3318 application. Order is imposed only when the application writes to a
3319 section of the file. In order to detect corruption even
3320 before the application utilizes the file, the application will want
3321 to initialize a range of ADBs. It uses the INITIALIZE operation to
3322 do so.

3324 11.7.1. ARGUMENT

3326 /*
3327 * We use data_content4 in case we wish to
3328 * extend new types later. Note that we
3329 * are explicitly disallowing data.
3330 */
3331 union initialize_arg4 switch (data_content4 content) {
3332 case NFS4_CONTENT_APP_BLOCK:
3333 app_data_block4 ia_adb;
3334 case NFS4_CONTENT_HOLE:
3335 data_info4 ia_hole;
3336 default:
3337 void;
3338 };

3340 struct INITIALIZE4args {
3341 /* CURRENT_FH: file */
3342 stateid4 ia_stateid;
3343 stable_how4 ia_stable;
3344 initialize_arg4 ia_data<>;
3345 };

3347 11.7.2. RESULT

3349 struct INITIALIZE4resok {
3350 count4 ir_count;
3351 stable_how4 ir_committed;
3352 verifier4 ir_writeverf;
3353 data_content4 ir_sparse;
3354 };

3356 union INITIALIZE4res switch (nfsstat4 status) {
3357 case NFS4_OK:
3358 INITIALIZE4resok resok4;
3359 default:
3360 void;
3361 };

3363 11.7.3. DESCRIPTION

3365 When the client invokes the INITIALIZE operation, it has two desired
3366 results:

3368 1. The structure described by the app_data_block4 be imposed on the
3369 file.

3371 2. The contents described by the app_data_block4 be sparse.

3373 If the server supports the INITIALIZE operation, it still might not
3374 support sparse files. So if it receives the INITIALIZE operation,
3375 then it MUST populate the contents of the file with the initialized
3376 ADBs. In other words, if the server supports INITIALIZE, then it
3377 supports the concept of ADBs. [[Comment.7: Do we want to support an
3378 asynchronous INITIALIZE? Do we have to? --TH]]

3380 If the data was already initialized, there are two interesting
3381 scenarios:

3383 1. The data blocks are allocated.

3385 2. Initializing in the middle of an existing ADB.

3387 If the data blocks were already allocated, then the INITIALIZE is a
3388 hole punch operation. If the server supports sparse files, then the
3389 data blocks are to be deallocated. If not, then the data blocks are
3390 to be rewritten in the indicated ADB format. [[Comment.8: Need to
3391 document interaction between space reservation and hole punching?
3392 --TH]]

3394 Since the server has no knowledge of ADBs, it should not report
3395 misaligned creation of ADBs.
Even while it can detect them, it
3396 cannot disallow them, as the application might be in the process of
3397 changing the size of the ADBs. Thus the server must be prepared to
3398 handle an INITIALIZE into an existing ADB.

3400 This document does not mandate the manner in which the server stores
3401 ADBs sparsely for a file. It does assume that if ADBs are stored
3402 sparsely, then the server can detect when an INITIALIZE arrives that
3403 will force a new ADB to start inside an existing ADB. For example,
3404 assume that ADBi has an adb_block_size of 4k and that an INITIALIZE
3405 starts 1k inside ADBi. The server should [[Comment.9: Need to flesh
3406 this out. --TH]]

3408 11.7.3.1. Hole punching

3410 Whenever a client wishes to deallocate the blocks backing a
3411 particular region in the file, it calls the INITIALIZE operation with
3412 the current filehandle set to the filehandle of the file in question,
3413 and the start offset and length in bytes of the region set in
3414 ia_hole.hi_offset and ia_hole.hi_length respectively. All further
reads to this region MUST return
3415 zeros until overwritten. The filehandle specified must be that of a
3416 regular file.

3418 Situations may arise where ia_hole.hi_offset and/or ia_hole.hi_offset
3419 + ia_hole.hi_length will not be aligned to a boundary that the server
3420 does allocations/deallocations in. For most filesystems, this is
3421 the block size of the file system. In such a case, the server can
3422 deallocate as many bytes as it can in the region. The blocks that
3423 cannot be deallocated MUST be zeroed. Except for the block
3424 deallocation and maximum hole punching capability, an INITIALIZE
3425 operation is to be treated similarly to a write of zeroes.

3427 The server is not required to complete deallocating the blocks
3428 specified in the operation before returning. It is acceptable to
3429 have the deallocation be deferred. In fact, INITIALIZE is merely a
3430 hint; it is valid for a server to return success without ever doing
3431 anything towards deallocating the blocks backing the region
3432 specified. However, any future reads to the region MUST return
3433 zeroes.

3435 If used to hole punch, INITIALIZE will result in the space_used
3436 attribute being decreased by the number of bytes that were
3437 deallocated. The space_freed attribute may or may not decrease,
3438 depending on the support and whether the blocks backing the specified
3439 range were shared or not. The size attribute will remain unchanged.

3441 The INITIALIZE operation MUST NOT change the space reservation
3442 guarantee of the file. While the server can deallocate the blocks
3443 specified by ia_hole.hi_offset and ia_hole.hi_length, future writes
to this region
3444 MUST NOT fail with NFS4ERR_NOSPC.

3446 The INITIALIZE operation may fail for the following reasons (this is
3447 a partial list):

3449 NFS4ERR_NOTSUPP: The hole punch operation is not supported by the
3450 NFS server receiving this request.

3452 NFS4ERR_ISDIR: The current filehandle is of type NF4DIR.

3454 NFS4ERR_SYMLINK: The current filehandle is of type NF4LNK.

3456 NFS4ERR_WRONG_TYPE: The current filehandle does not designate an
3457 ordinary file.

3459 11.8. Operation 67: IO_ADVISE - Application I/O access pattern hints

3461 This section introduces a new operation, named IO_ADVISE, which
3462 allows NFS clients to communicate application I/O access pattern
3463 hints to the NFS server.
This new operation will allow hints to be
3464 sent to the server when applications use posix_fadvise or direct I/O,
3465 or at any other point the client finds useful.

3467 11.8.1. ARGUMENT

3469 enum IO_ADVISE_type4 {
3470 IO_ADVISE4_NORMAL = 0,
3471 IO_ADVISE4_SEQUENTIAL = 1,
3472 IO_ADVISE4_SEQUENTIAL_BACKWARDS = 2,
3473 IO_ADVISE4_RANDOM = 3,
3474 IO_ADVISE4_WILLNEED = 4,
3475 IO_ADVISE4_WILLNEED_OPPORTUNISTIC = 5,
3476 IO_ADVISE4_DONTNEED = 6,
3477 IO_ADVISE4_NOREUSE = 7,
3478 IO_ADVISE4_READ = 8,
3479 IO_ADVISE4_WRITE = 9
3480 };

3482 struct IO_ADVISE4args {
3483 /* CURRENT_FH: file */
3484 stateid4 iar_stateid;
3485 offset4 iar_offset;
3486 length4 iar_count;
3487 bitmap4 iar_hints;
3488 };

3490 11.8.2. RESULT

3492 struct IO_ADVISE4resok {
3493 bitmap4 ior_hints;
3494 };

3496 union IO_ADVISE4res switch (nfsstat4 _status) {
3497 case NFS4_OK:
3498 IO_ADVISE4resok resok4;
3499 default:
3500 void;
3501 };

3503 11.8.3. DESCRIPTION

3505 The IO_ADVISE operation sends an I/O access pattern hint to the
3506 server for the owner of the stateid for a given byte range specified by
3507 iar_offset and iar_count. The byte range specified by iar_offset and
3508 iar_count need not currently exist in the file, but the iar_hints
3509 will apply to the byte range when it does exist. If iar_count is 0,
3510 all data following iar_offset is specified. The server MAY ignore
3511 the advice.

3513 The following are the possible hints:

3515 IO_ADVISE4_NORMAL Specifies that the application has no advice to
3516 give on its behavior with respect to the specified data. It is
3517 the default characteristic if no advice is given.

3519 IO_ADVISE4_SEQUENTIAL Specifies that the stateid holder expects to
3520 access the specified data sequentially from lower offsets to
3521 higher offsets.

3523 IO_ADVISE4_SEQUENTIAL_BACKWARDS Specifies that the stateid holder
3524 expects to access the specified data sequentially from higher
3525 offsets to lower offsets.

3527 IO_ADVISE4_RANDOM Specifies that the stateid holder expects to access
3528 the specified data in a random order.

3530 IO_ADVISE4_WILLNEED Specifies that the stateid holder expects to
3531 access the specified data in the near future.

3533 IO_ADVISE4_WILLNEED_OPPORTUNISTIC Specifies that the stateid holder
3534 expects to possibly access the data in the near future. This is a
3535 speculative hint, and therefore the server should prefetch data or
3536 indirect blocks only if it can be done at a marginal cost.

3538 IO_ADVISE4_DONTNEED Specifies that the stateid holder expects that it
3539 will not access the specified data in the near future.

3541 IO_ADVISE4_NOREUSE Specifies that the stateid holder expects to access
3542 the specified data once and then not reuse it thereafter.

3544 IO_ADVISE4_READ Specifies that the stateid holder expects to read the
3545 specified data in the near future.

3547 IO_ADVISE4_WRITE Specifies that the stateid holder expects to write
3548 the specified data in the near future.

3550 The server will return success if the operation is properly formed,
3551 otherwise the server will return an error. The server MUST NOT
3552 return an error if it does not recognize or does not support the
3553 requested advice. This is true even if the client sends
3554 contradictory hints to the server, e.g., IO_ADVISE4_SEQUENTIAL and
3555 IO_ADVISE4_RANDOM in a single IO_ADVISE operation. In this case, the
3556 server MUST return success and an ior_hints value that indicates the
3557 hint it intends to optimize.
For contradictory hints, this may mean
3558 simply returning IO_ADVISE4_NORMAL, for example.

3560 The ior_hints value returned by the server is primarily for debugging
3561 purposes, since the server is under no obligation to carry out the
3562 hints that it describes in the ior_hints result. In addition, while
3563 the server may have intended to implement the hints returned in
3564 ior_hints, as time progresses, the server may need to change its
3565 handling of a given file due to several reasons including, but not
3566 limited to, memory pressure, additional IO_ADVISE hints sent by other
3567 clients, and heuristically detected file access patterns.

3569 The server MAY return different advice than what the client
3570 requested. If it does, then this might be due to one of several
3571 conditions, including, but not limited to: another client advising of
3572 a different I/O access pattern; a different I/O access pattern from
3573 another client that the server has heuristically detected; or the
3574 server not being able to support the requested I/O access pattern,
3575 perhaps due to a temporary resource limitation.

3577 Each issuance of the IO_ADVISE operation overrides all previous
3578 issuances of IO_ADVISE for a given byte range. This effectively
3579 follows a strategy of last hint wins for a given stateid and byte
3580 range.

3582 Clients should assume that hints included in an IO_ADVISE operation
3583 will be forgotten once the file is closed.

3585 11.8.4. IMPLEMENTATION

3587 The NFS client may choose to issue an IO_ADVISE operation to the
3588 server in several different instances.

3590 The most obvious is in direct response to an application's execution
3591 of posix_fadvise. In this case, IO_ADVISE4_WRITE and IO_ADVISE4_READ
3592 may be set based upon the type of file access specified when the file
3593 was opened.

3595 Another useful point would be when an application indicates it is
3596 using direct I/O. Direct I/O may be specified at file open, in which
3597 case an IO_ADVISE may be included in the same compound as the OPEN
3598 operation with the IO_ADVISE4_NOREUSE flag set. Direct I/O may also
3599 be specified separately, in which case an IO_ADVISE operation can be
3600 sent to the server separately. As above, IO_ADVISE4_WRITE and
3601 IO_ADVISE4_READ may be set based upon the type of file access
3602 specified when the file was opened.

3604 11.8.5. pNFS File Layout Data Type Considerations

3606 The IO_ADVISE considerations for pNFS are very similar to the COMMIT
3607 considerations for pNFS. That is, as with COMMIT, some NFS server
3608 implementations prefer IO_ADVISE be done on the DS, and some prefer
3609 it be done on the MDS.

3611 So for the file's layout type, it is proposed that NFSv4.2 include an
3612 additional flag NFL42_UFLG_IO_ADVISE_THRU_MDS, which is valid only on
3613 NFSv4.2 or higher. Any file's layout obtained with NFSv4.1 MUST NOT
3614 have NFL42_UFLG_IO_ADVISE_THRU_MDS set. Any file's layout obtained
3615 with NFSv4.2 MAY have NFL42_UFLG_IO_ADVISE_THRU_MDS set. If the
3616 client does not implement IO_ADVISE, then it MUST ignore
3617 NFL42_UFLG_IO_ADVISE_THRU_MDS.

3619 If NFL42_UFLG_IO_ADVISE_THRU_MDS is set and the client
3620 implements IO_ADVISE, then if it wants the DS to honor IO_ADVISE, the
3621 client MUST send the operation to the MDS, and the server will
3622 communicate the advice back to each DS. If the client sends IO_ADVISE
3623 to the DS, then the server MAY return NFS4ERR_NOTSUPP.
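The routing rule above reduces to a simple dispatch on the layout
flag. In the hedged C sketch below, the session and argument types
and send_io_advise() are hypothetical; only the decision of whether
the advice goes to the MDS or to each DS reflects the text.

   #include <stddef.h>

   struct io_advise_args;  /* iar_stateid, iar_offset, iar_count,
                              iar_hints */
   struct nfs_server;      /* an MDS or DS session, hypothetical */

   extern int send_io_advise(struct nfs_server *s,
                             const struct io_advise_args *args);

   /*
    * thru_mds: whether the file's layout has
    * NFL42_UFLG_IO_ADVISE_THRU_MDS set.  If set, the advice goes to
    * the MDS, which relays it to each DS (a DS receiving it directly
    * MAY return NFS4ERR_NOTSUPP).  If not set, the client sends the
    * same arguments to each DS, since advice sent to the MDS is
    * likely futile.
    */
   int route_io_advise(int thru_mds, struct nfs_server *mds,
                       struct nfs_server **ds, size_t nds,
                       const struct io_advise_args *args)
   {
       size_t i;
       int rc = 0;

       if (thru_mds)
           return send_io_advise(mds, args);

       for (i = 0; i < nds && rc == 0; i++)
           rc = send_io_advise(ds[i], args); /* same args on each DS */
       return rc;
   }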
3625 If NFL42_UFLG_IO_ADVISE_THRU_MDS is not set, then this indicates to
3626 the client that, if it wants to inform the server via IO_ADVISE of the
3627 client's intended use of the file, the client SHOULD send an
3628 IO_ADVISE to each DS. While the client MAY always send IO_ADVISE to
3629 the MDS, if the server has not set NFL42_UFLG_IO_ADVISE_THRU_MDS, the
3630 client should expect that such an IO_ADVISE is futile. Note that a
3631 client SHOULD use the same set of arguments on each IO_ADVISE sent to
3632 a DS for the same open file reference.

3634 The server is not required to support different advice for different
3635 DSs with the same open file reference.

3637 11.8.5.1. Dense and Sparse Packing Considerations

3639 The IO_ADVISE operation MUST use the iar_offset and byte range as
3640 dictated by the presence or absence of NFL4_UFLG_DENSE.

3642 E.g., if NFL4_UFLG_DENSE is present, and a READ or WRITE to the DS
3643 for iar_offset 0 really means iar_offset 10000 in the logical file,
3644 then an IO_ADVISE for iar_offset 0 means iar_offset 10000.

3646 E.g., if NFL4_UFLG_DENSE is absent, then a READ or WRITE to the DS
3647 for iar_offset 0 really means iar_offset 0 in the logical file, and
3648 an IO_ADVISE for iar_offset 0 likewise means iar_offset 0 in the
logical file.

3650 E.g., suppose NFL4_UFLG_DENSE is present, the stripe unit is 1000
3651 bytes, the stripe count is 10, and the dense DS file is serving
3652 iar_offset 0. If a READ or WRITE to the DS for iar_offsets 0, 1000,
3653 2000, and 3000 really means iar_offsets 10000, 20000, 30000, and
3654 40000 (implying a stripe count of 10 and a stripe unit of 1000), then
3655 an IO_ADVISE sent to the same DS with an iar_offset of 500 and an
3656 iar_count of 3000 means that the IO_ADVISE applies to these byte
3657 ranges of the dense DS file:

3659 - 500 to 999
3660 - 1000 to 1999
3661 - 2000 to 2999
3662 - 3000 to 3499

3664 I.e., the contiguous range 500 to 3499 as specified in IO_ADVISE.

3666 It also applies to these byte ranges of the logical file:

3668 - 10500 to 10999 (500 bytes)
3669 - 20000 to 20999 (1000 bytes)
3670 - 30000 to 30999 (1000 bytes)
3671 - 40000 to 40499 (500 bytes)
3672 (total 3000 bytes)

3674 E.g., if NFL4_UFLG_DENSE is absent, the stripe unit is 250 bytes, the
3675 stripe count is 4, and the sparse DS file is serving iar_offset 0,
3676 then a READ or WRITE to the DS for iar_offsets 0, 1000, 2000, and
3677 3000 really means iar_offsets 0, 1000, 2000, and 3000 in the logical
3678 file, keeping in mind that on the DS file, byte ranges 250 to 999,
3679 1250 to 1999, 2250 to 2999, and 3250 to 3999 are not accessible.
3680 Then an IO_ADVISE sent to the same DS with an iar_offset of 500 and
3681 an iar_count of 3000 means that the IO_ADVISE applies to these byte
3682 ranges of the logical file and the sparse DS file:

3684 - 500 to 999 (500 bytes) - no effect
3685 - 1000 to 1249 (250 bytes) - effective
3686 - 1250 to 1999 (750 bytes) - no effect
3687 - 2000 to 2249 (250 bytes) - effective
3688 - 2250 to 2999 (750 bytes) - no effect
3689 - 3000 to 3249 (250 bytes) - effective
3690 - 3250 to 3499 (250 bytes) - no effect
3691 (subtotal 2250 bytes) - no effect
3692 (subtotal 750 bytes) - effective
3693 (grand total 3000 bytes) - no effect + effective

3695 If neither the NFL42_UFLG_IO_ADVISE_THRU_MDS flag nor the
3696 NFL4_UFLG_DENSE flag is set in the layout, then any IO_ADVISE request
3697 sent to the data server with a byte range that overlaps a stripe unit
3698 that the data server does not serve MUST NOT result in the status
3699 NFS4ERR_PNFS_IO_HOLE. Instead, the response SHOULD be successful and
3700 if the server applies IO_ADVISE hints on any stripe units that
3701 overlap with the specified range, those hints SHOULD be indicated in
3702 the response.

3704 11.8.6. Number of Supported File Segments

3706 In theory, IO_ADVISE allows a client and server to support multiple
3707 file segments, meaning that different, possibly overlapping, byte
3708 ranges of the same open file reference will support different hints.
3709 This is not practical, and in general the server will support just
3710 one set of hints, and these will apply to the entire file. However,
3711 some hints are very ephemeral and essentially amount
3712 to one-time instructions to the NFS server, which will be forgotten
3713 shortly after IO_ADVISE is executed.

3715 The following hints will always apply to the entire file, regardless
3716 of the specified byte range:

3718 o IO_ADVISE4_NORMAL

3720 o IO_ADVISE4_SEQUENTIAL

3722 o IO_ADVISE4_SEQUENTIAL_BACKWARDS

3724 o IO_ADVISE4_RANDOM

3726 The following hints will always apply to the specified byte range and
3727 will be treated as one-time instructions:

3729 o IO_ADVISE4_WILLNEED

3731 o IO_ADVISE4_WILLNEED_OPPORTUNISTIC

3733 o IO_ADVISE4_DONTNEED

3735 o IO_ADVISE4_NOREUSE

3737 The following hints are modifiers to all other hints, and will apply
3738 to the entire file and/or to a one-time instruction on the specified
3739 byte range:

3741 o IO_ADVISE4_READ

3743 o IO_ADVISE4_WRITE

3745 11.8.7. Possible Additional Hint - IO_ADVISE4_RECENTLY_USED

3747 IO_ADVISE4_RECENTLY_USED The client has recently accessed the byte
3748 range in its own cache. This informs the server that the data in
3749 the byte range remains important to the client. When the server
3750 reaches resource exhaustion, knowing which data is more important
3751 allows the server to make better choices about which data to, for
3752 example, purge from a cache or move to secondary storage. It also
3753 informs the server which delegations are more important, since if
3754 delegations are working correctly, once delegated to a client, a
3755 server might never receive another I/O request for the file.

3757 A use case for this hint is that of the NFS client or application
3758 restart. In the event of restart, the application's or client's cache
will be
3759 cold, and it will need to be filled from the server.
11.8.7.  Possible Additional Hint - IO_ADVISE4_RECENTLY_USED

IO_ADVISE4_RECENTLY_USED  The client has recently accessed the byte
   range in its own cache.  This informs the server that the data in
   the byte range remains important to the client.  When the server
   reaches resource exhaustion, knowing which data is more important
   allows the server to make better choices about which data to, for
   example, purge from a cache or move to secondary storage.  It
   also informs the server which delegations are more important,
   since if delegations are working correctly, once delegated to a
   client, a server might never receive another I/O request for the
   file.

A use case for this hint is that of the NFS client or application
restart.  In the event of restart, the application's or client's
cache will be cold and it will need to fill it from the server.  If
the server is maintaining a list (most likely an LRU) of byte ranges
tagged with IO_ADVISE4_RECENTLY_USED, then the server could have
stored the data in these ranges on a storage medium that is less
expensive than DRAM yet faster than random-access magnetic or
optical media, such as flash memory.  This allows the end-to-end
application-to-storage system to cooperate in meeting a service
level agreement/objective contracted to the end user by the IT
provider.

On the other hand, this is effectively a hint regarding multi-level
caching, and it may be more useful to specify a more formal multi-
level caching system.  In addition, the action to be taken by the
server file system with this hint, and hence its usefulness, is
unclear.  For example, as most clients already cache data that they
know is important, having this data cached twice may be unnecessary.
In fact, substantial performance improvements have been demonstrated
by making caches more exclusive of each other [25], not the other
way around.  This means that there is a strong argument to be made
that servers should immediately purge the described cached data upon
receiving this hint.  Other work showed that even infinitely sized
secondary caches can be largely ineffective [26], but this of course
is subject to the workload.

11.9.  Changes to Operation 51: LAYOUTRETURN

11.9.1.  Introduction

In the pNFS description provided in [2], the client is not able to
relay an error code from the DS to the MDS.  In the specification of
the Objects-Based Layout protocol [8], use is made of the opaque
lrf_body field of the LAYOUTRETURN argument to do such a relaying of
error codes.  In this section, we define a new data structure to
enable the passing of error codes back to the MDS and provide some
guidelines on what both the client and MDS should expect in such
circumstances.

There are two broad classes of errors, transient and persistent.
The client SHOULD strive to only use this new mechanism to report
persistent errors.  It MUST be able to deal with transient issues by
itself.  Also, while the client might consider an issue to be
persistent, it MUST be prepared for the MDS to consider such issues
to be transient.  A prime example of this is if the MDS fences off a
client from either a stateid or a filehandle.  The client will get
an error from the DS and might relay either NFS4ERR_ACCESS or
NFS4ERR_STALE_STATEID back to the MDS, with the belief that this is
a hard error.  The MDS, on the other hand, is waiting for the client
to report such an error.  For it, the mission is accomplished in
that the client has returned a layout that the MDS had most likely
recalled.

The existing LAYOUTRETURN operation is extended by introducing a new
data structure to report errors, layoutreturn_device_error4.  Also,
layoutreturn_error_report4 is introduced to enable an array of such
errors to be reported.

11.9.2.  ARGUMENT

The ARGUMENT specification of the LAYOUTRETURN operation in section
18.44.1 of [2] is augmented by the following XDR code [24]:

   struct layoutreturn_device_error4 {
           deviceid4       lrde_deviceid;
           nfsstat4        lrde_status;
           nfs_opnum4      lrde_opnum;
   };

   struct layoutreturn_error_report4 {
           layoutreturn_device_error4      lrer_errors<>;
   };
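As an illustration of how a client might populate these structures,
the following C fragment is a minimal sketch.  It assumes
rpcgen-style C mappings of the XDR above; the function is ours, and
the final XDR encoding into the opaque lrf_body field is left to the
caller.

   /*
    * Illustrative sketch only: build a single-entry error report
    * for a problematic DS.  rpcgen-style C mappings of the XDR
    * above are assumed.
    */
   #include <string.h>

   static void
   build_error_report(const deviceid4 dev, nfsstat4 status,
                      nfs_opnum4 op,
                      layoutreturn_device_error4 *err,
                      layoutreturn_error_report4 *report)
   {
           memcpy(err->lrde_deviceid, dev, sizeof(deviceid4));
           err->lrde_status = status;      /* e.g., NFS4ERR_NXIO */
           err->lrde_opnum  = op;          /* e.g., OP_WRITE     */

           report->lrer_errors.lrer_errors_len = 1;
           report->lrer_errors.lrer_errors_val = err;
           /*
            * The caller then XDR-encodes *report into the opaque
            * lrf_body field of the LAYOUTRETURN arguments.
            */
   }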
11.9.3.  RESULT

The RESULT of the LAYOUTRETURN operation is unchanged; see section
18.44.2 of [2].

11.9.4.  DESCRIPTION

The following text is added to the end of the LAYOUTRETURN operation
DESCRIPTION in section 18.44.3 of [2].

When a client uses LAYOUTRETURN with a type of LAYOUTRETURN4_FILE,
then if the lrf_body field is NULL, it indicates to the MDS that the
client experienced no errors.  If lrf_body is non-NULL, then the
field references error information which is layout type specific.
I.e., the Objects-Based Layout protocol can continue to utilize
lrf_body as specified in [8].  For Files-Based Layouts, the field
references a layoutreturn_error_report4, which contains an array of
layoutreturn_device_error4.

Each individual layoutreturn_device_error4 describes a single error
associated with a DS, which is identified via lrde_deviceid.  The
operation which returned the error is identified via lrde_opnum.
Finally, the NFS error value (nfsstat4) encountered is provided via
lrde_status and may consist of the following error codes:

NFS4_OK:  No issues were found for this device.

NFS4ERR_NXIO:  The client was unable to establish any communication
   with the DS.

NFS4ERR_*:  The client was able to establish communication with the
   DS and is returning one of the allowed error codes for the
   operation denoted by lrde_opnum.

11.9.5.  IMPLEMENTATION

The following text is added to the end of the LAYOUTRETURN operation
IMPLEMENTATION in section 18.44.4 of [2].

A client that expects to use pNFS for a mounted filesystem SHOULD
check for pNFS support at mount time.  This check SHOULD be
performed by sending a GETDEVICELIST operation, followed by
layout-type-specific checks for accessibility of each storage device
returned by GETDEVICELIST.  If the NFS server does not support pNFS,
the GETDEVICELIST operation will be rejected with an NFS4ERR_NOTSUPP
error; in this situation it is up to the client to determine whether
it is acceptable to proceed with NFS-only access.

Clients are expected to tolerate transient storage device errors,
and hence clients SHOULD NOT use the LAYOUTRETURN error handling for
device access problems that may be transient.  The methods by which
a client decides whether an access problem is transient or
persistent are implementation-specific, but may include retrying
I/Os to a data server under appropriate conditions.

When an I/O fails to a storage device, the client SHOULD retry the
failed I/O via the MDS.  In this situation, before retrying the I/O,
the client SHOULD return the layout, or the affected portion
thereof, and SHOULD indicate which storage device or devices were
problematic.  If the client does not do this, the MDS may issue a
layout recall callback in order to perform the retried I/O.
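The decision flow above might look as follows in a client.  This is
an illustrative sketch only; the helper functions and structures
(error_is_transient, retry_ds_io, layoutreturn_with_error,
retry_io_via_mds, struct open_file, struct ds_info) are
hypothetical.

   /*
    * Illustrative sketch only: on a failed I/O to a storage
    * device, handle transient errors locally; report persistent
    * errors via LAYOUTRETURN and retry through the MDS.
    */
   static nfsstat4
   ds_io_failed(struct open_file *of, struct ds_info *ds,
                nfsstat4 status, nfs_opnum4 op)
   {
           if (error_is_transient(status)) {
                   /* Transient issues are dealt with locally. */
                   return retry_ds_io(of, ds, op);
           }

           /*
            * Persistent (as far as this client can tell): return
            * the affected portion of the layout, identifying the
            * device, then retry the I/O through the MDS.
            */
           layoutreturn_with_error(of, ds->deviceid, status, op);
           return retry_io_via_mds(of, op);
   }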
The client needs to be cognizant that since this error handling is
optional in the MDS, the MDS may silently ignore this functionality.
Also, as the MDS may consider some issues the client reports to be
expected (see Section 11.9.1), the client might find it difficult to
detect an MDS which has not implemented error handling via
LAYOUTRETURN.

If an MDS is aware that a storage device is proving problematic to a
client, the MDS SHOULD NOT include that storage device in any pNFS
layouts sent to that client.  If the MDS is aware that a storage
device is affecting many clients, then the MDS SHOULD NOT include
that storage device in any pNFS layouts sent out.  Clients must
still be aware that the MDS might not have any choice in using the
storage device, i.e., there might only be one possible layout for
the system.

Another interesting complication is that for existing files, the MDS
might have no choice in which storage devices to hand out to
clients.  The MDS might try to restripe a file across a different
set of storage devices, but clients need to be aware that not all
implementations have restriping support.

An MDS SHOULD react to a client return of layouts with errors by not
using the problematic storage devices in layouts for that client,
but the MDS is not required to indefinitely retain per-client
storage device error information.  An MDS is also not required to
automatically reinstate use of a previously problematic storage
device; administrative intervention may be required instead.

A client MAY perform I/O via the MDS even when the client holds a
layout that covers the I/O; servers MUST support this client
behavior, and MAY recall layouts as needed to complete I/Os.

11.10.  Operation 65: READ_PLUS

If the client sends a READ operation, it is explicitly stating that
it does not support sparse files.  So if a READ occurs on a sparse
ADB, then the server must expand such ADBs to be raw bytes.  If a
READ occurs in the middle of an ADB, the server can only send back
bytes starting from that offset.

Such an operation is inefficient for transfer of sparse sections of
the file.  As such, READ is marked as OBSOLETE in NFSv4.2.  Instead,
a client should issue READ_PLUS.  Note that as the client has no a
priori knowledge of whether an ADB is present or not, it should
always use READ_PLUS.

11.10.1.  ARGUMENT

   struct READ_PLUS4args {
           /* CURRENT_FH: file */
           stateid4        rpa_stateid;
           offset4         rpa_offset;
           count4          rpa_count;
   };

11.10.2.  RESULT

   union read_plus_content switch (data_content4 content) {
   case NFS4_CONTENT_DATA:
           opaque          rpc_data<>;
   case NFS4_CONTENT_APP_BLOCK:
           app_data_block4 rpc_block;
   case NFS4_CONTENT_HOLE:
           data_info4      rpc_hole;
   default:
           void;
   };

   /*
    * Allow a return of an array of contents.
    */
   struct read_plus_res4 {
           bool                    rpr_eof;
           read_plus_content       rpr_contents<>;
   };

   union READ_PLUS4res switch (nfsstat4 status) {
   case NFS4_OK:
           read_plus_res4  resok4;
   default:
           void;
   };

11.10.3.  DESCRIPTION

Over the given range, READ_PLUS will return all data and ADBs found
as an array of read_plus_content.  It is possible to have
consecutive ADBs in the array, either because different definitions
of ADBs are present or because the guard pattern changes.

Edge cases exist for ADBs which either begin before the rpa_offset
requested by the READ_PLUS or end after the rpa_count requested -
both of which may occur because not all applications which access
the file (e.g., tar, dd, and cp) are aware of the main application
imposing a format on the file contents.  READ_PLUS MUST retrieve
whole ADBs, but it need not retrieve an entire sequence of ADBs.

The server MUST return a whole ADB because, if it did not, it would
have to expand that partial ADB before sending it to the client.
E.g., if an ADB had a block size of 64k and the READ_PLUS was for
128k starting at an offset of 32k inside the ADB, then the first 32k
would be converted to data.
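To illustrate how a client might consume a READ_PLUS result, the
following C fragment is a minimal sketch.  It assumes rpcgen-style C
mappings of the XDR above; the fill_*() and note_eof() helpers are
hypothetical.

   /*
    * Illustrative sketch only: walk the array of
    * read_plus_content returned by READ_PLUS.  rpcgen-style C
    * mappings of the XDR above are assumed.
    */
   static void
   consume_read_plus(const read_plus_res4 *res)
   {
           u_int i;

           for (i = 0; i < res->rpr_contents.rpr_contents_len; i++) {
                   const read_plus_content *c =
                           &res->rpr_contents.rpr_contents_val[i];

                   switch (c->content) {
                   case NFS4_CONTENT_DATA:
                           fill_data(&c->read_plus_content_u.rpc_data);
                           break;
                   case NFS4_CONTENT_APP_BLOCK:
                           /* Re-expand the ADB locally from its
                            * definition. */
                           fill_adb(&c->read_plus_content_u.rpc_block);
                           break;
                   case NFS4_CONTENT_HOLE:
                           /* Zero-fill; no data crossed the wire. */
                           fill_zeroes(&c->read_plus_content_u.rpc_hole);
                           break;
                   default:
                           break;
                   }
           }
           if (res->rpr_eof)
                   note_eof();
   }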
11.11.  Operation 66: SEEK

XXX

11.11.1.  ARGUMENT

   struct SEEK4args {
           /* CURRENT_FH: file */
           stateid4        sa_stateid;
           offset4         sa_offset;
           count4          sa_count;
   };

11.11.2.  RESULT

   union seek_content switch (data_content4 content) {
   case NFS4_CONTENT_DATA:
           data_info4      sc_data;
   case NFS4_CONTENT_APP_BLOCK:
           app_data_block4 sc_block;
   case NFS4_CONTENT_HOLE:
           data_info4      sc_hole;
   default:
           void;
   };

   /*
    * Allow a return of an array of contents.
    */
   struct seek_res4 {
           bool            sr_eof;
           seek_content    sr_contents<>;
   };

   union SEEK4res switch (nfsstat4 status) {
   case NFS4_OK:
           seek_res4       resok4;
   default:
           void;
   };

11.11.3.  DESCRIPTION

Over the given range, SEEK will return a range for all data, holes,
and ADBs found as an array of seek_content.  It does not return
actual data.

12.  NFSv4.2 Callback Operations

12.1.  Procedure 16: CB_ATTR_CHANGED - Notify Client that the File's
       Attributes Changed

12.1.1.  ARGUMENTS

   struct CB_ATTR_CHANGED4args {
           nfs_fh4         acca_fh;
           bitmap4         acca_critical;
           bitmap4         acca_info;
   };

12.1.2.  RESULTS

   struct CB_ATTR_CHANGED4res {
           nfsstat4        accr_status;
   };

12.1.3.  DESCRIPTION

The CB_ATTR_CHANGED callback operation is used by the server to
indicate to the client that the file's attributes have been modified
on the server.  The server does not convey how the attributes have
changed, just that they have been modified.  The server can inform
the client about both critical and informational attribute changes
in the bitmask arguments.  The client SHOULD query the server about
all attributes set in acca_critical.  For all changes reflected in
acca_info, the client can decide whether or not it wants to poll the
server.

The CB_ATTR_CHANGED callback operation with the FATTR4_SEC_LABEL set
in acca_critical is the method used by the server to indicate that
the MAC label for the file referenced by acca_fh has changed.  In
many ways, the server does not care about the result returned by the
client.

12.2.  Operation 15: CB_COPY - Report results of a server-side copy

12.2.1.  ARGUMENT

   union copy_info4 switch (nfsstat4 cca_status) {
   case NFS4_OK:
           void;
   default:
           length4         cca_bytes_copied;
   };

   struct CB_COPY4args {
           nfs_fh4         cca_fh;
           stateid4        cca_stateid;
           copy_info4      cca_copy_info;
   };

12.2.2.  RESULT

   struct CB_COPY4res {
           nfsstat4        ccr_status;
   };

12.2.3.  DESCRIPTION

CB_COPY is used for both intra- and inter-server asynchronous
copies.  The CB_COPY callback informs the client of the result of an
asynchronous server-side copy.  This operation is sent by the
destination server to the client in a CB_COMPOUND request.  The copy
is identified by the filehandle and stateid arguments.  The result
is indicated by the status field.  If the copy failed,
cca_bytes_copied contains the number of bytes copied before the
failure occurred.  The cca_bytes_copied value indicates the number
of bytes copied but not which specific bytes have been copied.
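As an illustration, a client might dispatch an incoming CB_COPY as
follows.  This is a minimal sketch assuming rpcgen-style C mappings
of the XDR above; find_async_copy(), complete_copy(), and
fail_copy() are hypothetical helpers that track the client's
outstanding asynchronous copies.

   /*
    * Illustrative sketch only: client handling of CB_COPY.  The
    * copy is matched to an outstanding asynchronous COPY by
    * filehandle and stateid.
    */
   static nfsstat4
   handle_cb_copy(const CB_COPY4args *args)
   {
           struct async_copy *copy;

           copy = find_async_copy(&args->cca_fh, &args->cca_stateid);
           if (copy == NULL)
                   return NFS4ERR_BAD_STATEID; /* no such copy known */

           if (args->cca_copy_info.cca_status == NFS4_OK) {
                   complete_copy(copy);
           } else {
                   /* Partial copy: only a byte count is known,
                    * not which specific bytes were copied. */
                   fail_copy(copy,
                             args->cca_copy_info.cca_status,
                             args->cca_copy_info.copy_info4_u
                                   .cca_bytes_copied);
           }
           return NFS4_OK;
   }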
4114 In the absence of an established backchannel, the server cannot 4115 signal the completion of the COPY via a CB_COPY callback. The loss 4116 of a callback channel would be indicated by the server setting the 4117 SEQ4_STATUS_CB_PATH_DOWN flag in the sr_status_flags field of the 4118 SEQUENCE operation. The client must re-establish the callback 4119 channel to receive the status of the COPY operation. Prolonged loss 4120 of the callback channel could result in the server dropping the COPY 4121 operation state and invalidating the copy stateid. 4123 If the client supports the COPY operation, the client is REQUIRED to 4124 support the CB_COPY operation. 4126 The CB_COPY operation may fail for the following reasons (this is a 4127 partial list): 4129 NFS4ERR_NOTSUPP: The copy offload operation is not supported by the 4130 NFS client receiving this request. 4132 13. IANA Considerations 4134 This section uses terms that are defined in [27]. 4136 14. References 4138 14.1. Normative References 4140 [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement 4141 Levels", March 1997. 4143 [2] Shepler, S., Eisler, M., and D. Noveck, "Network File System 4144 (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, 4145 January 2010. 4147 [3] Haynes, T., "Network File System (NFS) Version 4 Minor Version 4148 2 External Data Representation Standard (XDR) Description", 4149 March 2011. 4151 [4] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 4152 Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, 4153 January 2005. 4155 [5] Haynes, T. and N. Williams, "Remote Procedure Call (RPC) 4156 Security Version 3", draft-williams-rpcsecgssv3 (work in 4157 progress), 2011. 4159 [6] The Open Group, "Section 'posix_fadvise()' of System Interfaces 4160 of The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 4161 2004 Edition", 2004. 4163 [7] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol 4164 Specification", RFC 2203, September 1997. 4166 [8] Halevy, B., Welch, B., and J. Zelenka, "Object-Based Parallel 4167 NFS (pNFS) Operations", RFC 5664, January 2010. 4169 [9] Shepler, S., Eisler, M., and D. Noveck, "Network File System 4170 (NFS) Version 4 Minor Version 1 External Data Representation 4171 Standard (XDR) Description", RFC 5662, January 2010. 4173 [10] Black, D., Glasgow, J., and S. Fridella, "Parallel NFS (pNFS) 4174 Block/Volume Layout", RFC 5663, January 2010. 4176 14.2. Informative References 4178 [11] Haynes, T. and D. Noveck, "Network File System (NFS) version 4 4179 Protocol", draft-ietf-nfsv4-rfc3530bis-09 (Work In Progress), 4180 March 2011. 4182 [12] Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik, 4183 "NSDB Protocol for Federated Filesystems", 4184 draft-ietf-nfsv4-federated-fs-protocol (Work In Progress), 4185 2010. 4187 [13] Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik, 4188 "Administration Protocol for Federated Filesystems", 4189 draft-ietf-nfsv4-federated-fs-admin (Work In Progress), 2010. 4191 [14] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., 4192 Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- 4193 HTTP/1.1", RFC 2616, June 1999. 4195 [15] Postel, J. and J. Reynolds, "File Transfer Protocol", STD 9, 4196 RFC 959, October 1985. 4198 [16] Simpson, W., "PPP Challenge Handshake Authentication Protocol 4199 (CHAP)", RFC 1994, August 1996. 4201 [17] VanDeBogart, S., Frost, C., and E. 
Kohler, "Reducing Seek 4202 Overhead with Application-Directed Prefetching", Proceedings of 4203 USENIX Annual Technical Conference , June 2009. 4205 [18] Strohm, R., "Chapter 2, Data Blocks, Extents, and Segments, of 4206 Oracle Database Concepts 11g Release 1 (11.1)", January 2011. 4208 [19] Ashdown, L., "Chapter 15, Validating Database Files and 4209 Backups, of Oracle Database Backup and Recovery User's Guide 4210 11g Release 1 (11.1)", August 2008. 4212 [20] McDougall, R. and J. Mauro, "Section 11.4.3, Detecting Memory 4213 Corruption of Solaris Internals", 2007. 4215 [21] Bairavasundaram, L., Goodson, G., Schroeder, B., Arpaci- 4216 Dusseau, A., and R. Arpaci-Dusseau, "An Analysis of Data 4217 Corruption in the Storage Stack", Proceedings of the 6th USENIX 4218 Symposium on File and Storage Technologies (FAST '08) , 2008. 4220 [22] "Section 46.6. Multi-Level Security (MLS) of Deployment Guide: 4221 Deployment, configuration and administration of Red Hat 4222 Enterprise Linux 5, Edition 6", 2011. 4224 [23] Quigley, D. and J. Lu, "Registry Specification for MAC Security 4225 Label Formats", draft-quigley-label-format-registry (work in 4226 progress), 2011. 4228 [24] Eisler, M., "XDR: External Data Representation Standard", 4229 RFC 4506, May 2006. 4231 [25] Wong, T. and J. Wilkes, "My cache or yours? Making storage more 4232 exclusive", Proceedings of the USENIX Annual Technical 4233 Conference , 2002. 4235 [26] Muntz, D. and P. Honeyman, "Multi-level Caching in Distributed 4236 File Systems", Proceedings of USENIX Annual Technical 4237 Conference , 1992. 4239 [27] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA 4240 Considerations Section in RFCs", BCP 26, RFC 5226, May 2008. 4242 [28] Nowicki, B., "NFS: Network File System Protocol specification", 4243 RFC 1094, March 1989. 4245 [29] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 4246 Protocol Specification", RFC 1813, June 1995. 4248 [30] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", 4249 RFC 1833, August 1995. 4251 [31] Eisler, M., "NFS Version 2 and Version 3 Security Issues and 4252 the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5", 4253 RFC 2623, June 1999. 4255 [32] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997. 4257 [33] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, 4258 June 1999. 4260 [34] Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by an On- 4261 line Database", RFC 3232, January 2002. 4263 [35] Linn, J., "The Kerberos Version 5 GSS-API Mechanism", RFC 1964, 4264 June 1996. 4266 [36] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, 4267 C., Eisler, M., and D. Noveck, "Network File System (NFS) 4268 version 4 Protocol", RFC 3530, April 2003. 4270 Appendix A. Acknowledgments 4272 For the pNFS Access Permissions Check, the original draft was by 4273 Sorin Faibish, David Black, Mike Eisler, and Jason Glasgow. The work 4274 was influenced by discussions with Benny Halevy and Bruce Fields. A 4275 review was done by Tom Haynes. 4277 For the Sharing change attribute implementation details with NFSv4 4278 clients, the original draft was by Trond Myklebust. 4280 For the NFS Server-side Copy, the original draft was by James 4281 Lentini, Mike Eisler, Deepak Kenchammana, Anshul Madan, and Rahul 4282 Iyer. Tom Talpey co-authored an unpublished version of that 4283 document. 
It was also reviewed by a number of individuals: Pranoop Erasani,
Tom Haynes, Arthur Lent, Trond Myklebust, Dave Noveck, Theresa
Lingutla-Raj, Manjunath Shankararao, Satyam Vaghani, and Nico
Williams.

For the NFS space reservation operations, the original draft was by
Mike Eisler, James Lentini, Manjunath Shankararao, and Rahul Iyer.

For the sparse file support, the original draft was by Dean
Hildebrand and Marc Eshel.  Valuable input and advice was received
from Sorin Faibish, Bruce Fields, Benny Halevy, Trond Myklebust, and
Richard Scheffenegger.

For the Application IO Hints, the original draft was by Dean
Hildebrand, Mike Eisler, Trond Myklebust, and Sam Falkner.  Some
early reviewers included Benny Halevy and Pranoop Erasani.

For Labeled NFS, the original draft was by David Quigley, James
Morris, Jarret Lu, and Tom Haynes.  Peter Staubach, Trond Myklebust,
Sorin Faibish, Nico Williams, and David Black also contributed in
the final push to get this accepted.

Appendix B.  RFC Editor Notes

[RFC Editor: please remove this section prior to publishing this
document as an RFC]

[RFC Editor: prior to publishing this document as an RFC, please
replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the
RFC number of this document]

Author's Address

   Thomas Haynes
   NetApp
   9110 E 66th St
   Tulsa, OK 74133
   USA

   Phone: +1 918 307 1415
   Email: thomas@netapp.com
   URI:   http://www.tulsalabs.com