idnits 2.17.1

draft-ietf-nfsv4-minorversion2-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 5 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they
     should be changed.

  == There are 5 instances of lines with private range IPv4 addresses in
     the document.  If these are generic example addresses, they should be
     changed to use any of the ranges defined in RFC 6890 (or successor):
     192.0.2.x, 198.51.100.x or 203.0.113.x.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL',
     'SHOULD', or 'RECOMMENDED' is not an accepted usage according to
     RFC 2119.  Please use uppercase 'NOT' together with RFC 2119 keywords
     (if that is what you mean).

     Found 'MUST not' in this paragraph:

       When a data server chooses to return a READ_HOLE result, it has the
       option of returning hole information for the data stored on that
       data server (as defined by the data layout), but it MUST not return
       a nfs_readplusreshole structure with a byte range that includes
       data managed by another data server.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL',
     'SHOULD', or 'RECOMMENDED' is not an accepted usage according to
     RFC 2119.  Please use uppercase 'NOT' together with RFC 2119 keywords
     (if that is what you mean).

     Found 'MUST not' in this paragraph:

       Furthermore, each DS MUST not report to a client either a sparse
       ADB or data which belongs to another DS.  One implication of this
       requirement is that the app_data_block4's adb_block_size MUST be
       either be the stripe width or the stripe width must be an even
       multiple of it.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL',
     'SHOULD', or 'RECOMMENDED' is not an accepted usage according to
     RFC 2119.  Please use uppercase 'NOT' together with RFC 2119 keywords
     (if that is what you mean).

     Found 'MUST not' in this paragraph:

       The second change is to provide a method for the server to notify
       the client that the attribute changed on an open file on the
       server.  If the file is closed, then during the open attempt, the
       client will gather the new attribute value.  The server MUST not
       communicate the new value of the attribute, the client MUST query
       it.  This requirement stems from the need for the client to provide
       sufficient access rights to the attribute.

  == The document seems to contain a disclaimer for pre-RFC5378 work, but
     was first submitted on or after 10 November 2008.  The disclaimer is
     usually necessary only for documents that revise or obsolete older
     RFCs, and that take significant amounts of text from those RFCs.  If
     you can contact all authors of the source material and they are
     willing to grant the BCP78 rights to the IETF Trust, you can and
     should remove the disclaimer.  Otherwise, the disclaimer is needed
     and you can ignore this comment.  (See the Legal Provisions document
     at https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 06, 2011) is 4614 days in the past.  Is
     this intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>'
     and '<CODE ENDS>' lines.
  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  == Missing Reference: '0' is mentioned on line 850, but not defined

  == Unused Reference: '8' is defined on line 3643, but no explicit
     reference was found in the text

  == Unused Reference: '9' is defined on line 3647, but no explicit
     reference was found in the text

  == Unused Reference: '24' is defined on line 3704, but no explicit
     reference was found in the text

  == Unused Reference: '25' is defined on line 3707, but no explicit
     reference was found in the text

  == Unused Reference: '26' is defined on line 3710, but no explicit
     reference was found in the text

  == Unused Reference: '27' is defined on line 3713, but no explicit
     reference was found in the text

  == Unused Reference: '28' is defined on line 3717, but no explicit
     reference was found in the text

  == Unused Reference: '29' is defined on line 3719, but no explicit
     reference was found in the text

  == Unused Reference: '30' is defined on line 3722, but no explicit
     reference was found in the text

  == Unused Reference: '31' is defined on line 3725, but no explicit
     reference was found in the text

  == Unused Reference: '32' is defined on line 3728, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 5661 (ref. '2') (Obsoleted by
     RFC 8881)

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  == Outdated reference: A later version (-35) exists of
     draft-ietf-nfsv4-rfc3530bis-09

  -- Obsolete informational reference (is this intentional?): RFC 2616
     (ref. '13') (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233,
     RFC 7234, RFC 7235)

  -- Obsolete informational reference (is this intentional?): RFC 5226
     (ref. '23') (Obsoleted by RFC 8126)

  -- Obsolete informational reference (is this intentional?): RFC 3530
     (ref. '32') (Obsoleted by RFC 7530)

     Summary: 1 error (**), 0 flaws (~~), 20 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

--------------------------------------------------------------------------------

2  NFSv4                                                         T. Haynes
3  Internet-Draft                                                    Editor
4  Intended status: Standards Track                       September 06, 2011
5  Expires: March 9, 2012

7                      NFS Version 4 Minor Version 2
8                 draft-ietf-nfsv4-minorversion2-05.txt

10 Abstract

12    This Internet-Draft describes NFS version 4 minor version two,
13    focusing mainly on the protocol extensions made from NFS version 4
14    minor version 0 and NFS version 4 minor version 1.  Major extensions
15    introduced in NFS version 4 minor version two include: Server-side
16    Copy, Space Reservations, and Support for Sparse Files.

18 Requirements Language

20    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
21    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
22    document are to be interpreted as described in RFC 2119 [1].

24 Status of this Memo

26    This Internet-Draft is submitted in full conformance with the
27    provisions of BCP 78 and BCP 79.

29    Internet-Drafts are working documents of the Internet Engineering
30    Task Force (IETF).  Note that other groups may also distribute
31    working documents as Internet-Drafts.  The list of current Internet-
32    Drafts is at http://datatracker.ietf.org/drafts/current/.

34    Internet-Drafts are draft documents valid for a maximum of six months
35    and may be updated, replaced, or obsoleted by other documents at any
36    time.  It is inappropriate to use Internet-Drafts as reference
37    material or to cite them other than as "work in progress."

39    This Internet-Draft will expire on March 9, 2012.

41 Copyright Notice

43    Copyright (c) 2011 IETF Trust and the persons identified as the
44    document authors.  All rights reserved.
46    This document is subject to BCP 78 and the IETF Trust's Legal
47    Provisions Relating to IETF Documents
48    (http://trustee.ietf.org/license-info) in effect on the date of
49    publication of this document.  Please review these documents
50    carefully, as they describe your rights and restrictions with respect
51    to this document.  Code Components extracted from this document must
52    include Simplified BSD License text as described in Section 4.e of
53    the Trust Legal Provisions and are provided without warranty as
54    described in the Simplified BSD License.

56    This document may contain material from IETF Documents or IETF
57    Contributions published or made publicly available before November
58    10, 2008.  The person(s) controlling the copyright in some of this
59    material may not have granted the IETF Trust the right to allow
60    modifications of such material outside the IETF Standards Process.
61    Without obtaining an adequate license from the person(s) controlling
62    the copyright in such materials, this document may not be modified
63    outside the IETF Standards Process, and derivative works of it may
64    not be created outside the IETF Standards Process, except to format
65    it for publication as an RFC or to translate it into languages other
66    than English.

68 Table of Contents

70    1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  6
71      1.1.  The NFS Version 4 Minor Version 2 Protocol . . . . . . . .  6
72      1.2.  Scope of This Document . . . . . . . . . . . . . . . . . .  6
73      1.3.  NFSv4.2 Goals  . . . . . . . . . . . . . . . . . . . . . .  6
74      1.4.  Overview of NFSv4.2 Features . . . . . . . . . . . . . . .  6
75      1.5.  Differences from NFSv4.1 . . . . . . . . . . . . . . . . .  6
76    2.  NFS Server-side Copy . . . . . . . . . . . . . . . . . . . . .  6
77      2.1.  Introduction . . . . . . . . . . . . . . . . . . . . . . .  7
78      2.2.  Protocol Overview  . . . . . . . . . . . . . . . . . . . .  7
79        2.2.1.  Intra-Server Copy  . . . . . . . . . . . . . . . . . .  9
80        2.2.2.  Inter-Server Copy  . . . . . . . . . . . . . . . . . . 10
81        2.2.3.  Server-to-Server Copy Protocol . . . . . . . . . . . . 13
82      2.3.  Operations . . . . . . . . . . . . . . . . . . . . . . . . 15
83        2.3.1.  netloc4 - Network Locations  . . . . . . . . . . . . . 15
84        2.3.2.  Copy Offload Stateids  . . . . . . . . . . . . . . . . 16
85      2.4.  Security Considerations  . . . . . . . . . . . . . . . . . 16
86        2.4.1.  Inter-Server Copy Security . . . . . . . . . . . . . . 16
87    3.  Sparse Files . . . . . . . . . . . . . . . . . . . . . . . . . 24
88      3.1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . 24
89      3.2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . 25
90      3.3.  Overview of Sparse Files and NFSv4 . . . . . . . . . . . . 25
91      3.4.  Operation 65: READ_PLUS  . . . . . . . . . . . . . . . . . 26
92        3.4.1.  ARGUMENT . . . . . . . . . . . . . . . . . . . . . . . 26
93        3.4.2.  RESULT . . . . . . . . . . . . . . . . . . . . . . . . 27
94        3.4.3.  DESCRIPTION  . . . . . . . . . . . . . . . . . . . . . 27
95        3.4.4.  IMPLEMENTATION . . . . . . . . . . . . . . . . . . . . 29
96        3.4.5.  READ_PLUS with Sparse Files Example  . . . . . . . . . 30
97      3.5.  Related Work . . . . . . . . . . . . . . . . . . . . . . . 31
98      3.6.  Other Proposed Designs . . . . . . . . . . . . . . . . . . 31
99        3.6.1.  Multi-Data Server Hole Information . . . . . . . . . . 31
100       3.6.2.  Data Result Array  . . . . . . . . . . . . . . . . . . 32
101       3.6.3.  User-Defined Sparse Mask . . . . . . . . . . . . . . . 32
102       3.6.4.  Allocated flag . . . . . . . . . . . . . . . . . . . . 32
103       3.6.5.  Dense and Sparse pNFS File Layouts . . . . . . . . . . 33
104   4.  Space Reservation  . . . . . . . . . . . . . . . . . . . . . . 33
105     4.1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . 33
106     4.2.  Operations and attributes  . . . . . . . . . . . . . . . . 35
107     4.3.  Attribute 77: space_reserved . . . . . . . . . . . . . . . 35
108     4.4.  Attribute 78: space_freed  . . . . . . . . . . . . . . . . 36
109     4.5.  Attribute 79: max_hole_punch . . . . . . . . . . . . . . . 36
110   5.  Application Data Block Support . . . . . . . . . . . . . . . . 36
111     5.1.  Generic Framework  . . . . . . . . . . . . . . . . . . . . 37
112       5.1.1.  Data Block Representation  . . . . . . . . . . . . . . 38
113       5.1.2.  Data Content . . . . . . . . . . . . . . . . . . . . . 38
114     5.2.  pNFS Considerations  . . . . . . . . . . . . . . . . . . . 38
115     5.3.  An Example of Detecting Corruption . . . . . . . . . . . . 39
116     5.4.  Example of READ_PLUS . . . . . . . . . . . . . . . . . . . 40
117     5.5.  Zero Filled Holes  . . . . . . . . . . . . . . . . . . . . 41
118   6.  Labeled NFS  . . . . . . . . . . . . . . . . . . . . . . . . . 41
119     6.1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . 41
120     6.2.  Definitions  . . . . . . . . . . . . . . . . . . . . . . . 42
121     6.3.  MAC Security Attribute . . . . . . . . . . . . . . . . . . 43
122       6.3.1.  Interpreting FATTR4_SEC_LABEL  . . . . . . . . . . . . 44
123       6.3.2.  Delegations  . . . . . . . . . . . . . . . . . . . . . 44
124       6.3.3.  Permission Checking  . . . . . . . . . . . . . . . . . 45
125       6.3.4.  Object Creation  . . . . . . . . . . . . . . . . . . . 45
126       6.3.5.  Existing Objects . . . . . . . . . . . . . . . . . . . 45
127       6.3.6.  Label Changes  . . . . . . . . . . . . . . . . . . . . 45
128     6.4.  pNFS Considerations  . . . . . . . . . . . . . . . . . . . 46
129     6.5.  Discovery of Server LNFS Support . . . . . . . . . . . . . 47
130     6.6.  MAC Security NFS Modes of Operation  . . . . . . . . . . . 47
131       6.6.1.  Full Mode  . . . . . . . . . . . . . . . . . . . . . . 47
132       6.6.2.  Smart Client Mode  . . . . . . . . . . . . . . . . . . 49
133       6.6.3.  Smart Server Mode  . . . . . . . . . . . . . . . . . . 49
134     6.7.  Security Considerations  . . . . . . . . . . . . . . . . . 50
135   7.  Sharing change attribute implementation details with NFSv4
136       clients  . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
137     7.1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . 51
138     7.2.  Definition of the 'change_attr_type' per-file system
139           attribute  . . . . . . . . . . . . . . . . . . . . . . . . 51
140   8.  Security Considerations  . . . . . . . . . . . . . . . . . . . 53
141   9.  Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . . 53
142   10. NFSv4.2 Operations . . . . . . . . . . . . . . . . . . . . . . 56
143     10.1.  Operation 59: COPY - Initiate a server-side copy  . . . . 56
144     10.2.  Operation 60: COPY_ABORT - Cancel a server-side copy  . . 64
145     10.3.  Operation 61: COPY_NOTIFY - Notify a source server of
146            a future copy  . . . . . . . . . . . . . . . . . . . . .  65
147     10.4.  Operation 62: COPY_REVOKE - Revoke a destination
148            server's copy privileges . . . . . . . . . . . . . . . .  68
149     10.5.  Operation 63: COPY_STATUS - Poll for status of a
150            server-side copy . . . . . . . . . . . . . . . . . . . .  69
151     10.6.  Modification to Operation 42: EXCHANGE_ID -
152            Instantiate Client ID  . . . . . . . . . . . . . . . . .  70
153     10.7.  Operation 64: INITIALIZE  . . . . . . . . . . . . . . . . 71
154     10.8.  Changes to Operation 51: LAYOUTRETURN . . . . . . . . . . 74
155       10.8.1.  Introduction  . . . . . . . . . . . . . . . . . . . . 75
156       10.8.2.  ARGUMENT  . . . . . . . . . . . . . . . . . . . . . . 75
157       10.8.3.  RESULT  . . . . . . . . . . . . . . . . . . . . . . . 76
158       10.8.4.  DESCRIPTION . . . . . . . . . . . . . . . . . . . . . 76
159       10.8.5.  IMPLEMENTATION  . . . . . . . . . . . . . . . . . . . 76
160     10.9.  Operation 65: READ_PLUS . . . . . . . . . . . . . . . . . 78
161   11. NFSv4.2 Callback Operations  . . . . . . . . . . . . . . . . . 80
162     11.1.  Procedure 16: CB_ATTR_CHANGED - Notify Client that the
163            File's Attributes Changed  . . . . . . . . . . . . . . .  80

165     11.2.  Operation 15: CB_COPY - Report results of a
166            server-side copy . . . . . . . . . . . . . . . . . . . .  80
167   12. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 82
168   13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 82
169     13.1. Normative References  . . . . . . . . . . . . . . . . . .  82
170     13.2. Informative References  . . . . . . . . . . . . . . . . .  83
171   Appendix A.  Acknowledgments . . . . . . . . . . . . . . . . . . . 84
172   Appendix B.  RFC Editor Notes  . . . . . . . . . . . . . . . . . . 85
173   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 85

175 1.  Introduction

177 1.1.  The NFS Version 4 Minor Version 2 Protocol

179    The NFS version 4 minor version 2 (NFSv4.2) protocol is the third
180    minor version of the NFS version 4 (NFSv4) protocol.  The first minor
181    version, NFSv4.0, is described in [10] and the second minor version,
182    NFSv4.1, is described in [2].  It follows the guidelines for minor
183    versioning that are listed in Section 11 of [10].

185    As a minor version, NFSv4.2 is consistent with the overall goals for
186    NFSv4, but extends the protocol so as to better meet those goals,
187    based on experiences with NFSv4.1.  In addition, NFSv4.2 has adopted
188    some additional goals, which motivate some of the major extensions in
189    NFSv4.2.

191 1.2.  Scope of This Document

193    This document describes the NFSv4.2 protocol.  With respect to
194    NFSv4.0 and NFSv4.1, this document does not:

196    o  describe the NFSv4.0 or NFSv4.1 protocols, except where needed to
197       contrast with NFSv4.2.

199    o  modify the specification of the NFSv4.0 or NFSv4.1 protocols.

201    o  clarify the NFSv4.0 or NFSv4.1 protocols.  I.e., any
202       clarifications made here apply to NFSv4.2 and neither of the prior
203       protocols.

205    The full XDR for NFSv4.2 is presented in [3].

207 1.3.  NFSv4.2 Goals

209    [[Comment.1: This needs fleshing out! --TH]]

211 1.4.  Overview of NFSv4.2 Features

213    [[Comment.2: This needs fleshing out! --TH]]

215 1.5.  Differences from NFSv4.1

217    [[Comment.3: This needs fleshing out! --TH]]

219 2.  NFS Server-side Copy

220 2.1.  Introduction

222    This section describes a server-side copy feature for the NFS
223    protocol.
225    The server-side copy feature provides a mechanism for the NFS client
226    to perform a file copy on the server without the data being
227    transmitted back and forth over the network.

229    Without this feature, an NFS client copies data from one location to
230    another by reading the data from the server over the network, and
231    then writing the data back over the network to the server.  Using
232    this server-side copy operation, the client is able to instruct the
233    server to copy the data locally without the data being sent back and
234    forth over the network unnecessarily.

236    In general, this feature is useful whenever data is copied from one
237    location to another on the server.  It is particularly useful when
238    copying the contents of a file from a backup.  Backup-versions of a
239    file are copied for a number of reasons, including restoring and
240    cloning data.

242    If the source object and destination object are on different file
243    servers, the file servers will communicate with one another to
244    perform the copy operation.  The server-to-server protocol by which
245    this is accomplished is not defined in this document.

247 2.2.  Protocol Overview

249    The server-side copy offload operations support both intra-server and
250    inter-server file copies.  An intra-server copy is a copy in which
251    the source file and destination file reside on the same server.  In
252    an inter-server copy, the source file and destination file are on
253    different servers.  In both cases, the copy may be performed
254    synchronously or asynchronously.

256    Throughout the rest of this document, we refer to the NFS server
257    containing the source file as the "source server" and the NFS server
258    to which the file is transferred as the "destination server".  In the
259    case of an intra-server copy, the source server and destination
260    server are the same server.
Therefore in the context of an intra-
261    server copy, the terms source server and destination server refer to
262    the single server performing the copy.

264    The operations described below are designed to copy files.  Other
265    file system objects can be copied by building on these operations or
266    using other techniques.  For example if the user wishes to copy a
267    directory, the client can synthesize a directory copy by first
268    creating the destination directory and then copying the source
269    directory's files to the new destination directory.  If the user
270    wishes to copy a namespace junction [11] [12], the client can use the
271    ONC RPC Federated Filesystem protocol [12] to perform the copy.
272    Specifically the client can determine the source junction's
273    attributes using the FEDFS_LOOKUP_FSN procedure and create a
274    duplicate junction using the FEDFS_CREATE_JUNCTION procedure.

276    For the inter-server copy protocol, the operations are defined to be
277    compatible with a server-to-server copy protocol in which the
278    destination server reads the file data from the source server.  This
279    model in which the file data is pulled from the source by the
280    destination has a number of advantages over a model in which the
281    source pushes the file data to the destination.  The advantages of
282    the pull model include:

284    o  The pull model only requires a remote server (i.e., the
285       destination server) to be granted read access.  A push model
286       requires a remote server (i.e., the source server) to be granted
287       write access, which is more privileged.

289    o  The pull model allows the destination server to stop reading if it
290       has run out of space.  In a push model, the destination server
291       must flow control the source server in this situation.

293    o  The pull model allows the destination server to easily flow
294       control the data stream by adjusting the size of its read
295       operations.  In a push model, the destination server does not have
296       this ability.  The source server in a push model is capable of
297       writing chunks larger than the destination server has requested in
298       attributes and session parameters.  In theory, the destination
299       server could perform a "short" write in this situation, but this
300       approach is known to behave poorly in practice.

302    The following operations are provided to support server-side copy:

304    COPY_NOTIFY:  For inter-server copies, the client sends this
305       operation to the source server to notify it of a future file copy
306       from a given destination server for the given user.

308    COPY_REVOKE:  Also for inter-server copies, the client sends this
309       operation to the source server to revoke permission to copy a file
310       for the given user.

312    COPY:  Used by the client to request a file copy.

314    COPY_ABORT:  Used by the client to abort an asynchronous file copy.

316    COPY_STATUS:  Used by the client to poll the status of an
317       asynchronous file copy.

319    CB_COPY:  Used by the destination server to report the results of an
320       asynchronous file copy to the client.

322    These operations are described in detail in Section 2.3.  This
323    section provides an overview of how these operations are used to
324    perform server-side copies.

326 2.2.1.  Intra-Server Copy

328    To copy a file on a single server, the client uses a COPY operation.
329    The server may respond to the copy operation with the final results
330    of the copy or it may perform the copy asynchronously and deliver the
331    results using a CB_COPY operation callback.  If the copy is performed
332    asynchronously, the client may poll the status of the copy using
333    COPY_STATUS or cancel the copy using COPY_ABORT.

335    A synchronous intra-server copy is shown in Figure 1.  In this
336    example, the NFS server chooses to perform the copy synchronously.
337    The copy operation is completed, either successfully or
338    unsuccessfully, before the server replies to the client's request.
339    The server's reply contains the final result of the operation.

341         Client                                  Server
342           +                                       +
343           |                                       |
344           |--- COPY ---------------------------->| Client requests
345           |<------------------------------------/| a file copy
346           |                                       |
347           |                                       |

349                 Figure 1: A synchronous intra-server copy.

351    An asynchronous intra-server copy is shown in Figure 2.  In this
352    example, the NFS server performs the copy asynchronously.  The
353    server's reply to the copy request indicates that the copy operation
354    was initiated and the final result will be delivered at a later time.
355    The server's reply also contains a copy stateid.  The client may use
356    this copy stateid to poll for status information (as shown) or to
357    cancel the copy using a COPY_ABORT.  When the server completes the
358    copy, the server performs a callback to the client and reports the
359    results.

361         Client                                  Server
362           +                                       +
363           |                                       |
364           |--- COPY ---------------------------->| Client requests
365           |<------------------------------------/| a file copy
366           |                                       |
367           |                                       |
368           |--- COPY_STATUS --------------------->| Client may poll
369           |<------------------------------------/| for status
370           |                                       |
371           |                  .                    | Multiple COPY_STATUS
372           |                  .                    | operations may be sent.
373           |                  .                    |
374           |                                       |
375           |<-- CB_COPY --------------------------| Server reports results
376           |\------------------------------------>|
377           |                                       |

379                Figure 2: An asynchronous intra-server copy.

381 2.2.2.  Inter-Server Copy

383    A copy may also be performed between two servers.  The copy protocol
384    is designed to accommodate a variety of network topologies.  As shown
385    in Figure 3, the client and servers may be connected by multiple
386    networks.  In particular, the servers may be connected by a
387    specialized, high speed network (network 192.168.33.0/24 in the
388    diagram) that does not include the client.
The protocol allows the
389    client to setup the copy between the servers (over network
390    10.11.78.0/24 in the diagram) and for the servers to communicate on
391    the high speed network if they choose to do so.

393                          192.168.33.0/24
394                 +-------------------------------------+
395                 |                                     |
396                 |                                     |
397                 | 192.168.33.18                       | 192.168.33.56
398         +-------+------+                       +------+------+
399         |    Source    |                       | Destination |
400         +-------+------+                       +------+------+
401                 | 10.11.78.18                         | 10.11.78.56
402                 |                                     |
403                 |                                     |
404                 |             10.11.78.0/24           |
405                 +------------------+------------------+
406                                    |
407                                    |
408                                    | 10.11.78.243
409                              +-----+-----+
410                              |  Client   |
411                              +-----------+

413            Figure 3: An example inter-server network topology.

415    For an inter-server copy, the client notifies the source server that
416    a file will be copied by the destination server using a COPY_NOTIFY
417    operation.  The client then initiates the copy by sending the COPY
418    operation to the destination server.  The destination server may
419    perform the copy synchronously or asynchronously.

421    A synchronous inter-server copy is shown in Figure 4.  In this case,
422    the destination server chooses to perform the copy before responding
423    to the client's COPY request.

425    An asynchronous copy is shown in Figure 5.  In this case, the
426    destination server chooses to respond to the client's COPY request
427    immediately and then perform the copy asynchronously.

429         Client                Source             Destination
430           +                    +                     +
431           |                    |                     |
432           |--- COPY_NOTIFY --->|                     |
433           |<------------------/|                     |
434           |                    |                     |
435           |                    |                     |
436           |--- COPY ---------------------------->|
437           |                    |                     |
438           |                    |                     |
439           |                    |<----- read -----|
440           |                    |\--------------->|
441           |                    |                     |
442           |                    |        .            | Multiple reads may
443           |                    |        .            | be necessary
444           |                    |        .            |
445           |                    |                     |
446           |                    |                     |
447           |<------------------------------------/| Destination replies
448           |                    |                     | to COPY

450                Figure 4: A synchronous inter-server copy.
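As an illustration only, the synchronous exchange of Figure 4 can be modeled as a small simulation. The classes and method names below are invented for this sketch and are not protocol elements; the simple authorization set stands in for the RPCSEC_GSSv3 machinery discussed later in Section 2.4.

```python
# Toy model of a synchronous inter-server copy: the client sends
# COPY_NOTIFY to the source, then COPY to the destination, which
# pulls the data from the source with a series of reads.

class SourceServer:
    def __init__(self, files):
        self.files = files          # filehandle -> file contents
        self.authorized = set()     # (user, filehandle) pairs from COPY_NOTIFY

    def copy_notify(self, user, fh):
        # Client tells the source that `user` will copy `fh` via a destination.
        self.authorized.add((user, fh))

    def read(self, user, fh, offset, count):
        # Destination pulls data; only notified copies may read.
        assert (user, fh) in self.authorized, "no COPY_NOTIFY for this copy"
        return self.files[fh][offset:offset + count]

class DestinationServer:
    def __init__(self):
        self.files = {}

    def copy(self, user, source, src_fh, dst_fh, chunk=4):
        # Synchronous COPY: pull the whole file in small reads, then reply.
        data, offset = b"", 0
        while True:
            piece = source.read(user, src_fh, offset, chunk)
            if not piece:           # short read of zero bytes: end of file
                break
            data += piece
            offset += len(piece)
        self.files[dst_fh] = data
        return len(data)            # final result carried in the COPY reply

src = SourceServer({"fh1": b"hello, world"})
dst = DestinationServer()
src.copy_notify("alice", "fh1")                 # client -> source
copied = dst.copy("alice", src, "fh1", "fh2")   # client -> destination
```

The pull model's flow control shows up directly: the destination picks the `chunk` size of each read, so it never receives more than it asked for.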
452         Client                Source             Destination
453           +                    +                     +
454           |                    |                     |
455           |--- COPY_NOTIFY --->|                     |
456           |<------------------/|                     |
457           |                    |                     |
458           |                    |                     |
459           |--- COPY ---------------------------->|
460           |<------------------------------------/|
461           |                    |                     |
462           |                    |                     |
463           |                    |<----- read -----|
464           |                    |\--------------->|
465           |                    |                     |
466           |                    |        .            | Multiple reads may
467           |                    |        .            | be necessary
468           |                    |        .            |
469           |                    |                     |
470           |                    |                     |
471           |--- COPY_STATUS --------------------->| Client may poll
472           |<------------------------------------/| for status
473           |                    |                     |
474           |                    |        .            | Multiple COPY_STATUS
475           |                    |        .            | operations may be sent
476           |                    |        .            |
477           |                    |                     |
478           |                    |                     |
479           |                    |                     |
480           |<-- CB_COPY --------------------------| Destination reports
481           |\------------------------------------>| results
482           |                    |                     |

484                Figure 5: An asynchronous inter-server copy.

486 2.2.3.  Server-to-Server Copy Protocol

488    During an inter-server copy, the destination server reads the file
489    data from the source server.  The source server and destination
490    server are not required to use a specific protocol to transfer the
491    file data.  The choice of what protocol to use is ultimately the
492    destination server's decision.

494 2.2.3.1.  Using NFSv4.x as a Server-to-Server Copy Protocol

496    The destination server MAY use standard NFSv4.x (where x >= 1) to
497    read the data from the source server.  If NFSv4.x is used for the
498    server-to-server copy protocol, the destination server can use the
499    filehandle contained in the COPY request with standard NFSv4.x
500    operations to read data from the source server.  Specifically, the
501    destination server may use the NFSv4.x OPEN operation's CLAIM_FH
502    facility to open the file being copied and obtain an open stateid.
503    Using the stateid, the destination server may then use NFSv4.x READ
504    operations to read the file.

506 2.2.3.2.  Using an alternative Server-to-Server Copy Protocol

508    In a homogeneous environment, the source and destination servers
509    might be able to perform the file copy extremely efficiently using
510    specialized protocols.  For example the source and destination
511    servers might be two nodes sharing a common file system format for
512    the source and destination file systems.  Thus the source and
513    destination are in an ideal position to efficiently render the image
514    of the source file to the destination file by replicating the file
515    system formats at the block level.  Another possibility is that the
516    source and destination might be two nodes sharing a common storage
517    area network, and thus there is no need to copy any data at all, and
518    instead ownership of the file and its contents might simply be re-
519    assigned to the destination.  To allow for these possibilities, the
520    destination server is allowed to use a server-to-server copy protocol
521    of its choice.

523    In a heterogeneous environment, using a protocol other than NFSv4.x
524    (e.g., HTTP [13] or FTP [14]) presents some challenges.  In
525    particular, the destination server is presented with the challenge of
526    accessing the source file given only an NFSv4.x filehandle.

528    One option for protocols that identify source files with path names
529    is to use an ASCII hexadecimal representation of the source
530    filehandle as the file name.

532    Another option for the source server is to use URLs to direct the
533    destination server to a specialized service.  For example, the
534    response to COPY_NOTIFY could include the URL
535    ftp://s1.example.com:9999/_FH/0x12345, where 0x12345 is the ASCII
536    hexadecimal representation of the source filehandle.  When the
537    destination server receives the source server's URL, it would use
538    "_FH/0x12345" as the file name to pass to the FTP server listening on
539    port 9999 of s1.example.com.
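A minimal sketch of the hexadecimal naming option, assuming nothing beyond what the text above describes; the helper names are hypothetical, and the exact naming convention is entirely up to the source server:

```python
def fh_to_name(fh: bytes) -> str:
    # Render an opaque NFSv4 filehandle as an ASCII hexadecimal file name.
    return "0x" + fh.hex()

def copy_source_url(host: str, port: int, fh: bytes) -> str:
    # One possible shape for a URL a source server might hand back in a
    # COPY_NOTIFY response, mirroring the ftp://.../_FH/0x... example above.
    return f"ftp://{host}:{port}/_FH/{fh_to_name(fh)}"
```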
On port 9999 there would be a special
540    instance of the FTP service that understands how to convert NFS
541    filehandles to an open file descriptor (in many operating systems,
542    this would require a new system call, one which is the inverse of the
543    makefh() function that the pre-NFSv4 MOUNT service needs).

545    Authenticating and identifying the destination server to the source
546    server is also a challenge.  Recommendations for how to accomplish
547    this are given in Section 2.4.1.2.4 and Section 2.4.1.4.

549 2.3.  Operations

551    In the sections that follow, several operations are defined that
552    together provide the server-side copy feature.  These operations are
553    intended to be OPTIONAL operations as defined in section 17 of [2].
554    The COPY_NOTIFY, COPY_REVOKE, COPY, COPY_ABORT, and COPY_STATUS
555    operations are designed to be sent within an NFSv4 COMPOUND
556    procedure.  The CB_COPY operation is designed to be sent within an
557    NFSv4 CB_COMPOUND procedure.

559    Each operation is performed in the context of the user identified by
560    the ONC RPC credential of its containing COMPOUND or CB_COMPOUND
561    request.  For example, a COPY_ABORT operation issued by a given user
562    indicates that a specified COPY operation initiated by the same user
563    be canceled.  Therefore a COPY_ABORT MUST NOT interfere with a copy
564    of the same file initiated by another user.

566    An NFS server MAY allow an administrative user to monitor or cancel
567    copy operations using an implementation specific interface.

569 2.3.1.  netloc4 - Network Locations

571    The server-side copy operations specify network locations using the
572    netloc4 data type shown below:

574    enum netloc_type4 {
575        NL4_NAME    = 0,
576        NL4_URL     = 1,
577        NL4_NETADDR = 2
578    };

579    union netloc4 switch (netloc_type4 nl_type) {
580        case NL4_NAME:    utf8str_cis nl_name;
581        case NL4_URL:     utf8str_cis nl_url;
582        case NL4_NETADDR: netaddr4    nl_addr;
583    };

585    If the netloc4 is of type NL4_NAME, the nl_name field MUST be
586    specified as a UTF-8 string.  The nl_name is expected to be resolved
587    to a network address via DNS, LDAP, NIS, /etc/hosts, or some other
588    means.  If the netloc4 is of type NL4_URL, a server URL [4]
589    appropriate for the server-to-server copy operation is specified as a
590    UTF-8 string.  If the netloc4 is of type NL4_NETADDR, the nl_addr
591    field MUST contain a valid netaddr4 as defined in Section 3.3.9 of
592    [2].

594    When netloc4 values are used for an inter-server copy as shown in
595    Figure 3, their values may be evaluated on the source server,
596    destination server, and client.  The network environment in which
597    these systems operate should be configured so that the netloc4 values
598    are interpreted as intended on each system.

600 2.3.2.  Copy Offload Stateids

602    A server may perform a copy offload operation asynchronously.  An
603    asynchronous copy is tracked using a copy offload stateid.  Copy
604    offload stateids are included in the COPY, COPY_ABORT, COPY_STATUS,
605    and CB_COPY operations.

607    Section 8.2.4 of [2] specifies that stateids are valid until either
608    (A) the client or server restart or (B) the client returns the
609    resource.

611    A copy offload stateid will be valid until either (A) the client or
612    server restart or (B) the client returns the resource by issuing a
613    COPY_ABORT operation or the client replies to a CB_COPY operation.

615    A copy offload stateid's seqid MUST NOT be 0 (zero).
   In the context of a copy offload operation, it is ambiguous to
   indicate the most recent copy offload operation using a stateid with
   a seqid of 0 (zero).  Therefore a copy offload stateid with a seqid
   of 0 (zero) MUST be considered invalid.

2.4.  Security Considerations

   The security considerations pertaining to NFSv4 [10] apply to this
   document.

   The standard security mechanisms provided by NFSv4 [10] may be used
   to secure the protocol described in this document.

   NFSv4 clients and servers supporting the inter-server copy
   operations described in this document are REQUIRED to implement [5],
   including the RPCSEC_GSSv3 privileges copy_from_auth and
   copy_to_auth.  If the server-to-server copy protocol is ONC RPC
   based, the servers are also REQUIRED to implement the RPCSEC_GSSv3
   privilege copy_confirm_auth.  These requirements to implement are
   not requirements to use.  NFSv4 clients and servers are RECOMMENDED
   to use [5] to secure server-side copy operations.

2.4.1.  Inter-Server Copy Security

2.4.1.1.  Requirements for Secure Inter-Server Copy

   Inter-server copy is driven by several requirements:

   o  The specification MUST NOT mandate an inter-server copy protocol.
      There are many ways to copy data.  Some will be more optimal than
      others depending on the identities of the source server and
      destination server.  For example, the source and destination
      servers might be two nodes sharing a common file system format
      for the source and destination file systems.  Thus the source and
      destination are in an ideal position to efficiently render the
      image of the source file to the destination file by replicating
      the file system formats at the block level.
      In other cases, the source and destination might be two nodes
      sharing a common storage area network, in which case there is no
      need to copy any data at all; instead, ownership of the file and
      its contents simply gets reassigned to the destination.

   o  The specification MUST provide guidance for using NFSv4.x as a
      copy protocol.  For those source and destination servers willing
      to use NFSv4.x, there are specific security considerations that
      this specification can and does address.

   o  The specification MUST NOT mandate pre-configuration between the
      source and destination server.  Requiring that the source and
      destination first have a "copying relationship" increases the
      administrative burden.  However, the specification MUST NOT
      preclude implementations that require pre-configuration.

   o  The specification MUST NOT mandate a trust relationship between
      the source and destination server.  The NFSv4 security model
      requires mutual authentication between a principal on an NFS
      client and a principal on an NFS server.  This model MUST
      continue with the introduction of COPY.

2.4.1.2.  Inter-Server Copy with RPCSEC_GSSv3

   When the client sends a COPY_NOTIFY to the source server to expect
   the destination to attempt to copy data from the source server, it
   is expected that this copy is being done on behalf of the principal
   (called the "user principal") that sent the RPC request that
   encloses the COMPOUND procedure that contains the COPY_NOTIFY
   operation.  The user principal is identified by the RPC credentials.
   A mechanism that allows the user principal to authorize the
   destination server to perform the copy, in a manner that lets the
   source server properly authenticate the destination's copy without
   allowing the destination to exceed its authorization, is necessary.
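The requirement just stated, that the destination must not be able to exceed its authorization, can be modeled with a short sketch.  This is purely illustrative Python, not protocol XDR; the class, method names, and status strings are invented: the source server honors a read on behalf of a copy only for a filehandle that the user principal named in a COPY_NOTIFY under an established privilege.

```python
# Illustrative model (not protocol XDR): the source server tracks
# privileges scoped to (user, destination), so the destination server
# can only copy files the user principal explicitly authorized.

class SourceServer:
    def __init__(self):
        self.privileges = set()   # {("copy_from_auth", user, destination)}
        self.annotations = {}     # privilege -> set of authorized filehandles

    def establish_copy_from_auth(self, user, destination):
        priv = ("copy_from_auth", user, destination)
        self.privileges.add(priv)
        self.annotations[priv] = set()

    def copy_notify(self, user, destination, filehandle):
        # Annotate the privilege with the file the user wants copied.
        priv = ("copy_from_auth", user, destination)
        if priv not in self.privileges:
            return "NFS4ERR_ACCESS"
        self.annotations[priv].add(filehandle)
        return "NFS4_OK"

    def read_for_copy(self, user, destination, filehandle):
        # The destination may read only filehandles named in a COPY_NOTIFY.
        priv = ("copy_from_auth", user, destination)
        if filehandle in self.annotations.get(priv, set()):
            return "NFS4_OK"
        return "NFS4ERR_ACCESS"

src = SourceServer()
src.establish_copy_from_auth("alice", "dst.example.com")
src.copy_notify("alice", "dst.example.com", 0x12345)
assert src.read_for_copy("alice", "dst.example.com", 0x12345) == "NFS4_OK"
# The destination cannot reach a file the user never authorized:
assert src.read_for_copy("alice", "dst.example.com", 0x99999) == "NFS4ERR_ACCESS"
```

With such scoping, misbehavior by the destination server exposes only the files the user principal explicitly offered for copying.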
   An approach that sends delegated credentials of the client's user
   principal to the destination server is not used for the following
   reasons.  If the client's user delegated its credentials, the
   destination would authenticate as the user principal.  If the
   destination were using the NFSv4 protocol to perform the copy, then
   the source server would authenticate the destination server as the
   user principal, and the file copy would securely proceed.  However,
   this approach would allow the destination server to copy other
   files.  The user principal would have to trust the destination
   server not to do so.  This is counter to the requirements, and
   therefore is not considered.  Instead, an approach using
   RPCSEC_GSSv3 [5] privileges is proposed.

   One of the stated applications of the proposed RPCSEC_GSSv3 protocol
   is compound client host and user authentication [+ privilege
   assertion].  For inter-server file copy, we require compound NFS
   server host and user authentication [+ privilege assertion].  The
   distinction between the two is one without meaning.

   RPCSEC_GSSv3 introduces the notion of privileges.  We define three
   privileges:

   copy_from_auth:  A user principal is authorizing a source principal
      ("nfs@<source>") to allow a destination principal
      ("nfs@<destination>") to copy a file from the source to the
      destination.  This privilege is established on the source server
      before the user principal sends a COPY_NOTIFY operation to the
      source server.

      struct copy_from_auth_priv {
         secret4        cfap_shared_secret;
         netloc4        cfap_destination;
         /* the NFSv4 user name that the user principal maps to */
         utf8str_mixed  cfap_username;
         /* equal to seq_num of rpc_gss_cred_vers_3_t */
         unsigned int   cfap_seq_num;
      };

      cfap_shared_secret is a secret value the user principal
      generates.
   copy_to_auth:  A user principal is authorizing a destination
      principal ("nfs@<destination>") to allow it to copy a file from
      the source to the destination.  This privilege is established on
      the destination server before the user principal sends a COPY
      operation to the destination server.

      struct copy_to_auth_priv {
         /* equal to cfap_shared_secret */
         secret4        ctap_shared_secret;
         netloc4        ctap_source;
         /* the NFSv4 user name that the user principal maps to */
         utf8str_mixed  ctap_username;
         /* equal to seq_num of rpc_gss_cred_vers_3_t */
         unsigned int   ctap_seq_num;
      };

      ctap_shared_secret is a secret value the user principal generated
      and was used to establish the copy_from_auth privilege with the
      source principal.

   copy_confirm_auth:  A destination principal is confirming with the
      source principal that it is authorized to copy data from the
      source on behalf of the user principal.  When the inter-server
      copy protocol is NFSv4, or for that matter, any protocol capable
      of being secured via RPCSEC_GSSv3 (i.e., any ONC RPC protocol),
      this privilege is established before the file is copied from the
      source to the destination.

      struct copy_confirm_auth_priv {
         /* equal to GSS_GetMIC() of cfap_shared_secret */
         opaque         ccap_shared_secret_mic<>;
         /* the NFSv4 user name that the user principal maps to */
         utf8str_mixed  ccap_username;
         /* equal to seq_num of rpc_gss_cred_vers_3_t */
         unsigned int   ccap_seq_num;
      };

2.4.1.2.1.  Establishing a Security Context

   When the user principal wants to COPY a file between two servers, if
   it has not established copy_from_auth and copy_to_auth privileges on
   the servers, it establishes them:

   o  The user principal generates a secret it will share with the two
      servers.
      This shared secret will be placed in the cfap_shared_secret and
      ctap_shared_secret fields of the appropriate privilege data
      types, copy_from_auth_priv and copy_to_auth_priv.

   o  An instance of copy_from_auth_priv is filled in with the shared
      secret, the destination server, and the NFSv4 user id of the user
      principal.  It will be sent with an RPCSEC_GSS3_CREATE procedure,
      and so cfap_seq_num is set to the seq_num of the credential of
      the RPCSEC_GSS3_CREATE procedure.  Because cfap_shared_secret is
      a secret, after XDR encoding copy_from_auth_priv, GSS_Wrap()
      (with privacy) is invoked on copy_from_auth_priv.  The
      RPCSEC_GSS3_CREATE procedure's arguments are:

         struct {
            rpc_gss3_gss_binding    *compound_binding;
            rpc_gss3_chan_binding   *chan_binding_mic;
            rpc_gss3_assertion      assertions<>;
            rpc_gss3_extension      extensions<>;
         } rpc_gss3_create_args;

      The string "copy_from_auth" is placed in assertions[0].privs.
      The output of GSS_Wrap() is placed in extensions[0].data.  The
      field extensions[0].critical is set to TRUE.  The source server
      calls GSS_Unwrap() on the privilege, and verifies that the
      seq_num matches the credential.  It then verifies that the NFSv4
      user id being asserted matches the source server's mapping of the
      user principal.  If it does, the privilege is established on the
      source server as: <"copy_from_auth", user id, destination>.  The
      successful reply to RPCSEC_GSS3_CREATE has:

         struct {
            opaque                  handle<>;
            rpc_gss3_chan_binding   *chan_binding_mic;
            rpc_gss3_assertion      granted_assertions<>;
            rpc_gss3_assertion      server_assertions<>;
            rpc_gss3_extension      extensions<>;
         } rpc_gss3_create_res;

      The field "handle" is the RPCSEC_GSSv3 handle that the client
      will use on COPY_NOTIFY requests involving the source and
      destination server.  granted_assertions[0].privs will be equal to
      "copy_from_auth".
      The server will return a GSS_Wrap() of copy_from_auth_priv.

   o  An instance of copy_to_auth_priv is filled in with the shared
      secret, the source server, and the NFSv4 user id.  It will be
      sent with an RPCSEC_GSS3_CREATE procedure, and so ctap_seq_num is
      set to the seq_num of the credential of the RPCSEC_GSS3_CREATE
      procedure.  Because ctap_shared_secret is a secret, after XDR
      encoding copy_to_auth_priv, GSS_Wrap() is invoked on
      copy_to_auth_priv.  The RPCSEC_GSS3_CREATE procedure's arguments
      are:

         struct {
            rpc_gss3_gss_binding    *compound_binding;
            rpc_gss3_chan_binding   *chan_binding_mic;
            rpc_gss3_assertion      assertions<>;
            rpc_gss3_extension      extensions<>;
         } rpc_gss3_create_args;

      The string "copy_to_auth" is placed in assertions[0].privs.  The
      output of GSS_Wrap() is placed in extensions[0].data.  The field
      extensions[0].critical is set to TRUE.  After unwrapping,
      verifying the seq_num, and verifying the user principal to NFSv4
      user ID mapping, the destination establishes a privilege of
      <"copy_to_auth", user id, source>.  The successful reply to
      RPCSEC_GSS3_CREATE has:

         struct {
            opaque                  handle<>;
            rpc_gss3_chan_binding   *chan_binding_mic;
            rpc_gss3_assertion      granted_assertions<>;
            rpc_gss3_assertion      server_assertions<>;
            rpc_gss3_extension      extensions<>;
         } rpc_gss3_create_res;

      The field "handle" is the RPCSEC_GSSv3 handle that the client
      will use on COPY requests involving the source and destination
      server.  The field granted_assertions[0].privs will be equal to
      "copy_to_auth".  The server will return a GSS_Wrap() of
      copy_to_auth_priv.

2.4.1.2.2.  Starting a Secure Inter-Server Copy

   When the client sends a COPY_NOTIFY request to the source server, it
   uses the privileged "copy_from_auth" RPCSEC_GSSv3 handle.
   cna_destination_server in COPY_NOTIFY MUST be the same as the name
   of the destination server specified in copy_from_auth_priv.
   Otherwise, COPY_NOTIFY will fail with NFS4ERR_ACCESS.  The source
   server verifies that the privilege <"copy_from_auth", user id,
   destination> exists, and annotates it with the source filehandle, if
   the user principal has read access to the source file, and if
   administrative policies give the user principal and the NFS client
   read access to the source file (i.e., if the ACCESS operation would
   grant read access).  Otherwise, COPY_NOTIFY will fail with
   NFS4ERR_ACCESS.

   When the client sends a COPY request to the destination server, it
   uses the privileged "copy_to_auth" RPCSEC_GSSv3 handle.
   ca_source_server in COPY MUST be the same as the name of the source
   server specified in copy_to_auth_priv.  Otherwise, COPY will fail
   with NFS4ERR_ACCESS.  The destination server verifies that the
   privilege <"copy_to_auth", user id, source> exists, and annotates it
   with the source and destination filehandles.  If the client has
   failed to establish the "copy_to_auth" policy, it will reject the
   request with NFS4ERR_PARTNER_NO_AUTH.

   If the client sends a COPY_REVOKE to the source server to rescind
   the destination server's copy privilege, it uses the privileged
   "copy_from_auth" RPCSEC_GSSv3 handle, and the cra_destination_server
   in COPY_REVOKE MUST be the same as the name of the destination
   server specified in copy_from_auth_priv.  The source server will
   then delete the <"copy_from_auth", user id, destination> privilege
   and fail any subsequent copy requests sent under the auspices of
   this privilege from the destination server.

2.4.1.2.3.  Securing ONC RPC Server-to-Server Copy Protocols

   After a destination server has a "copy_to_auth" privilege
   established on it, and it receives a COPY request, if it knows it
   will use an ONC RPC protocol to copy data, it will establish a
   "copy_confirm_auth" privilege on the source server, using
   nfs@<destination> as the initiator principal, and nfs@<source> as
   the target principal.

   The value of the field ccap_shared_secret_mic is a GSS_VerifyMIC()
   of the shared secret passed in the copy_to_auth privilege.  The
   field ccap_username is the mapping of the user principal to an NFSv4
   user name ("user"@"domain" form), and MUST be the same as
   ctap_username and cfap_username.  The field ccap_seq_num is the
   seq_num of the RPCSEC_GSSv3 credential used for the
   RPCSEC_GSS3_CREATE procedure the destination will send to the source
   server to establish the privilege.

   The source server verifies the privilege, and establishes a
   <"copy_confirm_auth", user id, destination> privilege.  If the
   source server fails to verify the privilege, the COPY operation will
   be rejected with NFS4ERR_PARTNER_NO_AUTH.  All subsequent ONC RPC
   requests sent from the destination to copy data from the source to
   the destination will use the RPCSEC_GSSv3 handle returned by the
   source's RPCSEC_GSS3_CREATE response.

   Note that the use of the "copy_confirm_auth" privilege accomplishes
   the following:

   o  if a protocol like NFS is being used, with export policies,
      export policies can be overridden in case the destination server
      as-an-NFS-client is not authorized

   o  manual configuration to allow a copy relationship between the
      source and destination is not needed.

   If the attempt to establish a "copy_confirm_auth" privilege fails,
   then when the user principal sends a COPY request to the
   destination, the destination server will reject it with
   NFS4ERR_PARTNER_NO_AUTH.

2.4.1.2.4.  Securing Non ONC RPC Server-to-Server Copy Protocols

   If the destination won't be using ONC RPC to copy the data, then the
   source and destination are using an unspecified copy protocol.  The
   destination could use the shared secret and the NFSv4 user ID to
   prove to the source server that the user principal has authorized
   the copy.

   For protocols that authenticate user names with passwords (e.g.,
   HTTP [13] and FTP [14]), the NFSv4 user ID could be used as the user
   name, and an ASCII hexadecimal representation of the RPCSEC_GSSv3
   shared secret could be used as the user password or as input into
   non-password authentication methods like CHAP [15].

2.4.1.3.  Inter-Server Copy via ONC RPC but without RPCSEC_GSSv3

   ONC RPC security flavors other than RPCSEC_GSSv3 MAY be used with
   the server-side copy offload operations described in this document.
   In particular, host-based ONC RPC security flavors such as AUTH_NONE
   and AUTH_SYS MAY be used.  If a host-based security flavor is used,
   a minimal level of protection for the server-to-server copy protocol
   is possible.

   In the absence of strong security mechanisms such as RPCSEC_GSSv3,
   the challenge is how the source server and destination server
   identify themselves to each other, especially in the presence of
   multi-homed source and destination servers.  In a multi-homed
   environment, the destination server might not contact the source
   server from the same network address specified by the client in the
   COPY_NOTIFY.  This can be overcome using the procedure described
   below.

   When the client sends the source server the COPY_NOTIFY operation,
   the source server may reply to the client with a list of target
   addresses, names, and/or URLs and assign them to the unique triple:
   <source fh, user ID, destination address>.
   If the destination uses one of these target netlocs to contact the
   source server, the source server will be able to uniquely identify
   the destination server, even if the destination server does not
   connect from the address specified by the client in COPY_NOTIFY.

   For example, suppose the network topology is as shown in Figure 3.
   If the source filehandle is 0x12345, the source server may respond
   to a COPY_NOTIFY for destination 192.0.2.56 with the URLs:

      nfs://192.0.2.18//_COPY/192.0.2.56/_FH/0x12345

      nfs://198.51.100.18//_COPY/192.0.2.56/_FH/0x12345

   The client will then send these URLs to the destination server in
   the COPY operation.  Suppose that the 198.51.100.0/24 network is a
   high-speed network and the destination server decides to transfer
   the file over this network.  If the destination contacts the source
   server from 198.51.100.56 over this network using NFSv4.1, it does
   the following:

   COMPOUND { PUTROOTFH; LOOKUP "_COPY"; LOOKUP "192.0.2.56";
              LOOKUP "_FH"; OPEN "0x12345"; GETFH }

   The source server will therefore know that these NFSv4.1 operations
   are being issued by the destination server identified in the
   COPY_NOTIFY.

2.4.1.4.  Inter-Server Copy without ONC RPC and RPCSEC_GSSv3

   The same techniques as in Section 2.4.1.3, using unique URLs for
   each destination server, can be used for other protocols (e.g., HTTP
   [13] and FTP [14]) as well.

3.  Sparse Files

3.1.  Introduction

   A sparse file is a common way of representing a large file without
   having to utilize all of the disk space for it.  Consequently, a
   sparse file uses less physical space than its size indicates.  This
   means the file contains 'holes', byte ranges within the file that
   contain no data.  Most modern file systems support sparse files,
   including most UNIX file systems and NTFS, but notably not Apple's
   HFS+.
   Common examples of sparse files include Virtual Machine (VM) OS/disk
   images, database files, log files, and even checkpoint recovery
   files most commonly used by the HPC community.

   If an application reads a hole in a sparse file, the file system
   must return all zeros to the application.  For local data access
   there is little penalty, but with NFS these zeros must be
   transferred back to the client.  If an application uses the NFS
   client to read data into memory, this wastes time and bandwidth as
   the application waits for the zeros to be transferred.

   A sparse file is typically created by initializing the file to be
   all zeros - nothing is written to the data in the file; instead, the
   hole is recorded in the metadata for the file.  So an 8G disk image
   might be represented initially by a couple hundred bits in the inode
   and nothing on the disk.  If the VM then writes 100M to a file in
   the middle of the image, there would now be two holes represented in
   the metadata and 100M in the data.

   This section introduces a new operation READ_PLUS which supports all
   the features of READ but includes an extension to support sparse
   pattern files.  READ_PLUS is guaranteed to perform no worse than
   READ, and can dramatically improve performance with sparse files.
   READ_PLUS does not depend on pNFS protocol features, but can be used
   by pNFS to support sparse files.

3.2.  Terminology

   Regular file:  An object of file type NF4REG or NF4NAMEDATTR.

   Sparse file:  A Regular file that contains one or more Holes.

   Hole:  A byte range within a Sparse file that contains regions of
      all zeros.  For block-based file systems, this could also be an
      unallocated region of the file.

   Hole Threshold:  The minimum length of a Hole as determined by the
      server.
      If a server chooses to define a Hole Threshold, then it would not
      return hole information (nfs_readplusreshole) with a hole_offset
      and hole_length that specify a range shorter than the Hole
      Threshold.

3.3.  Overview of Sparse Files and NFSv4

   This section provides sparse file support to the largest number of
   NFS client and server implementations, and as such proposes to add a
   new return code to the READ_PLUS operation instead of proposing
   additions or extensions of new or existing optional features (such
   as pNFS).

3.4.  Operation 65: READ_PLUS

   This section introduces a new read operation, named READ_PLUS, which
   allows NFS clients to avoid reading holes in a sparse file.
   READ_PLUS is guaranteed to perform no worse than READ, and can
   dramatically improve performance with sparse files.

   READ_PLUS supports all the features of the existing NFSv4.1 READ
   operation [2] and adds a simple yet significant extension to the
   format of its response.  The change allows the server to avoid
   returning all zeros from a file hole, which wastes computational and
   network resources and reduces performance.  READ_PLUS uses a new
   result structure that tells the client that the result is all zeros
   AND the byte-range of the hole in which the request was made.
   Returning the hole's byte-range, and only upon request, avoids
   transferring large Data Region Maps that may be soon invalidated and
   contain information about a file that may not even be read in its
   entirety.

   A new read operation is required due to NFSv4.1 minor versioning
   rules that do not allow modification of an existing operation's
   arguments or results.  READ_PLUS is designed in such a way as to
   allow future extensions to the result structure.  The same approach
   could be taken to extend the argument structure, but a good use case
   is first required to make such a change.

3.4.1.  ARGUMENT

   struct READ_PLUS4args {
      /* CURRENT_FH: file */
      stateid4  rpa_stateid;
      offset4   rpa_offset;
      count4    rpa_count;
   };

3.4.2.  RESULT

   union read_plus_content switch (data_content4 content) {
      case NFS4_CONTENT_DATA:
         opaque           rpc_data<>;
      case NFS4_CONTENT_APP_BLOCK:
         app_data_block4  rpc_block;
      case NFS4_CONTENT_HOLE:
         hole_info4       rpc_hole;
      default:
         void;
   };

   /*
    * Allow a return of an array of contents.
    */
   struct read_plus_res4 {
      bool               rpr_eof;
      read_plus_content  rpr_contents<>;
   };

   union READ_PLUS4res switch (nfsstat4 status) {
      case NFS4_OK:
         read_plus_res4  resok4;
      default:
         void;
   };

3.4.3.  DESCRIPTION

   The READ_PLUS operation is based upon the NFSv4.1 READ operation
   [2], and similarly reads data from the regular file identified by
   the current filehandle.

   The client provides an offset of where the READ_PLUS is to start and
   a count of how many bytes are to be read.  An offset of zero means
   to read data starting at the beginning of the file.  If offset is
   greater than or equal to the size of the file, the status NFS4_OK is
   returned with nfs_readplusrestype4 set to READ_OK, data length set
   to zero, and eof set to TRUE.  The READ_PLUS is subject to access
   permissions checking.

   If the client specifies a count value of zero, the READ_PLUS
   succeeds and returns zero bytes of data, again subject to access
   permissions checking.  In all situations, the server may choose to
   return fewer bytes than specified by the client.  The client needs
   to check for this condition and handle the condition appropriately.
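Since a server may always return fewer bytes than requested, a client must be prepared to iterate.  The following sketch is illustrative only: read_plus() is a hypothetical stand-in for the real RPC, returning a result type, a payload, and an eof flag; a real client would also handle errors and stateids.

```python
# Hypothetical client loop: read a whole byte range with READ_PLUS,
# handling both short reads and READ_HOLE results.

def read_range(read_plus, offset, count):
    """Return up to `count` bytes starting at `offset`, zero-filling holes."""
    out = bytearray()
    while count > 0:
        rtype, payload, eof = read_plus(offset, count)
        if rtype == "READ_OK":
            out += payload                      # may be a short read
            offset += len(payload)
            count -= len(payload)
        elif rtype == "READ_HOLE":
            hole_offset, hole_length = payload  # server-reported hole
            # Zero-fill only the part of the hole inside our request.
            n = min(hole_offset + hole_length - offset, count)
            out += b"\0" * n
            offset += n
            count -= n
        if eof:
            break
    return bytes(out)

# Toy server: 4 bytes of data, an 8-byte hole, then 4 bytes of data.
def toy_read_plus(offset, count):
    if offset < 4:
        return "READ_OK", b"ABCD"[offset:offset + count], False
    if offset < 12:
        return "READ_HOLE", (4, 8), False
    return "READ_OK", b"WXYZ"[offset - 12:offset - 12 + count], True

assert read_range(toy_read_plus, 0, 16) == b"ABCD" + b"\0" * 8 + b"WXYZ"
```

The first call returns a short read because the request crosses into a hole; the second call reports the hole itself, which the client fills with zeros locally rather than pulling them over the wire.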
   If the client specifies an offset and count value that is entirely
   contained within a hole of the file, the status NFS4_OK is returned
   with nfs_readplusresok4 set to READ_HOLE, and if information is
   available regarding the hole, a nfs_readplusreshole structure
   containing the offset and range of the entire hole.  The
   nfs_readplusreshole structure is considered valid until the file is
   changed (detected via the change attribute).  The server MUST
   provide the same semantics for nfs_readplusreshole as if the client
   read the region and received zeros; the implied hole's contents
   lifetime MUST be exactly the same as any other read data.

   If the client specifies an offset and count value that begins in a
   non-hole of the file but extends into a hole, the server should
   return a short read with status NFS4_OK, nfs_readplusresok4 set to
   READ_OK, and data length set to the number of bytes returned.  The
   client will then issue another READ_PLUS for the remaining bytes, to
   which the server will respond with information about the hole in the
   file.

   If the server knows that the requested byte range is into a hole of
   the file, but has no further information regarding the hole, it
   returns a nfs_readplusreshole structure with holeres4 set to
   HOLE_NOINFO.

   If hole information is available and can be returned to the client,
   the server returns a nfs_readplusreshole structure with the value of
   holeres4 set to HOLE_INFO.  The values of hole_offset and
   hole_length define the byte-range for the current hole in the file.
   These values represent the information known to the server and may
   describe a byte-range smaller than the true size of the hole.

   Except when special stateids are used, the stateid value for a
   READ_PLUS request represents a value returned from a previous byte-
   range lock or share reservation request or the stateid associated
   with a delegation.
   The stateid identifies the associated owners, if any, and is used by
   the server to verify that the associated locks are still valid
   (e.g., have not been revoked).

   If the read ended at the end-of-file (formally, in a correctly
   formed READ_PLUS operation, if offset + count is equal to the size
   of the file), or the READ_PLUS operation extends beyond the size of
   the file (if offset + count is greater than the size of the file),
   eof is returned as TRUE; otherwise, it is FALSE.  A successful
   READ_PLUS of an empty file will always return eof as TRUE.

   If the current filehandle is not an ordinary file, an error will be
   returned to the client.  In the case that the current filehandle
   represents an object of type NF4DIR, NFS4ERR_ISDIR is returned.  If
   the current filehandle designates a symbolic link, NFS4ERR_SYMLINK
   is returned.  In all other cases, NFS4ERR_WRONG_TYPE is returned.

   For a READ_PLUS with a stateid value of all bits equal to zero, the
   server MAY allow the READ_PLUS to be serviced subject to mandatory
   byte-range locks or the current share deny modes for the file.  For
   a READ_PLUS with a stateid value of all bits equal to one, the
   server MAY allow READ_PLUS operations to bypass locking checks at
   the server.

   On success, the current filehandle retains its value.

3.4.4.  IMPLEMENTATION

   If the server returns a "short read" (i.e., fewer bytes of data than
   requested and eof set to FALSE), the client should send another
   READ_PLUS to get the remaining data.  A server may return less data
   than requested under several circumstances.  The file may have been
   truncated by another client or perhaps on the server itself,
   changing the file size from what the requesting client believes to
   be the case.  This would reduce the actual amount of data available
   to the client.
   It is possible that the server reduces the transfer size and so
   returns a short read result.  Server resource exhaustion may also
   result in a short read.

   If mandatory byte-range locking is in effect for the file, and if
   the byte-range corresponding to the data to be read from the file is
   WRITE_LT locked by an owner not associated with the stateid, the
   server will return the NFS4ERR_LOCKED error.  The client should try
   to get the appropriate READ_LT via the LOCK operation before re-
   attempting the READ_PLUS.  When the READ_PLUS completes, the client
   should release the byte-range lock via LOCKU.  In addition, the
   server MUST return a nfs_readplusreshole structure with values of
   hole_offset and hole_length that are within the owner's locked byte
   range.

   If another client has an OPEN_DELEGATE_WRITE delegation for the file
   being read, the delegation must be recalled, and the operation
   cannot proceed until that delegation is returned or revoked.  Except
   where this happens very quickly, one or more NFS4ERR_DELAY errors
   will be returned to requests made while the delegation remains
   outstanding.  Normally, delegations will not be recalled as a result
   of a READ_PLUS operation since the recall will occur as a result of
   an earlier OPEN.  However, since it is possible for a READ_PLUS to
   be done with a special stateid, the server needs to check for this
   case even though the client should have done an OPEN previously.

3.4.4.1.  Additional pNFS Implementation Information

   With pNFS, the semantics of using READ_PLUS remains the same.  Any
   data server MAY return a READ_HOLE result for a READ_PLUS request
   that it receives.
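How a data server might limit hole information to the bytes it actually stores can be sketched under a simple striping assumption.  The parameters stripe_unit, num_ds, and ds_index below are invented for illustration; in practice the byte ranges a data server manages are dictated by the file's pNFS layout.

```python
# Sketch: a pNFS data server reports hole information only for bytes
# it stores.  Assuming a simple dense striped layout, intersect a
# known hole with this data server's stripe units before answering.

def ds_hole_ranges(hole_offset, hole_length, stripe_unit, num_ds, ds_index):
    """Intersect [hole_offset, hole_offset + hole_length) with the
    byte ranges belonging to data server `ds_index`."""
    stripe_width = stripe_unit * num_ds
    end = hole_offset + hole_length
    ranges = []
    stripe = hole_offset // stripe_width  # first stripe that may overlap
    while True:
        unit_start = stripe * stripe_width + ds_index * stripe_unit
        if unit_start >= end:
            break
        unit_end = unit_start + stripe_unit
        lo, hi = max(unit_start, hole_offset), min(unit_end, end)
        if lo < hi:
            ranges.append((lo, hi - lo))
        stripe += 1
    return ranges

# Two data servers, 8K stripe units: a 32K hole at offset 0 maps to
# two 8K extents on each server.
assert ds_hole_ranges(0, 32768, 8192, 2, 0) == [(0, 8192), (16384, 8192)]
assert ds_hole_ranges(0, 32768, 8192, 2, 1) == [(8192, 8192), (24576, 8192)]
```

A data server answering READ_PLUS would report only its own clamped extents, leaving the client to combine them with a valid layout.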
   When a data server chooses to return a READ_HOLE result, it has the
   option of returning hole information for the data stored on that
   data server (as defined by the data layout), but it MUST NOT return
   a nfs_readplusreshole structure with a byte range that includes data
   managed by another data server.

   1.  Data servers that cannot determine hole information SHOULD
       return HOLE_NOINFO.

   2.  Data servers that can obtain hole information for the parts of
       the file stored on that data server SHOULD return HOLE_INFO and
       the byte range of the hole stored on that data server.

   A data server should do its best to return as much information about
   a hole as is feasible without having to contact the metadata server.
   If communication with the metadata server is required, then every
   attempt should be taken to minimize the number of requests.

   If mandatory locking is enforced, then the data server must also
   ensure that it returns only information for a Hole that is within
   the owner's locked byte range.

3.4.5.  READ_PLUS with Sparse Files Example

   To see how the return value READ_HOLE will work, the following table
   describes a sparse file.  For each byte range, the file contains
   either non-zero data or a hole.  In addition, the server in this
   example uses a Hole Threshold of 32K.

             +-------------+----------+
             | Byte-Range  | Contents |
             +-------------+----------+
             | 0-15999     | Hole     |
             | 16K-31999   | Non-Zero |
             | 32K-255999  | Hole     |
             | 256K-287999 | Non-Zero |
             | 288K-353999 | Hole     |
             | 354K-417999 | Non-Zero |
             +-------------+----------+

                   Table 1

   Under the given circumstances, if a client were to read the file
   from beginning to end with a maximum read size of 64K, the following
   would be the result.
This assumes the client has already opened the file, acquired a valid stateid, and just needs to issue READ_PLUS requests.

1.  READ_PLUS(s, 0, 64K) --> NFS_OK, readplusrestype4 = READ_OK, eof = false, data<>[32K].  Return a short read, as the last half of the request was all zeroes.  Note that the first hole is read back as all zeroes as it is below the hole threshold.

2.  READ_PLUS(s, 32K, 64K) --> NFS_OK, readplusrestype4 = READ_HOLE, nfs_readplusreshole(HOLE_INFO)(32K, 224K).  The requested range was all zeroes, and the current hole begins at offset 32K and is 224K in length.

3.  READ_PLUS(s, 256K, 64K) --> NFS_OK, readplusrestype4 = READ_OK, eof = false, data<>[32K].  Return a short read, as the last half of the request was all zeroes.

4.  READ_PLUS(s, 288K, 64K) --> NFS_OK, readplusrestype4 = READ_HOLE, nfs_readplusreshole(HOLE_INFO)(288K, 66K).

5.  READ_PLUS(s, 354K, 64K) --> NFS_OK, readplusrestype4 = READ_OK, eof = true, data<>[64K].

3.5.  Related Work

Solaris and ZFS support an extension to lseek(2) that allows applications to discover holes in a file.  The values, SEEK_HOLE and SEEK_DATA, allow clients to seek to the next hole or the beginning of data, respectively.

XFS supports the XFS_IOC_GETBMAP extended attribute, which returns the Data Region Map for a file.  Clients can then use this information to avoid reading holes in a file.

NTFS and CIFS support the FSCTL_SET_SPARSE attribute, which allows applications to control whether empty regions of the file are preallocated and filled in with zeroes or simply left unallocated.

3.6.  Other Proposed Designs

3.6.1.  Multi-Data Server Hole Information

The current design prohibits pNFS data servers from returning hole information for regions of a file that are not stored on that data server.
Having data servers return information regarding other data servers changes the fundamental principle that all metadata information comes from the metadata server.

Here is a brief description of what would be involved if we did choose to support multi-data server hole information:

For a data server that can obtain hole information for the entire file without severe performance impact, it MAY return HOLE_INFO and the byte range of the entire file hole.  When a pNFS client receives a READ_HOLE result and a non-empty nfs_readplusreshole structure, it MAY use this information in conjunction with a valid layout for the file to determine the next data server for the next region of data that is not in a hole.

3.6.2.  Data Result Array

If a single read request contains one or more holes with a length greater than the Sparse Threshold, the current design would return results indicating a short read to the client.  A client would then send a series of read requests to the server to retrieve information for the holes and the remaining data.  To avoid turning a single read request into several exchanges between the client and server, the server may need to choose a relatively large Sparse Threshold in order to decrease the number of short reads it creates.  A large Sparse Threshold may miss many smaller holes, which in turn may negate the benefits of sparse read support.

To avoid this situation, one option is to have the READ_PLUS operation return information for multiple holes in a single return value.  This would allow several small holes to be described in a single read response without requiring multiple exchanges between the client and server.

One important item to consider with returning an array of data chunks is its impact on RDMA, which may use different block sizes on the client and server (among other things).

3.6.3.
User-Defined Sparse Mask

Add a mask (instead of just zeroes).  Specified by the server or the client?

3.6.4.  Allocated flag

A hole on the server may be an allocated byte-range consisting of all zeroes, or it may not be allocated at all.  To ensure this information is properly communicated to the client, it may be beneficial to add an 'alloc' flag to the HOLE_INFO section of nfs_readplusreshole.  This would allow an NFS client to copy a file from one file system to another and have it more closely resemble the original.

3.6.5.  Dense and Sparse pNFS File Layouts

The hole information returned from a data server must be understood by pNFS clients using both Dense and Sparse file layout types.  Does the current READ_PLUS return value work for both layout types?  Does the data server know if it is using dense or sparse so that it can return the correct hole_offset and hole_length values?

4.  Space Reservation

4.1.  Introduction

This section describes a set of operations that allow applications such as hypervisors to reserve space for a file, report the amount of actual disk space a file occupies, and free up the backing space of a file when it is not required.  In virtualized environments, virtual disk files are often stored on NFS mounted volumes.  Since virtual disk files represent the hard disks of virtual machines, hypervisors often have to guarantee certain properties for the file.

One such example is space reservation.  When a hypervisor creates a virtual disk file, it often tries to preallocate the space for the file so that there are no future allocation related errors during the operation of the virtual machine.  Such errors prevent a virtual machine from continuing execution and result in downtime.

Currently, in order to achieve such a guarantee, applications zero the entire file.
The initial zeroing allocates the backing blocks, and all subsequent writes are overwrites of already allocated blocks.  This approach is not only inefficient in terms of the amount of I/O done, it is also not guaranteed to work on filesystems that are log structured or deduplicated.  An efficient way of guaranteeing space reservation would be beneficial to such applications.

If the space_reserved attribute is set on a file, it is guaranteed that writes that do not grow the file will not fail with NFSERR_NOSPC.

Another useful feature would be the ability to report the number of blocks that would be freed when a file is deleted.  Currently, NFS reports two size attributes:

size  The logical file size of the file.

space_used  The size in bytes that the file occupies on disk.

While these attributes are sufficient for space accounting in traditional filesystems, they prove to be inadequate in modern filesystems that support block sharing.  In such filesystems, multiple inodes can point to a single block with a block reference count to guard against premature freeing.  Having a way to tell the number of blocks that would be freed if the file was deleted would be useful to applications that wish to migrate files when a volume is low on space.

Since virtual disks represent a hard drive in a virtual machine, a virtual disk can be viewed as a filesystem within a file.  Since not all blocks within a filesystem are in use, there is an opportunity to reclaim blocks that are no longer in use.  A call to deallocate blocks could result in better space efficiency.  Less space MAY be consumed for backups after block deallocation.

The following operations and attributes can be used to resolve these issues:

space_reserved  This attribute specifies whether the blocks backing the file have been preallocated.
space_freed  This attribute specifies the space freed when a file is deleted, taking block sharing into consideration.

max_hole_punch  This attribute specifies the maximum sized hole that can be punched on the filesystem.

INITIALIZE  This operation zeroes and/or deallocates the blocks backing a region of the file.

If space_used of a file is interpreted to mean the size in bytes of all disk blocks pointed to by the inode of the file, then shared blocks get double counted, over-reporting the space utilization.  This also has the adverse effect that the deletion of a file with shared blocks frees up less than space_used bytes.

On the other hand, if space_used is interpreted to mean the size in bytes of those disk blocks unique to the inode of the file, then shared blocks are not counted in any file, resulting in under-reporting of the space utilization.

For example, two files A and B have 10 blocks each.  Let 6 of these blocks be shared between them.  Thus, the combined space utilized by the two files is 14 * BLOCK_SIZE bytes.  In the former case, the combined space utilization of the two files would be reported as 20 * BLOCK_SIZE.  However, deleting either would only result in 4 * BLOCK_SIZE being freed.  Conversely, the latter interpretation would report that the space utilization is only 8 * BLOCK_SIZE.

Adding another size attribute, space_freed, is helpful in solving this problem.  space_freed is the number of blocks that are allocated to the given file that would be freed on its deletion.  In the example, both A and B would report space_freed as 4 * BLOCK_SIZE and space_used as 10 * BLOCK_SIZE.  If A is deleted, B will report space_freed as 10 * BLOCK_SIZE as the deletion of B would result in the deallocation of all 10 blocks.

The addition of this attribute does not solve the problem of space being over-reported.
However, over-reporting is better than under-reporting.

4.2.  Operations and attributes

In the sections that follow, one operation and three attributes are defined that together provide the space management facilities outlined earlier in the document.  The operation is intended to be OPTIONAL and the attributes RECOMMENDED as defined in section 17 of [2].

4.3.  Attribute 77: space_reserved

The space_reserved attribute is a read/write attribute of type boolean.  It is a per file attribute.  When the space_reserved attribute is set via SETATTR, the server must ensure that there is disk space to accommodate every byte in the file before it can return success.  If the server cannot guarantee this, it must return NFS4ERR_NOSPC.

If the client tries to grow a file which has the space_reserved attribute set, the server must guarantee that there is disk space to accommodate every byte in the file with the new size before it can return success.  If the server cannot guarantee this, it must return NFS4ERR_NOSPC.

It is not required that the server allocate the space to the file before returning success.  The allocation can be deferred; however, it must be guaranteed that it will not fail for lack of space.

The value of space_reserved can be obtained at any time through GETATTR.

In order to avoid ambiguity, the space_reserved bit cannot be set along with the size bit in SETATTR.  Increasing the size of a file with space_reserved set will fail if space reservation cannot be guaranteed for the new size.  If the file size is decreased, space reservation is only guaranteed for the new size and the extra blocks backing the file can be released.

4.4.  Attribute 78: space_freed

space_freed gives the number of bytes freed if the file is deleted.  This attribute is read only and is of type length4.  It is a per file attribute.
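The space accounting with block sharing described above can be sketched by modeling blocks as IDs and sharing as set intersection.  This is an illustrative model only; BLOCK_SIZE, the function names, and the block layout of A and B are hypothetical.

```python
# Sketch of space_used vs. space_freed for filesystems with block
# sharing, following the A/B example in the text.

BLOCK_SIZE = 4096

def space_used(blocks):
    # "Size in bytes of all disk blocks pointed to by the inode":
    # shared blocks are counted for every file that references them.
    return len(blocks) * BLOCK_SIZE

def space_freed(blocks, other_files):
    # Only blocks not referenced by any other file are actually freed
    # on deletion (their reference count drops to zero).
    shared = set().union(*other_files) if other_files else set()
    return len(set(blocks) - shared) * BLOCK_SIZE

# Files A and B with 10 blocks each, 6 of them shared:
A = set(range(10))       # blocks 0-9
B = set(range(4, 14))    # blocks 4-13; blocks 4-9 shared with A
```

With these definitions, both files report space_used of 10 * BLOCK_SIZE and space_freed of 4 * BLOCK_SIZE, while the true combined utilization is 14 * BLOCK_SIZE, matching the example in the text.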
4.5.  Attribute 79: max_hole_punch

max_hole_punch specifies the maximum size of a hole that the INITIALIZE operation can handle.  This attribute is read only and of type length4.  It is a per filesystem attribute.  This attribute MUST be implemented if INITIALIZE is implemented.  [[Comment.4: max_hole_punch when doing ADB initialization? --TH]]

5.  Application Data Block Support

At the OS level, files are contained on disk blocks.  Applications are also free to impose structure on the data contained in a file, and we can define an Application Data Block (ADB) to be such a structure.  From the application's viewpoint, it only wants to handle ADBs and not raw bytes (see [16]).  An ADB is typically comprised of two sections: a header and data.  The header describes the characteristics of the block and can provide a means to detect corruption in the data payload.  The data section is typically initialized to all zeroes.

The format of the header is application specific, but there are two main components typically encountered:

1.  An ADB Number (ADBN), which allows the application to determine which data block is being referenced.  The ADBN is a logical block number and is useful when the client is not storing the blocks in contiguous memory.

2.  Fields to describe the state of the ADB and a means to detect block corruption.  For both pieces of data, a useful property is that the allowed values be unique in the sense that, if passed across the network, corruption due to translation between big endian and little endian architectures is detectable.  For example, 0xF0DEDEF0 has the same bit pattern in both architectures.

Applications already impose structures on files [16] and detect corruption in data blocks [17].  What they are not able to do is efficiently transfer and store ADBs.
To initialize a file with ADBs, the client must send the full ADB to the server, and that must be stored on the server.  When the application is initializing a file to have the ADB structure, it could compress the ADBs to just the information necessary to later reconstruct the header portion of the ADB when the contents are read back.  Using sparse file techniques, the disk blocks described would not be allocated.  Unlike sparse file techniques, there would be a small cost to store the compressed header data.

In this section, we are going to define a generic framework for an ADB, present one approach to detecting corruption in a given ADB implementation, and describe the model for how the client and server can support efficient initialization of ADBs, reading of ADB holes, punching holes in ADBs, and space reservation.  Further, we need to be able to extend this model to applications which do not support ADBs, but wish to be able to handle sparse files, hole punching, and space reservation.

5.1.  Generic Framework

We want the representation of the ADB to be flexible enough to support many different applications.  The most basic approach is no imposition of a block at all, which means we are working with the raw bytes.  Such an approach would be useful for storing holes, punching holes, etc.  In more complex deployments, a server might be supporting multiple applications, each with their own definition of the ADB.  One might store the ADBN at the start of the block and then have a guard pattern to detect corruption [18].  The next might store the ADBN at an offset of 100 bytes within the block and have no guard pattern at all.  The point is that existing applications might already have well defined formats for their data blocks.
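The compact description above (block size, block count, starting ADBN, and relative offsets for the block number and guard pattern) can be sketched as a generator that materializes each ADB.  The field names echo the app_data_block4 structure defined in the next section, but the 8-byte big-endian ADBN encoding and the function itself are illustrative assumptions, not part of the protocol.

```python
# Sketch: expand a compact ADB description into zero-filled blocks with
# the ADBN and guard pattern written at their relative offsets.
import struct

def expand_adbs(block_size, block_count, first_adbn,
                reloff_blocknum, reloff_pattern, pattern):
    """Yield each ADB as bytes: a zero-filled block with an 8-byte
    big-endian ADBN at reloff_blocknum and the guard pattern at
    reloff_pattern (assumed encoding for illustration)."""
    for i in range(block_count):
        block = bytearray(block_size)
        block[reloff_blocknum:reloff_blocknum + 8] = \
            struct.pack(">Q", first_adbn + i)
        block[reloff_pattern:reloff_pattern + len(pattern)] = pattern
        yield bytes(block)

# E.g. three 4k ADBs, ADBN at offset 0, guard pattern at offset 8:
blocks = list(expand_adbs(4096, 3, 0, 0, 8, b"\xfe\xed\xfa\xce"))
```

The point of the wire format is that only the compact description crosses the network; the server performs an expansion like this locally (or records the description as metadata and leaves the blocks sparse).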
The guard pattern can be used to represent the state of the block, to protect against corruption, or both.  Again, it needs to be able to be placed anywhere within the ADB.

We need to be able to represent the starting offset of the block and the size of the block.  Note that nothing prevents the application from defining different sized blocks in a file.

5.1.1.  Data Block Representation

   struct app_data_block4 {
           offset4        adb_offset;
           length4        adb_block_size;
           length4        adb_block_count;
           length4        adb_reloff_blocknum;
           count4         adb_block_num;
           length4        adb_reloff_pattern;
           opaque         adb_pattern<>;
   };

The app_data_block4 structure captures the abstraction presented for the ADB.  The additional fields present are to allow the transmission of adb_block_count ADBs at one time.  We also use adb_block_num to convey the ADBN of the first block in the sequence.  Each ADB will contain the same adb_pattern string.

As both adb_block_num and adb_pattern are optional, if either adb_reloff_pattern or adb_reloff_blocknum is set to NFS4_UINT64_MAX, then the corresponding field is not set in any of the ADBs.

5.1.2.  Data Content

   /*
    * Use an enum such that we can extend new types.
    */
   enum data_content4 {
           NFS4_CONTENT_DATA      = 0,
           NFS4_CONTENT_APP_BLOCK = 1,
           NFS4_CONTENT_HOLE      = 2
   };

New operations might need to differentiate between wanting to access data versus an ADB.  Also, future minor versions might want to introduce new data formats.  This enumeration allows that to occur.

5.2.  pNFS Considerations

While this document does not mandate how sparse ADBs are recorded on the server, it does make the assumption that such information is not in the file.  I.e., the information is metadata.  As such, the INITIALIZE operation is defined to be not supported by the DS - it must be issued to the MDS.
But since the client must not assume a priori whether a read is sparse or not, the READ_PLUS operation MUST be supported by both the DS and the MDS.  I.e., the client might impose on the MDS to asynchronously read the data from the DS.

Furthermore, each DS MUST NOT report to a client either a sparse ADB or data which belongs to another DS.  One implication of this requirement is that the app_data_block4's adb_block_size MUST either be the stripe width or the stripe width must be an even multiple of it.

The second implication here is that the DS must be able to use the Control Protocol to determine from the MDS where the sparse ADBs occur.  [[Comment.5: Need to discuss what happens if the file is being written to and an INITIALIZE occurs. --TH]]  Perhaps instead of the DS pulling from the MDS, the MDS pushes to the DS?  Thus an INITIALIZE causes a new push?  [[Comment.6: Still need to consider race cases of the DS getting a WRITE and the MDS getting an INITIALIZE. --TH]]

5.3.  An Example of Detecting Corruption

In this section, we define an ADB format in which corruption can be detected.  Note that this is just one possible format and means to detect corruption.

Consider a very basic implementation of an operating system's disk blocks.  A block is either data or it is an indirect block which allows for files to be larger than one block.  It is desired to be able to initialize a block.  Lastly, to quickly unlink a file, a block can be marked invalid.  The contents remain intact - which would enable this OS application to undelete a file.

The application defines 4k sized data blocks, with an 8 byte block counter occurring at offset 0 in the block, and with the guard pattern occurring at offset 8 inside the block.
Furthermore, the guard pattern can take one of four states:

0xfeedface -  This is the FREE state and indicates that the ADB format has been applied.

0xcafedead -  This is the DATA state and indicates that real data has been written to this block.

0xe4e5c001 -  This is the INDIRECT state and indicates that the block contains block counter numbers that are chained off of this block.

0xba1ed4a3 -  This is the INVALID state and indicates that the block contains data whose contents are garbage.

Finally, it also defines an 8 byte checksum [19] starting at byte 16 which applies to the remaining contents of the block.  If the state is FREE, then that checksum is trivially zero.  As such, the application has no need to transfer the checksum implicitly inside the ADB - it need not make the transfer layer aware of the fact that there is a checksum (see [17] for an example of checksums used to detect corruption in application data blocks).

Corruption in each ADB can be detected thusly:

o  If the guard pattern is anything other than one of the allowed values, including all zeroes.

o  If the guard pattern is FREE and any other byte in the remainder of the ADB is anything other than zero.

o  If the guard pattern is anything other than FREE, then if the stored checksum does not match the computed checksum.

o  If the guard pattern is INDIRECT and one of the stored indirect block numbers has a value greater than the number of ADBs in the file.

o  If the guard pattern is INDIRECT and one of the stored indirect block numbers is a duplicate of another stored indirect block number.

As can be seen, the application can detect errors based on the combination of the guard pattern state and the checksum.  But also, the application can detect corruption based on the state and the contents of the ADB.
This last point is important in validating the minimum amount of data we incorporated into our generic framework.  I.e., the guard pattern is sufficient to allow applications to design their own corruption detection.

Finally, it is important to note that none of these corruption checks occur in the transport layer.  The server and client components are totally unaware of the file format and might report everything as being transferred correctly even in the case where the application detects corruption.

5.4.  Example of READ_PLUS

The hypothetical application presented in Section 5.3 can be used to illustrate how READ_PLUS would return an array of results.  A file is created and initialized with 100 4k ADBs in the FREE state:

   INITIALIZE {0, 4k, 100, 0, 0, 8, 0xfeedface}

Further, assume the application writes a single ADB at 16k, changing the guard pattern to 0xcafedead.  We would then have in memory:

   0 -> (16k - 1)    : 4k, 4, 0, 0, 8, 0xfeedface
   16k -> (20k - 1)  : 00 00 00 05 ca fe de ad XX XX ... XX XX
   20k -> 400k       : 4k, 95, 0, 6, 0xfeedface

And when the client did a READ_PLUS of 64k at the start of the file, it would get back a result of an ADB, some data, and a final ADB:

   ADB {0, 4, 0, 0, 8, 0xfeedface}
   data 4k
   ADB {20k, 4k, 59, 0, 6, 0xfeedface}

5.5.  Zero Filled Holes

As applications are free to define the structure of an ADB, it is trivial to define an ADB which supports zero filled holes.  Such a case would encompass the traditional definitions of a sparse file and hole punching.  For example, to punch a 64k hole, starting at 100M, into an existing file which has no ADB structure:

   INITIALIZE {100M, 64k, 1, NFS4_UINT64_MAX,
               0, NFS4_UINT64_MAX, 0x0}

6.  Labeled NFS

6.1.
Introduction

Access control models such as Unix permissions or Access Control Lists are commonly referred to as Discretionary Access Control (DAC) models.  These systems base their access decisions on user identity and resource ownership.  In contrast, Mandatory Access Control (MAC) models base their access control decisions on the label on the subject (usually a process) and the object it wishes to access.  These labels may contain user identity information but usually contain additional information.  In DAC systems, users are free to specify the access rules for resources that they own.  MAC models base their security decisions on a system wide policy established by an administrator or organization which the users do not have the ability to override.  In this section, we add a MAC model to NFSv4.

The first change necessary is to devise a method for transporting and storing security label data on NFSv4 file objects.  Security labels have several semantics that are met by NFSv4 recommended attributes, such as the ability to set the label value upon object creation.  Access control on these attributes is done through a combination of two mechanisms.  As with other recommended attributes on file objects, the usual DAC checks (ACLs and permission bits) will be performed to ensure that proper file ownership is enforced.  In addition, a MAC system MAY be employed on the client, server, or both to enforce additional policy on what subjects may modify security label information.

The second change is to provide a method for the server to notify the client that the attribute changed on an open file on the server.  If the file is closed, then during the open attempt, the client will gather the new attribute value.  The server MUST NOT communicate the new value of the attribute; the client MUST query it.
This requirement stems from the need for the client to provide sufficient access rights to the attribute.

The final change necessary is a modification to the RPC layer used in NFSv4 in the form of a new version of the RPCSEC_GSS [6] framework.  In order for an NFSv4 server to apply MAC checks, it must obtain additional information from the client.  Several methods were explored for performing this, and it was decided that the best approach was to incorporate the ability to make security attribute assertions through the RPC mechanism.  RPCSEC_GSSv3 [5] outlines a method to assert additional security information such as security labels on GSS context creation and have that data bound to all RPC requests that make use of that context.

6.2.  Definitions

Label Format Specifier (LFS):  is an identifier used by the client to establish the syntactic format of the security label and the semantic meaning of its components.  These specifiers exist in a registry associated with documents describing the format and semantics of the label.

Label Format Registry:  is the IANA registry containing all registered LFSs along with references to the documents that describe the syntactic format and semantics of the security label.

Policy Identifier (PI):  is an optional part of the definition of a Label Format Specifier which allows for clients and servers to identify specific security policies.

Domain of Interpretation (DOI):  represents an administrative security boundary, where all systems within the DOI have semantically coherent labeling.  That is, a security attribute must always mean exactly the same thing anywhere within the DOI.

Object:  is a passive resource within the system that we wish to be protected.
Objects can be entities such as files, directories, pipes, sockets, and many other system resources relevant to the protection of the system state.

Subject:  A subject is an active entity, usually a process, which is requesting access to an object.

Multi-Level Security (MLS):  is a traditional model where objects are given a sensitivity level (Unclassified, Secret, Top Secret, etc.) and a category set [20].

6.3.  MAC Security Attribute

MAC models base access decisions on security attributes bound to subjects and objects.  This information can range from a user identity for an identity based MAC model, to sensitivity levels for Multi-Level Security, to a type for Type Enforcement.  These models base their decisions on different criteria, but the semantics of the security attribute remain the same.  The semantics required by the security attributes are listed below:

o  Must provide flexibility with respect to the MAC model.

o  Must provide the ability to atomically set security information upon object creation.

o  Must provide the ability to enforce access control decisions both on the client and the server.

o  Must not expose an object to either the client or server name space before its security information has been bound to it.

NFSv4 implements the security attribute as a recommended attribute.  These attributes have a fixed format and semantics, which conflicts with the flexible nature of the security attribute.  To resolve this, the security attribute consists of two components.  The first component is an LFS as defined in [21] to allow for interoperability between MAC mechanisms.  The second component is an opaque field which is the actual security attribute data.  To allow for various MAC models, NFSv4 should be used solely as a transport mechanism for the security attribute.
It is the responsibility of the endpoints to consume the security attribute and make access decisions based on their respective models.  In addition, creation of objects through OPEN and CREATE allows for the security attribute to be specified upon creation.  By providing an atomic create and set operation for the security attribute, it is possible to enforce the second and fourth requirements.  The recommended attribute FATTR4_SEC_LABEL will be used to satisfy this requirement.

6.3.1.  Interpreting FATTR4_SEC_LABEL

The XDR [22] necessary to implement Labeled NFSv4 is presented below:

   const FATTR4_SEC_LABEL   = 81;

   typedef uint32_t  policy4;

                                Figure 6

   struct labelformat_spec4 {
           policy4   lfs_lfs;
           policy4   lfs_pi;
   };

   struct sec_label_attr_info {
           labelformat_spec4   slai_lfs;
           opaque              slai_data<>;
   };

The FATTR4_SEC_LABEL contains an array of two components, with the first component being an LFS.  It serves to provide the receiving end with the information necessary to translate the security attribute into a form that is usable by the endpoint.  Label Formats assigned an LFS may optionally choose to include a Policy Identifier field to allow for complex policy deployments.  The LFS and Label Format Registry are described in detail in [21].  The translation used to interpret the security attribute is not specified as part of the protocol, as it may depend on various factors.  The second component is an opaque section which contains the data of the attribute.  This component is dependent on the MAC model to interpret and enforce.

In particular, it is the responsibility of the LFS specification to define a maximum size for the opaque section, slai_data<>.  When creating or modifying a label for an object, the client needs to be guaranteed that the server will accept a label that is sized correctly.
By both the client and server being part of a specific MAC model, the client will be aware of the size.

6.3.2.  Delegations

In the event that a security attribute is changed on the server while a client holds a delegation on the file, the client should follow the existing protocol with respect to attribute changes.  It should flush all changes back to the server and relinquish the delegation.

6.3.3.  Permission Checking

It is not feasible to enumerate all possible MAC models and even levels of protection within a subset of these models.  This means that the NFSv4 client and servers cannot be expected to directly make access control decisions based on the security attribute.  Instead, NFSv4 should defer permission checking on this attribute to the host system.  These checks are performed in addition to the existing DAC and ACL checks outlined in the NFSv4 protocol.  Section 6.6 gives a specific example of how the security attribute is handled under a particular MAC model.

6.3.4.  Object Creation

When creating files in NFSv4, the OPEN and CREATE operations are used.  One of the parameters to these operations is an fattr4 structure containing the attributes the file is to be created with.  This allows NFSv4 to atomically set the security attribute of files upon creation.  When a client is MAC aware, it must always provide the initial security attribute upon file creation.  In the event that the server is the only MAC aware entity in the system, it should ignore the security attribute specified by the client and instead make the determination itself.  A more in depth explanation can be found in Section 6.6.

6.3.5.  Existing Objects

Note that under the MAC model, all objects must have labels.
Therefore, if an existing server is upgraded to include LNFS support, then it is the responsibility of the security system to define the behavior for existing objects. For example, if the security system is LFS 0, which means the server just stores and returns labels, then existing files should return labels which are set to an empty value.

6.3.6. Label Changes

As per the requirements, when a file's security label is modified, the server must notify all clients which have the file opened of the change in label. It does so with CB_ATTR_CHANGED. There are preconditions to making an attribute change imposed by NFSv4, and the security system might want to impose others. In the process of meeting these preconditions, the server may choose to either serve the request in whole or return NFS4ERR_DELAY to the SETATTR operation.

If there are open delegations on the file belonging to clients other than the one making the label change, then the process described in Section 6.3.2 must be followed.

As the server is always presented with the subject label from the client, it does not necessarily need to communicate the fact that the label has changed to the client. In the cases where the change outright denies the client access, the client will be able to quickly determine that there is a new label in effect. It is in cases where the client may share the same object between multiple subjects, or under a security system which is not strictly hierarchical, that the CB_ATTR_CHANGED callback is very useful. It allows the server to inform the clients that the cached security attribute is now stale.

Consider a system in which the clients enforce MAC checks and the server has a very simple security system which just stores the labels. In this system, the MAC label check always allows access, regardless of the subject label.
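The notification bookkeeping described above can be modeled with a small, purely illustrative sketch: a SETATTR that changes a label queues a stand-in for CB_ATTR_CHANGED toward every client holding the file open. All class and method names here are hypothetical, not part of the protocol.

```python
class LabeledServer:
    """Toy in-memory model of label-change notification; not a real server."""
    def __init__(self):
        self.labels = {}       # filehandle -> security label
        self.opens = {}        # filehandle -> set of client ids
        self.pending_cb = []   # queued (client, fh) CB_ATTR_CHANGED stand-ins

    def open_file(self, client, fh):
        self.opens.setdefault(fh, set()).add(client)

    def setattr_label(self, client, fh, new_label):
        # Preconditions (NFSv4's and the security system's) would be checked
        # here; a real server may instead answer NFS4ERR_DELAY while it
        # recalls outstanding delegations (Section 6.3.2).
        self.labels[fh] = new_label
        for c in self.opens.get(fh, ()):
            self.pending_cb.append((c, fh))  # stand-in for CB_ATTR_CHANGED

srv = LabeledServer()
srv.open_file("A", "fh1")
srv.open_file("B", "fh1")
srv.setattr_label("A", "fh1", "s1")
assert ("B", "fh1") in srv.pending_cb
```

A client receiving the stand-in callback would invalidate its cached label and re-fetch FATTR4_SEC_LABEL before trusting the cache again.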
In such a deployment, MAC labels are enforced by the smart client. So if client A changes a security label on a file, then the server MUST inform all clients that have the file opened that the label has changed, via CB_ATTR_CHANGED. The clients MUST then retrieve the new label and MUST enforce access via the new attribute values.

[[Comment.7: Describe a LFS of 0, which will be the means to indicate such a deployment. In the current LFR, 0 is marked as reserved. If we use it, then we define the default LFS to be used by a LNFS aware server. I.e., it lets smart clients work together in the face of a dumb server. Note that while supporting this system is optional, it will make for a very good debugging mode during development. I.e., even if a server does not deploy with another security system, this mode gets your foot in the door. --TH]]

6.4. pNFS Considerations

This section examines the issues in deploying LNFS in a pNFS community of servers.

6.4.1. MAC Label Checks

The new FATTR4_SEC_LABEL attribute is metadata information, and as such the DS is not aware of the value contained on the MDS. Fortunately, the NFSv4.1 protocol [2] already has provisions for doing access level checks from the DS to the MDS. In order for the DS to validate the subject label presented by the client, it SHOULD utilize this mechanism.

If a file's FATTR4_SEC_LABEL is changed, then the MDS should utilize CB_ATTR_CHANGED to inform the client of that fact. If the MDS is maintaining

6.5. Discovery of Server LNFS Support

The server can easily determine that a client supports LNFS when it queries for the FATTR4_SEC_LABEL label for an object. Note that it cannot assume that the presence of RPCSEC_GSSv3 indicates LNFS support. The client might need to discover which LFS the server supports.
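A client probe for server LNFS support and the server's LFS might be sketched as follows; the stub server, method names, and the use of NFS4ERR_ATTRNOTSUPP to stand for a non-LNFS server are illustrative assumptions, not protocol mandates.

```python
FATTR4_SEC_LABEL = 81
NFS4ERR_ATTRNOTSUPP = 10032   # per the NFSv4 error code registry

def probe_lnfs(server):
    """PUTROOTFH, GETATTR {FATTR4_SEC_LABEL}: return the server's LFS,
    or None if the attribute is not supported."""
    status, attrs = server.getattr_root([FATTR4_SEC_LABEL])
    if status != 0:           # e.g. NFS4ERR_ATTRNOTSUPP
        return None
    lfs, _pi, _data = attrs[FATTR4_SEC_LABEL]
    return lfs

class StubServer:
    """Hypothetical stand-in for the RPC layer."""
    def __init__(self, lfs=None):
        self.lfs = lfs
    def getattr_root(self, attr_ids):
        if self.lfs is None:
            return NFS4ERR_ATTRNOTSUPP, {}
        # Root label with the supported LFS filled in (opaque part elided).
        return 0, {FATTR4_SEC_LABEL: (self.lfs, 0, b"")}

assert probe_lnfs(StubServer(lfs=1)) == 1
assert probe_lnfs(StubServer()) is None
```

As noted below, this probe can still fail at the RPC security layer (e.g., NFS4ERR_WRONGSEC) before any MAC label check applies.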
2044 A server which supports LNFS MUST allow a client with any subject 2045 label to retrieve the FATTR4_SEC_LABEL attribute for the root 2046 filehandle, ROOTFH. The following compound must always succeed as 2047 far as a MAC label check is concerned: 2049 PUTROOTFH, GETATTR {FATTR4_SEC_LABEL} 2051 Note that the server might have imposed a security flavor on the root 2052 that precludes such access. I.e., if the server requires kerberized 2053 access and the client presents a compound with AUTH_SYS, then the 2054 server is allowed to return NFS4ERR_WRONGSEC in this case. But if 2055 the client presents a correct security flavor, then the server MUST 2056 return the FATTR4_SEC_LABEL attribute with the supported LFS filled 2057 in. 2059 6.6. MAC Security NFS Modes of Operation 2061 A system using Labeled NFS may operate in three modes. The first 2062 mode provides the most protection and is called "full mode". In this 2063 mode both the client and server implement a MAC model allowing each 2064 end to make an access control decision. The remaining two modes are 2065 variations on each other and are called "smart client" and "smart 2066 server" modes. In these modes one end of the connection is not 2067 implementing a MAC model and because of this these operating modes 2068 offer less protection than full mode. 2070 6.6.1. Full Mode 2072 Full mode environments consist of MAC aware NFSv4 servers and clients 2073 and may be composed of mixed MAC models and policies. The system 2074 requires that both the client and server have an opportunity to 2075 perform an access control check based on all relevant information 2076 within the network. The file object security attribute is provided 2077 using the mechanism described in Section 6.3. The security attribute 2078 of the subject making the request is transported at the RPC layer 2079 using the mechanism described in RPCSECGSSv3 [5]. 2081 6.6.1.1. 
Initial Labeling and Translation 2083 The ability to create a file is an action that a MAC model may wish 2084 to mediate. The client is given the responsibility to determine the 2085 initial security attribute to be placed on a file. This allows the 2086 client to make a decision as to the acceptable security attributes to 2087 create a file with before sending the request to the server. Once 2088 the server receives the creation request from the client it may 2089 choose to evaluate if the security attribute is acceptable. 2091 Security attributes on the client and server may vary based on MAC 2092 model and policy. To handle this the security attribute field has an 2093 LFS component. This component is a mechanism for the host to 2094 identify the format and meaning of the opaque portion of the security 2095 attribute. A full mode environment may contain hosts operating in 2096 several different LFSs and DOIs. In this case a mechanism for 2097 translating the opaque portion of the security attribute is needed. 2098 The actual translation function will vary based on MAC model and 2099 policy and is out of the scope of this document. If a translation is 2100 unavailable for a given LFS and DOI then the request SHOULD be 2101 denied. Another recourse is to allow the host to provide a fallback 2102 mapping for unknown security attributes. 2104 6.6.1.2. Policy Enforcement 2106 In full mode access control decisions are made by both the clients 2107 and servers. When a client makes a request it takes the security 2108 attribute from the requesting process and makes an access control 2109 decision based on that attribute and the security attribute of the 2110 object it is trying to access. If the client denies that access an 2111 RPC call to the server is never made. If however the access is 2112 allowed the client will make a call to the NFS server. 
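The client-side gate described above can be sketched as a guard in front of the RPC layer. The dominance rule below is a toy hierarchical ("no read up") policy standing in for whatever MAC model the client actually implements; all names are illustrative.

```python
def dominates(subject_level: int, object_level: int) -> bool:
    """Toy hierarchical check: a subject may read objects at or below
    its own sensitivity level."""
    return subject_level >= object_level

def client_read(subject_level, obj, rpc_read):
    # Local MAC decision first: on denial, no RPC is ever sent.
    if not dominates(subject_level, obj["label"]):
        raise PermissionError("MAC denial on the client; no RPC sent")
    # The server repeats the check with the subject label conveyed at the
    # RPC layer and may still return NFS4ERR_ACCESS.
    return rpc_read(obj)

secret = {"label": 2, "data": b"payload"}
assert client_read(3, secret, lambda o: o["data"]) == b"payload"
try:
    client_read(1, secret, lambda o: o["data"])
    denied = False
except PermissionError:
    denied = True
assert denied
```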
When the server receives the request from the client, it extracts the security attribute conveyed in the RPC request. The server then uses this security attribute and the attribute of the object the client is trying to access to make an access control decision. If the server's policy allows this access, it will fulfill the client's request; otherwise it will return NFS4ERR_ACCESS.

Implementations MAY validate security attributes supplied over the network to ensure that they are within a set of attributes permitted from a specific peer, and if not, reject them. Note that a system may permit a different set of attributes to be accepted from each peer.

6.6.2. Smart Client Mode

Smart client environments consist of NFSv4 servers that are not MAC aware but NFSv4 clients that are. Clients in this environment may consist of groups implementing different MAC models and policies. The system requires that all clients in the environment be responsible for access control checks. Due to the amount of trust placed in the clients, this mode is only to be used in a trusted environment.

6.6.2.1. Initial Labeling and Translation

Just like in full mode, the client is responsible for determining the initial label upon object creation. The server in smart client mode does not implement a MAC model; however, it may provide the ability to restrict the creation and labeling of objects with certain labels based on different criteria, as described in Section 6.6.1.2.

In a smart client environment a group of clients operate in a single DOI. This removes the need for the clients to maintain a set of DOI translations. Servers should provide a method to allow different groups of clients to access the server at the same time. However, they should not allow two groups of clients operating in different DOIs to access the same files.

6.6.2.2.
Policy Enforcement 2154 In smart client mode access control decisions are made by the 2155 clients. When a client accesses an object it obtains the security 2156 attribute of the object from the server and combines it with the 2157 security attribute of the process making the request to make an 2158 access control decision. This check is in addition to the DAC checks 2159 provided by NFSv4 so this may fail based on the DAC criteria even if 2160 the MAC policy grants access. As the policy check is located on the 2161 client an access control denial should take the form that is native 2162 to the platform. 2164 6.6.3. Smart Server Mode 2166 Smart server environments consist of NFSv4 servers that are MAC aware 2167 and one or more MAC unaware clients. The server is the only entity 2168 enforcing policy, and may selectively provide standard NFS services 2169 to clients based on their authentication credentials and/or 2170 associated network attributes (e.g., IP address, network interface). 2171 The level of trust and access extended to a client in this mode is 2172 configuration-specific. 2174 6.6.3.1. Initial Labeling and Translation 2176 In smart server mode all labeling and access control decisions are 2177 performed by the NFSv4 server. In this environment the NFSv4 clients 2178 are not MAC aware so they cannot provide input into the access 2179 control decision. This requires the server to determine the initial 2180 labeling of objects. Normally the subject to use in this calculation 2181 would originate from the client. Instead the NFSv4 server may choose 2182 to assign the subject security attribute based on their 2183 authentication credentials and/or associated network attributes 2184 (e.g., IP address, network interface). 2186 In smart server mode security attributes are contained solely within 2187 the NFSv4 server. This means that all security attributes used in 2188 the system remain within a single LFS and DOI. 
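The assignment of subject attributes from network attributes, as described above, amounts to a policy lookup keyed on the client's address. A minimal sketch follows; the subnets (RFC 6890 documentation ranges) and label strings are invented for illustration.

```python
import ipaddress

# Hypothetical policy: map client networks to subject labels.
SUBNET_LABELS = [
    (ipaddress.ip_network("192.0.2.0/24"), "restricted"),
    (ipaddress.ip_network("198.51.100.0/24"), "trusted"),
]
DEFAULT_LABEL = "untrusted"

def subject_label_for(client_ip: str) -> str:
    """Derive the subject security attribute from the client's address,
    since a MAC-unaware client supplies none of its own."""
    addr = ipaddress.ip_address(client_ip)
    for net, label in SUBNET_LABELS:
        if addr in net:
            return label
    return DEFAULT_LABEL

assert subject_label_for("192.0.2.7") == "restricted"
assert subject_label_for("203.0.113.9") == "untrusted"
```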
Since security 2189 attributes will not cross DOIs or change format there is no need to 2190 provide any translation functionality above that which is needed 2191 internally by the MAC model. 2193 6.6.3.2. Policy Enforcement 2195 All access control decisions in smart server mode are made by the 2196 server. The server will assign the subject a security attribute 2197 based on some criteria (e.g., IP address, network interface). Using 2198 the newly calculated security attribute and the security attribute of 2199 the object being requested the MAC model makes the access control 2200 check and returns NFS4ERR_ACCESS on a denial and NFS4_OK on success. 2201 This check is done transparently to the client so if the MAC 2202 permission check fails the client may be unaware of the reason for 2203 the permission failure. When operating in this mode administrators 2204 attempting to debug permission failures should be aware to check the 2205 MAC policy running on the server in addition to the DAC settings. 2207 6.7. Security Considerations 2209 This entire document deals with security issues. 2211 Depending on the level of protection the MAC system offers there may 2212 be a requirement to tightly bind the security attribute to the data. 2214 When only one of the client or server enforces labels, it is 2215 important to realize that the other side is not enforcing MAC 2216 protections. Alternate methods might be in use to handle the lack of 2217 MAC support and care should be taken to identify and mitigate threats 2218 from possible tampering outside of these methods. 2220 An example of this is that a server that modifies READDIR or LOOKUP 2221 results based on the client's subject label might want to always 2222 construct the same subject label for a client which does not present 2223 one. This will prevent a non-LNFS client from mixing entries in the 2224 directory cache. 2226 7. Sharing change attribute implementation details with NFSv4 clients 2228 7.1. 
Introduction

Although both the NFSv4 [10] and NFSv4.1 [2] protocols define the change attribute as being mandatory to implement, there is little in the way of guidance. The only feature that is mandated by them is that the value must change whenever the file data or metadata change.

While this allows for a wide range of implementations, it also leaves the client with a conundrum: how does it determine which is the most recent value for the change attribute in a case where several RPC calls have been issued in parallel? In other words, if two COMPOUNDs, both containing WRITE and GETATTR requests for the same file, have been issued in parallel, how does the client determine which of the two change attribute values returned in the replies to the GETATTR requests corresponds to the most recent state of the file? In some cases, the only recourse may be to send another COMPOUND containing a third GETATTR that is fully serialised with the first two.

NFSv4.2 avoids this kind of inefficiency by allowing the server to share details about how the change attribute is expected to evolve, so that the client may immediately determine which, out of the several change attribute values returned by the server, is the most recent.

7.2.
Definition of the 'change_attr_type' per-file system attribute 2254 enum change_attr_typeinfo { 2255 NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR = 0, 2256 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER = 1, 2257 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2, 2258 NFS4_CHANGE_TYPE_IS_TIME_METADATA = 3, 2259 NFS4_CHANGE_TYPE_IS_UNDEFINED = 4 2260 }; 2262 +------------------+----+---------------------------+-----+ 2263 | Name | Id | Data Type | Acc | 2264 +------------------+----+---------------------------+-----+ 2265 | change_attr_type | XX | enum change_attr_typeinfo | R | 2266 +------------------+----+---------------------------+-----+ 2268 The solution enables the NFS server to provide additional information 2269 about how it expects the change attribute value to evolve after the 2270 file data or metadata has changed. 'change_attr_type' is defined as a 2271 new recommended attribute, and takes values from enum 2272 change_attr_typeinfo as follows: 2274 NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR: The change attribute value MUST 2275 monotonically increase for every atomic change to the file 2276 attributes, data or directory contents. 2278 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER: The change attribute value MUST 2279 be incremented by one unit for every atomic change to the file 2280 attributes, data or directory contents. This property is 2281 preserved when writing to pNFS data servers. 2283 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS: The change attribute 2284 value MUST be incremented by one unit for every atomic change to 2285 the file attributes, data or directory contents. In the case 2286 where the client is writing to pNFS data servers, the number of 2287 increments is not guaranteed to exactly match the number of 2288 writes. 2290 NFS4_CHANGE_TYPE_IS_TIME_METADATA: The change attribute is 2291 implemented as suggested in the NFSv4 spec [10] in terms of the 2292 time_metadata attribute. 
NFS4_CHANGE_TYPE_IS_UNDEFINED: The change attribute does not take values that fit into any of these categories.

If NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR, NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, or NFS4_CHANGE_TYPE_IS_TIME_METADATA is set, then the client knows at the very least that the change attribute is monotonically increasing, which is sufficient to resolve the question of which value is the most recent.

If the client sees the value NFS4_CHANGE_TYPE_IS_TIME_METADATA, then by inspecting the value of the 'time_delta' attribute it additionally has the option of detecting rogue server implementations that use time_metadata in violation of the spec.

Finally, if the client sees NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, it has the ability to predict what the resulting change attribute value should be after a COMPOUND containing a SETATTR, WRITE, or CREATE. This again allows it to detect changes made in parallel by another client. The value NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS permits the same, but only if the client is not doing pNFS WRITEs.

8. Security Considerations

9. Operations: REQUIRED, RECOMMENDED, or OPTIONAL

The following tables summarize the operations of the NFSv4.2 protocol and the corresponding designation of REQUIRED, RECOMMENDED, and OPTIONAL to implement or MUST NOT implement. The designation of MUST NOT implement is reserved for those operations that were defined in either NFSv4.0 or NFSv4.1 and MUST NOT be implemented in NFSv4.2.

For the most part, the REQUIRED, RECOMMENDED, or OPTIONAL designation for operations sent by the client is for the server implementation. The client is generally required to implement the operations needed for the operating environment for which it serves.
For example, a read-only NFSv4.2 client would have no need to implement the WRITE operation and is not required to do so.

The REQUIRED or OPTIONAL designation for callback operations sent by the server is for both the client and server. Generally, the client has the option of creating the backchannel and sending the operations on the fore channel that will be a catalyst for the server sending callback operations. A partial exception is CB_RECALL_SLOT; the only way the client can avoid supporting this operation is by not creating a backchannel.

Since this is a summary of the operations and their designation, there are subtleties that are not presented here. Therefore, if there is a question of the requirements of implementation, the operation descriptions themselves must be consulted, along with other relevant explanatory text within either this specification or that of NFSv4.1 [2].

The abbreviations used in the second and third columns of the table are defined as follows.

REQ REQUIRED to implement

REC RECOMMENDED to implement

OPT OPTIONAL to implement

MNI MUST NOT implement

For the NFSv4.2 features that are OPTIONAL, the operations that support those features are OPTIONAL, and the server would return NFS4ERR_NOTSUPP in response to the client's use of those operations. If an OPTIONAL feature is supported, it is possible that a set of operations related to the feature become REQUIRED to implement. The third column of the table designates the feature(s) and if the operation is REQUIRED or OPTIONAL in the presence of support for the feature.
2368 The OPTIONAL features identified and their abbreviations are as 2369 follows: 2371 pNFS Parallel NFS 2373 FDELG File Delegations 2375 DDELG Directory Delegations 2377 COPY Server Side Copy 2379 ADB Application Data Blocks 2381 Operations 2383 +----------------------+--------------------+-----------------------+ 2384 | Operation | REQ, REC, OPT, or | Feature (REQ, REC, or | 2385 | | MNI | OPT) | 2386 +----------------------+--------------------+-----------------------+ 2387 | ACCESS | REQ | | 2388 | BACKCHANNEL_CTL | REQ | | 2389 | BIND_CONN_TO_SESSION | REQ | | 2390 | CLOSE | REQ | | 2391 | COMMIT | REQ | | 2392 | COPY | OPT | COPY (REQ) | 2393 | COPY_ABORT | OPT | COPY (REQ) | 2394 | COPY_NOTIFY | OPT | COPY (REQ) | 2395 | COPY_REVOKE | OPT | COPY (REQ) | 2396 | COPY_STATUS | OPT | COPY (REQ) | 2397 | CREATE | REQ | | 2398 | CREATE_SESSION | REQ | | 2399 | DELEGPURGE | OPT | FDELG (REQ) | 2400 | DELEGRETURN | OPT | FDELG, DDELG, pNFS | 2401 | | | (REQ) | 2402 | DESTROY_CLIENTID | REQ | | 2403 | DESTROY_SESSION | REQ | | 2404 | EXCHANGE_ID | REQ | | 2405 | FREE_STATEID | REQ | | 2406 | GETATTR | REQ | | 2407 | GETDEVICEINFO | OPT | pNFS (REQ) | 2408 | GETDEVICELIST | OPT | pNFS (OPT) | 2409 | GETFH | REQ | | 2410 | INITIALIZE | OPT | ADB (REQ) | 2411 | GET_DIR_DELEGATION | OPT | DDELG (REQ) | 2412 | LAYOUTCOMMIT | OPT | pNFS (REQ) | 2413 | LAYOUTGET | OPT | pNFS (REQ) | 2414 | LAYOUTRETURN | OPT | pNFS (REQ) | 2415 | LINK | OPT | | 2416 | LOCK | REQ | | 2417 | LOCKT | REQ | | 2418 | LOCKU | REQ | | 2419 | LOOKUP | REQ | | 2420 | LOOKUPP | REQ | | 2421 | NVERIFY | REQ | | 2422 | OPEN | REQ | | 2423 | OPENATTR | OPT | | 2424 | OPEN_CONFIRM | MNI | | 2425 | OPEN_DOWNGRADE | REQ | | 2426 | PUTFH | REQ | | 2427 | PUTPUBFH | REQ | | 2428 | PUTROOTFH | REQ | | 2429 | READ | OPT | | 2430 | READDIR | REQ | | 2431 | READLINK | OPT | | 2432 | READ_PLUS | OPT | ADB (REQ) | 2433 | RECLAIM_COMPLETE | REQ | | 2434 | RELEASE_LOCKOWNER | MNI | | 2435 | REMOVE | REQ | | 2436 | 
RENAME | REQ | | 2437 | RENEW | MNI | | 2438 | RESTOREFH | REQ | | 2439 | SAVEFH | REQ | | 2440 | SECINFO | REQ | | 2441 | SECINFO_NO_NAME | REC | pNFS file layout | 2442 | | | (REQ) | 2443 | SEQUENCE | REQ | | 2444 | SETATTR | REQ | | 2445 | SETCLIENTID | MNI | | 2446 | SETCLIENTID_CONFIRM | MNI | | 2447 | SET_SSV | REQ | | 2448 | TEST_STATEID | REQ | | 2449 | VERIFY | REQ | | 2450 | WANT_DELEGATION | OPT | FDELG (OPT) | 2451 | WRITE | REQ | | 2452 +----------------------+--------------------+-----------------------+ 2453 Callback Operations 2455 +-------------------------+-------------------+---------------------+ 2456 | Operation | REQ, REC, OPT, or | Feature (REQ, REC, | 2457 | | MNI | or OPT) | 2458 +-------------------------+-------------------+---------------------+ 2459 | CB_COPY | OPT | COPY (REQ) | 2460 | CB_GETATTR | OPT | FDELG (REQ) | 2461 | CB_LAYOUTRECALL | OPT | pNFS (REQ) | 2462 | CB_NOTIFY | OPT | DDELG (REQ) | 2463 | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | 2464 | CB_NOTIFY_LOCK | OPT | | 2465 | CB_PUSH_DELEG | OPT | FDELG (OPT) | 2466 | CB_RECALL | OPT | FDELG, DDELG, pNFS | 2467 | | | (REQ) | 2468 | CB_RECALL_ANY | OPT | FDELG, DDELG, pNFS | 2469 | | | (REQ) | 2470 | CB_RECALL_SLOT | REQ | | 2471 | CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS (REQ) | 2472 | CB_SEQUENCE | OPT | FDELG, DDELG, pNFS | 2473 | | | (REQ) | 2474 | CB_WANTS_CANCELLED | OPT | FDELG, DDELG, pNFS | 2475 | | | (REQ) | 2476 +-------------------------+-------------------+---------------------+ 2478 10. NFSv4.2 Operations 2480 10.1. Operation 59: COPY - Initiate a server-side copy 2482 10.1.1. 
ARGUMENT 2484 const COPY4_GUARDED = 0x00000001; 2485 const COPY4_METADATA = 0x00000002; 2487 struct COPY4args { 2488 /* SAVED_FH: source file */ 2489 /* CURRENT_FH: destination file or */ 2490 /* directory */ 2491 offset4 ca_src_offset; 2492 offset4 ca_dst_offset; 2493 length4 ca_count; 2494 uint32_t ca_flags; 2495 component4 ca_destination; 2496 netloc4 ca_source_server<>; 2497 }; 2499 10.1.2. RESULT 2501 union COPY4res switch (nfsstat4 cr_status) { 2502 case NFS4_OK: 2503 stateid4 cr_callback_id<1>; 2504 default: 2505 length4 cr_bytes_copied; 2506 }; 2508 10.1.3. DESCRIPTION 2510 The COPY operation is used for both intra-server and inter-server 2511 copies. In both cases, the COPY is always sent from the client to 2512 the destination server of the file copy. The COPY operation requests 2513 that a file be copied from the location specified by the SAVED_FH 2514 value to the location specified by the combination of CURRENT_FH and 2515 ca_destination. 2517 The SAVED_FH must be a regular file. If SAVED_FH is not a regular 2518 file, the operation MUST fail and return NFS4ERR_WRONG_TYPE. 2520 In order to set SAVED_FH to the source file handle, the compound 2521 procedure requesting the COPY will include a sub-sequence of 2522 operations such as 2524 PUTFH source-fh 2525 SAVEFH 2527 If the request is for a server-to-server copy, the source-fh is a 2528 filehandle from the source server and the compound procedure is being 2529 executed on the destination server. In this case, the source-fh is a 2530 foreign filehandle on the server receiving the COPY request. If 2531 either PUTFH or SAVEFH checked the validity of the filehandle, the 2532 operation would likely fail and return NFS4ERR_STALE. 2534 In order to avoid this problem, the minor version incorporating the 2535 COPY operations will need to make a few small changes in the handling 2536 of existing operations. 
If a server supports the server-to-server 2537 COPY feature, a PUTFH followed by a SAVEFH MUST NOT return 2538 NFS4ERR_STALE for either operation. These restrictions do not pose 2539 substantial difficulties for servers. The CURRENT_FH and SAVED_FH 2540 may be validated in the context of the operation referencing them and 2541 an NFS4ERR_STALE error returned for an invalid file handle at that 2542 point. 2544 The CURRENT_FH and ca_destination together specify the destination of 2545 the copy operation. If ca_destination is of 0 (zero) length, then 2546 CURRENT_FH specifies the target file. In this case, CURRENT_FH MUST 2547 be a regular file and not a directory. If ca_destination is not of 0 2548 (zero) length, the ca_destination argument specifies the file name to 2549 which the data will be copied within the directory identified by 2550 CURRENT_FH. In this case, CURRENT_FH MUST be a directory and not a 2551 regular file. 2553 If the file named by ca_destination does not exist and the operation 2554 completes successfully, the file will be visible in the file system 2555 namespace. If the file does not exist and the operation fails, the 2556 file MAY be visible in the file system namespace depending on when 2557 the failure occurs and on the implementation of the NFS server 2558 receiving the COPY operation. If the ca_destination name cannot be 2559 created in the destination file system (due to file name 2560 restrictions, such as case or length), the operation MUST fail. 2562 The ca_src_offset is the offset within the source file from which the 2563 data will be read, the ca_dst_offset is the offset within the 2564 destination file to which the data will be written, and the ca_count 2565 is the number of bytes that will be copied. An offset of 0 (zero) 2566 specifies the start of the file. A count of 0 (zero) requests that 2567 all bytes from ca_src_offset through EOF be copied to the 2568 destination. 
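The destination and byte-range rules above can be condensed into a small non-normative helper; the dict-based filehandle representation, helper name, and the exception raised for a mismatched CURRENT_FH type are illustrative assumptions.

```python
def resolve_copy_args(current_fh, ca_destination, ca_src_offset,
                      ca_count, src_size):
    """Resolve the COPY destination and effective byte count.

    A zero-length ca_destination means CURRENT_FH is the target regular
    file; otherwise CURRENT_FH must be a directory and ca_destination
    names the file within it. A ca_count of 0 (zero) means copy from
    ca_src_offset through EOF."""
    if len(ca_destination) == 0:
        if current_fh["type"] != "REG":
            raise ValueError("CURRENT_FH must be a regular file")
        target = current_fh
    else:
        if current_fh["type"] != "DIR":
            raise ValueError("CURRENT_FH must be a directory")
        target = (current_fh, ca_destination)  # create/locate in directory
    count = ca_count if ca_count != 0 else src_size - ca_src_offset
    return target, count

f = {"type": "REG"}
assert resolve_copy_args(f, "", 100, 0, 1000) == (f, 900)
```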
If concurrent modifications to the source file overlap 2569 with the source file region being copied, the data copied may include 2570 all, some, or none of the modifications. The client can use standard 2571 NFS operations (e.g., OPEN with OPEN4_SHARE_DENY_WRITE or mandatory 2572 byte range locks) to protect against concurrent modifications if the 2573 client is concerned about this. If the source file's end of file is 2574 being modified in parallel with a copy that specifies a count of 0 2575 (zero) bytes, the amount of data copied is implementation dependent 2576 (clients may guard against this case by specifying a non-zero count 2577 value or preventing modification of the source file as mentioned 2578 above). 2580 If the source offset or the source offset plus count is greater than 2581 or equal to the size of the source file, the operation will fail with 2582 NFS4ERR_INVAL. The destination offset or destination offset plus 2583 count may be greater than the size of the destination file. This 2584 allows for the client to issue parallel copies to implement 2585 operations such as "cat file1 file2 file3 file4 > dest". 2587 If the destination file is created as a result of this command, the 2588 destination file's size will be equal to the number of bytes 2589 successfully copied. If the destination file already existed, the 2590 destination file's size may increase as a result of this operation 2591 (e.g. if ca_dst_offset plus ca_count is greater than the 2592 destination's initial size). 2594 If the ca_source_server list is specified, then this is an inter- 2595 server copy operation and the source file is on a remote server. The 2596 client is expected to have previously issued a successful COPY_NOTIFY 2597 request to the remote source server. The ca_source_server list 2598 SHOULD be the same as the COPY_NOTIFY response's cnr_source_server 2599 list. 
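The source-range validation described above (NFS4ERR_INVAL for out-of-range source offsets) can be sketched as follows. The sketch follows the text literally, rejecting a source offset plus count greater than or equal to the file size, so a copy intended to extend exactly to EOF is expressed with a count of 0 (zero); the destination range is deliberately left unconstrained.

```python
NFS4ERR_INVAL = 22
NFS4_OK = 0

def check_copy_range(ca_src_offset, ca_count, src_size):
    """Validate the COPY source range as the text specifies; returns an
    NFS status code. Destination offsets are not checked, which permits
    parallel copies such as 'cat file1 file2 > dest'."""
    if ca_src_offset >= src_size:
        return NFS4ERR_INVAL
    # Literal reading of the text: offset + count >= size fails, so a
    # count of 0 (copy through EOF) is the way to reach the last byte.
    if ca_count != 0 and ca_src_offset + ca_count >= src_size:
        return NFS4ERR_INVAL
    return NFS4_OK

assert check_copy_range(0, 50, 100) == NFS4_OK
assert check_copy_range(100, 0, 100) == NFS4ERR_INVAL
```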
If the client includes the entries from the COPY_NOTIFY response's cnr_source_server list in the ca_source_server list, the source server can indicate a specific copy protocol for the destination server to use by returning a URL, which specifies both a protocol service and server name.  Server-to-server copy protocol considerations are described in Section 2.2.3 and Section 2.4.1.

The ca_flags argument allows the copy operation to be customized in the following ways using the guarded flag (COPY4_GUARDED) and the metadata flag (COPY4_METADATA).

If the guarded flag is set and the destination exists on the server, this operation will fail with NFS4ERR_EXIST.

If the guarded flag is not set and the destination exists on the server, the behavior is implementation dependent.

If the metadata flag is set and the client is requesting a whole file copy (i.e., ca_count is 0 (zero)), a subset of the destination file's attributes MUST be the same as the source file's corresponding attributes and a subset of the destination file's attributes SHOULD be the same as the source file's corresponding attributes.  The attributes in the MUST and SHOULD copy subsets will be defined for each NFS version.

For NFSv4.1, Table 2 and Table 3 list the REQUIRED and RECOMMENDED attributes respectively.  A "MUST" in the "Copy to destination file?" column indicates that the attribute is part of the MUST copy set.  A "SHOULD" in the "Copy to destination file?" column indicates that the attribute is part of the SHOULD copy set.

   +--------------------+----+---------------------------+
   | Name               | Id | Copy to destination file? |
   +--------------------+----+---------------------------+
   | supported_attrs    | 0  | no                        |
   | type               | 1  | MUST                      |
   | fh_expire_type     | 2  | no                        |
   | change             | 3  | SHOULD                    |
   | size               | 4  | MUST                      |
   | link_support       | 5  | no                        |
   | symlink_support    | 6  | no                        |
   | named_attr         | 7  | no                        |
   | fsid               | 8  | no                        |
   | unique_handles     | 9  | no                        |
   | lease_time         | 10 | no                        |
   | rdattr_error       | 11 | no                        |
   | filehandle         | 19 | no                        |
   | suppattr_exclcreat | 75 | no                        |
   +--------------------+----+---------------------------+

                            Table 2

   +--------------------+----+---------------------------+
   | Name               | Id | Copy to destination file? |
   +--------------------+----+---------------------------+
   | acl                | 12 | MUST                      |
   | aclsupport         | 13 | no                        |
   | archive            | 14 | no                        |
   | cansettime         | 15 | no                        |
   | case_insensitive   | 16 | no                        |
   | case_preserving    | 17 | no                        |
   | change_policy      | 60 | no                        |
   | chown_restricted   | 18 | MUST                      |
   | dacl               | 58 | MUST                      |
   | dir_notif_delay    | 56 | no                        |
   | dirent_notif_delay | 57 | no                        |
   | fileid             | 20 | no                        |
   | files_avail        | 21 | no                        |
   | files_free         | 22 | no                        |
   | files_total        | 23 | no                        |
   | fs_charset_cap     | 76 | no                        |
   | fs_layout_type     | 62 | no                        |
   | fs_locations       | 24 | no                        |
   | fs_locations_info  | 67 | no                        |
   | fs_status          | 61 | no                        |
   | hidden             | 25 | MUST                      |
   | homogeneous        | 26 | no                        |
   | layout_alignment   | 66 | no                        |
   | layout_blksize     | 65 | no                        |
   | layout_hint        | 63 | no                        |
   | layout_type        | 64 | no                        |
   | maxfilesize        | 27 | no                        |
   | maxlink            | 28 | no                        |
   | maxname            | 29 | no                        |
   | maxread            | 30 | no                        |
   | maxwrite           | 31 | no                        |
   | max_hole_punch     | 31 | no                        |
   | mdsthreshold       | 68 | no                        |
   | mimetype           | 32 | MUST                      |
   | mode               | 33 | MUST                      |
   | mode_set_masked    | 74 | no                        |
   | mounted_on_fileid  | 55 | no                        |
   | no_trunc           | 34 | no                        |
   | numlinks           | 35 | no                        |
   | owner              | 36 | MUST                      |
   | owner_group        | 37 | MUST                      |
   | quota_avail_hard   | 38 | no                        |
   | quota_avail_soft   | 39 | no                        |
   | quota_used         | 40 | no                        |
   | rawdev             | 41 | no                        |
   | retentevt_get      | 71 | MUST                      |
   | retentevt_set      | 72 | no                        |
   | retention_get      | 69 | MUST                      |
   | retention_hold     | 73 | MUST                      |
   | retention_set      | 70 | no                        |
   | sacl               | 59 | MUST                      |
   | space_avail        | 42 | no                        |
   | space_free         | 43 | no                        |
   | space_freed        | 78 | no                        |
   | space_reserved     | 77 | MUST                      |
   | space_total        | 44 | no                        |
   | space_used         | 45 | no                        |
   | system             | 46 | MUST                      |
   | time_access        | 47 | MUST                      |
   | time_access_set    | 48 | no                        |
   | time_backup        | 49 | no                        |
   | time_create        | 50 | MUST                      |
   | time_delta         | 51 | no                        |
   | time_metadata      | 52 | SHOULD                    |
   | time_modify        | 53 | MUST                      |
   | time_modify_set    | 54 | no                        |
   +--------------------+----+---------------------------+

                            Table 3

[NOTE: The source file's attribute values will take precedence over any attribute values inherited by the destination file.]

In the case of an inter-server copy or an intra-server copy between file systems, the attributes supported for the source file and destination file could be different.  By definition, the REQUIRED attributes will be supported in all cases.  If the metadata flag is set and the source file has a RECOMMENDED attribute that is not supported for the destination file, the copy MUST fail with NFS4ERR_ATTRNOTSUPP.

Any attribute supported by the destination server that is not set on the source file SHOULD be left unset.

Metadata attributes not exposed via the NFS protocol SHOULD be copied to the destination file where appropriate.

The destination file's named attributes are not duplicated from the source file.  After the copy process completes, the client MAY attempt to duplicate named attributes using standard NFSv4 operations.
However, the destination file's named attribute capabilities MAY be different from the source file's named attribute capabilities.

If the metadata flag is not set and the client is requesting a whole file copy (i.e., ca_count is 0 (zero)), the destination file's metadata is implementation dependent.

If the client is requesting a partial file copy (i.e., ca_count is not 0 (zero)), the client SHOULD NOT set the metadata flag and the server MUST ignore the metadata flag.

If the operation does not result in an immediate failure, the server will return NFS4_OK, and the CURRENT_FH will remain the destination's filehandle.

If an immediate failure does occur, cr_bytes_copied will be set to the number of bytes copied to the destination file before the error occurred.  The cr_bytes_copied value indicates the number of bytes copied but not which specific bytes have been copied.

A return of NFS4_OK indicates that either the operation is complete or the operation was initiated and a callback will be used to deliver the final status of the operation.

If the cr_callback_id is returned, this indicates that the operation was initiated and a CB_COPY callback will deliver the final results of the operation.  The cr_callback_id stateid is termed a copy stateid in this context.  The server is given the option of returning the results in a callback because the data may require a relatively long period of time to copy.

If no cr_callback_id is returned, the operation completed synchronously and no callback will be issued by the server.  The completion status of the operation is indicated by cr_status.

If the copy completes successfully, either synchronously or asynchronously, the data copied from the source file to the destination file MUST appear identical to the NFS client.
However, the NFS server's on-disk representation of the data in the source file and destination file MAY differ.  For example, the NFS server might encrypt, compress, deduplicate, or otherwise represent the on-disk data in the source and destination file differently.

In the event of a failure, the state of the destination file is implementation dependent.  The COPY operation may fail for the following reasons (this is a partial list).

NFS4ERR_MOVED:  The file system containing the source file, or the destination file or directory, is not present.  The client can determine the correct location and reissue the operation with the correct location.

NFS4ERR_NOTSUPP:  The copy offload operation is not supported by the NFS server receiving this request.

NFS4ERR_PARTNER_NOTSUPP:  The remote server does not support the server-to-server copy offload protocol.

NFS4ERR_OFFLOAD_DENIED:  The copy offload operation is supported by both the source and the destination, but the destination is not allowing it for this file.  If the client sees this error, it should fall back to the normal copy semantics.

NFS4ERR_PARTNER_NO_AUTH:  The remote server does not authorize a server-to-server copy offload operation.  This may be due to the client's failure to send the COPY_NOTIFY operation to the remote server, the remote server receiving a server-to-server copy offload request after the copy lease time expired, or for some other permission problem.

NFS4ERR_FBIG:  The copy operation would have caused the file to grow beyond the server's limit.

NFS4ERR_NOTDIR:  The CURRENT_FH is a file and ca_destination has non-zero length.

NFS4ERR_WRONG_TYPE:  The SAVED_FH is not a regular file.

NFS4ERR_ISDIR:  The CURRENT_FH is a directory and ca_destination has zero length.
NFS4ERR_INVAL:  The source offset or offset plus count is greater than or equal to the size of the source file.

NFS4ERR_DELAY:  The server does not have the resources to perform the copy operation at the current time.  The client should retry the operation sometime in the future.

NFS4ERR_METADATA_NOTSUPP:  The destination file cannot support the same metadata as the source file.

NFS4ERR_WRONGSEC:  The security mechanism being used by the client does not match the server's security policy.

10.2.  Operation 60: COPY_ABORT - Cancel a server-side copy

10.2.1.  ARGUMENT

   struct COPY_ABORT4args {
           /* CURRENT_FH: destination file */
           stateid4        caa_stateid;
   };

10.2.2.  RESULT

   struct COPY_ABORT4res {
           nfsstat4        car_status;
   };

10.2.3.  DESCRIPTION

COPY_ABORT is used for both intra- and inter-server asynchronous copies.  The COPY_ABORT operation allows the client to cancel a server-side copy operation that it initiated.  This operation is sent in a COMPOUND request from the client to the destination server.  This operation may be used to cancel a copy when the application that requested the copy exits before the operation is completed or for some other reason.

The request contains the filehandle and copy stateid cookies that act as the context for the previously initiated copy operation.

The result's car_status field indicates whether the cancel was successful or not.  A value of NFS4_OK indicates that the copy operation was canceled and no callback will be issued by the server.  A copy operation that is successfully canceled may result in none, some, or all of the data copied.

If the server supports asynchronous copies, the server is REQUIRED to support the COPY_ABORT operation.
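The client-side bookkeeping implied above can be sketched as follows.  This is an illustrative model only, not protocol text: `CopyTracker` and its methods are hypothetical helper names, and the actual on-the-wire COPY, COPY_ABORT, and CB_COPY exchanges are abstracted away.

```python
# Hypothetical sketch of client-side tracking for asynchronous
# server-side copies.  A COPY reply either carries a copy stateid
# (cr_callback_id), meaning a CB_COPY callback will arrive later, or
# it does not, meaning the copy completed synchronously.  COPY_ABORT
# is modeled as dropping the pending entry keyed by that stateid.

class CopyTracker:
    def __init__(self):
        self.pending = {}  # copy stateid -> destination filehandle

    def on_copy_reply(self, dest_fh, cr_callback_id=None):
        """Record an async copy; a synchronous copy needs no tracking."""
        if cr_callback_id is None:
            return "synchronous"
        self.pending[cr_callback_id] = dest_fh
        return "asynchronous"

    def abort(self, copy_stateid):
        """Model COPY_ABORT: cancel a previously initiated async copy."""
        if copy_stateid not in self.pending:
            # Nothing left to cancel; the copy already finished, so a
            # callback (or prior callback) delivers its final result.
            return "NFS4ERR_COMPLETE_ALREADY"
        del self.pending[copy_stateid]
        return "NFS4_OK"  # no CB_COPY will be issued for this copy

    def on_cb_copy(self, copy_stateid):
        """Model CB_COPY delivering the final result of an async copy."""
        return self.pending.pop(copy_stateid, None)
```

A real client would key this state by (session, copy stateid) and drive it from its callback channel; the sketch only shows the state transitions described in the text.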
The COPY_ABORT operation may fail for the following reasons (this is a partial list):

NFS4ERR_NOTSUPP:  The abort operation is not supported by the NFS server receiving this request.

NFS4ERR_RETRY:  The abort failed, but a retry at some time in the future MAY succeed.

NFS4ERR_COMPLETE_ALREADY:  The abort failed, and a callback will deliver the results of the copy operation.

NFS4ERR_SERVERFAULT:  An error occurred on the server that does not map to a specific error code.

10.3.  Operation 61: COPY_NOTIFY - Notify a source server of a future copy

10.3.1.  ARGUMENT

   struct COPY_NOTIFY4args {
           /* CURRENT_FH: source file */
           netloc4         cna_destination_server;
   };

10.3.2.  RESULT

   struct COPY_NOTIFY4resok {
           nfstime4        cnr_lease_time;
           netloc4         cnr_source_server<>;
   };

   union COPY_NOTIFY4res switch (nfsstat4 cnr_status) {
   case NFS4_OK:
           COPY_NOTIFY4resok       resok4;
   default:
           void;
   };

10.3.3.  DESCRIPTION

This operation is used for an inter-server copy.  A client sends this operation in a COMPOUND request to the source server to authorize a destination server identified by cna_destination_server to read the file specified by CURRENT_FH on behalf of the given user.

The cna_destination_server MUST be specified using the netloc4 network location format.  The server is not required to resolve the cna_destination_server address before completing this operation.

If this operation succeeds, the source server will allow the cna_destination_server to copy the specified file on behalf of the given user.  If COPY_NOTIFY succeeds, the destination server is granted permission to read the file as long as both of the following conditions are met:

o  The destination server begins reading the source file before the cnr_lease_time expires.
   If the cnr_lease_time expires while the destination server is still reading the source file, the destination server is allowed to finish reading the file.

o  The client has not issued a COPY_REVOKE for the same combination of user, filehandle, and destination server.

The cnr_lease_time is chosen by the source server.  A cnr_lease_time of 0 (zero) indicates an infinite lease.  To renew the copy lease time, the client should resend the same copy notification request to the source server.

To avoid the need for synchronized clocks, copy lease times are granted by the server as a time delta.  However, there is a requirement that the client and server clocks do not drift excessively over the duration of the lease.  There is also the issue of propagation delay across the network, which could easily be several hundred milliseconds, as well as the possibility that requests will be lost and need to be retransmitted.

To take propagation delay into account, the client should subtract it from copy lease times (e.g., if the client estimates the one-way propagation delay as 200 milliseconds, then it can assume that the lease is already 200 milliseconds old when it gets it).  In addition, it will take another 200 milliseconds to get a response back to the server.  So the client must send a lease renewal or send the copy offload request to the cna_destination_server at least 400 milliseconds before the copy lease would expire.  If the propagation delay varies over the life of the lease (e.g., the client is on a mobile host), the client will need to continuously subtract the increase in propagation delay from the copy lease times.

The server's copy lease period configuration should take into account the network distance of the clients that will be accessing the server's resources.
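The renewal arithmetic above can be sketched as follows.  The helper name is illustrative only; a real client would derive the one-way delay from its own RTT measurements and track the lease against its local clock.

```python
# Illustrative sketch of the copy-lease renewal deadline described
# above: the lease is already one one-way delay old when the reply
# arrives, and the renewal (or the copy offload request sent to
# cna_destination_server) needs another one-way delay to reach the
# server, so both are subtracted.  Nothing here is protocol-defined.

def renewal_deadline(lease_received_at, cnr_lease_time, one_way_delay):
    """Return the latest local time at which a renewal must be sent.

    lease_received_at -- local time the COPY_NOTIFY reply arrived (seconds)
    cnr_lease_time    -- lease duration granted by the source server
                         (seconds); 0 means an infinite lease
    one_way_delay     -- estimated one-way propagation delay (seconds)
    """
    if cnr_lease_time == 0:
        return None  # infinite lease: never needs renewal
    # Subtract the delay twice: once for the lease's age on arrival,
    # once for the renewal's travel time back to the server.
    return lease_received_at + cnr_lease_time - 2 * one_way_delay
```

With the text's 200 ms example, a 30-second lease received at local time t must be renewed no later than t + 29.6 seconds.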
It is expected that the lease period will take into account the network propagation delays and other network delay factors for the client population.  Since the protocol does not allow for an automatic method to determine an appropriate copy lease period, the server's administrator may have to tune the copy lease period.

A successful response will also contain a list of names, addresses, and URLs called cnr_source_server, on which the source is willing to accept connections from the destination.  These might not be reachable from the client and might be located on networks to which the client has no connection.

If the client wishes to perform an inter-server copy, the client MUST send a COPY_NOTIFY to the source server.  Therefore, the source server MUST support COPY_NOTIFY.

For a copy only involving one server (the source and destination are on the same server), this operation is unnecessary.

The COPY_NOTIFY operation may fail for the following reasons (this is a partial list):

NFS4ERR_MOVED:  The file system which contains the source file is not present on the source server.  The client can determine the correct location and reissue the operation with the correct location.

NFS4ERR_NOTSUPP:  The copy offload operation is not supported by the NFS server receiving this request.

NFS4ERR_WRONGSEC:  The security mechanism being used by the client does not match the server's security policy.

10.4.  Operation 62: COPY_REVOKE - Revoke a destination server's copy privileges

10.4.1.  ARGUMENT

   struct COPY_REVOKE4args {
           /* CURRENT_FH: source file */
           netloc4         cra_destination_server;
   };

10.4.2.  RESULT

   struct COPY_REVOKE4res {
           nfsstat4        crr_status;
   };

10.4.3.  DESCRIPTION

This operation is used for an inter-server copy.
A client sends this operation in a COMPOUND request to the source server to revoke the authorization of a destination server identified by cra_destination_server from reading the file specified by CURRENT_FH on behalf of the given user.  If the cra_destination_server has already begun copying the file, a successful return from this operation indicates that further access will be prevented.

The cra_destination_server MUST be specified using the netloc4 network location format.  The server is not required to resolve the cra_destination_server address before completing this operation.

The COPY_REVOKE operation is useful in situations in which the source server granted a very long or infinite lease on the destination server's ability to read the source file and all copy operations on the source file have been completed.

For a copy only involving one server (the source and destination are on the same server), this operation is unnecessary.

If the server supports COPY_NOTIFY, the server is REQUIRED to support the COPY_REVOKE operation.

The COPY_REVOKE operation may fail for the following reasons (this is a partial list):

NFS4ERR_MOVED:  The file system which contains the source file is not present on the source server.  The client can determine the correct location and reissue the operation with the correct location.

NFS4ERR_NOTSUPP:  The copy offload operation is not supported by the NFS server receiving this request.

10.5.  Operation 63: COPY_STATUS - Poll for status of a server-side copy

10.5.1.  ARGUMENT

   struct COPY_STATUS4args {
           /* CURRENT_FH: destination file */
           stateid4        csa_stateid;
   };

10.5.2.
RESULT

   struct COPY_STATUS4resok {
           length4         csr_bytes_copied;
           nfsstat4        csr_complete<1>;
   };

   union COPY_STATUS4res switch (nfsstat4 csr_status) {
   case NFS4_OK:
           COPY_STATUS4resok       resok4;
   default:
           void;
   };

10.5.3.  DESCRIPTION

COPY_STATUS is used for both intra- and inter-server asynchronous copies.  The COPY_STATUS operation allows the client to poll the server to determine the status of an asynchronous copy operation.  This operation is sent by the client to the destination server.

If this operation is successful, the number of bytes copied is returned to the client in the csr_bytes_copied field.  The csr_bytes_copied value indicates the number of bytes copied but not which specific bytes have been copied.

If the optional csr_complete field is present, the copy has completed.  In this case the status value indicates the result of the asynchronous copy operation.  In all cases, the server will also deliver the final results of the asynchronous copy in a CB_COPY operation.

The failure of this operation does not indicate the result of the asynchronous copy in any way.

If the server supports asynchronous copies, the server is REQUIRED to support the COPY_STATUS operation.

The COPY_STATUS operation may fail for the following reasons (this is a partial list):

NFS4ERR_NOTSUPP:  The copy status operation is not supported by the NFS server receiving this request.

NFS4ERR_BAD_STATEID:  The stateid is not valid (see Section 2.3.2 below).

NFS4ERR_EXPIRED:  The stateid has expired (see Copy Offload Stateid section below).

10.6.  Modification to Operation 42: EXCHANGE_ID - Instantiate Client ID

10.6.1.  ARGUMENT

   /* new */
   const EXCHGID4_FLAG_SUPP_FENCE_OPS = 0x00000004;

10.6.2.  RESULT

   Unchanged

10.6.3.
MOTIVATION

Enterprise applications require guarantees that an operation has either aborted or completed.  NFSv4.1 provides this guarantee as long as the session is alive: simply send a SEQUENCE operation on the same slot with a new sequence number, and the successful return of SEQUENCE indicates the previous operation has completed.  However, if the session is lost, there is no way to know when any in-progress operations have aborted or completed.  In hindsight, the NFSv4.1 specification should have mandated that DESTROY_SESSION abort/complete all outstanding operations.

10.6.4.  DESCRIPTION

A client SHOULD request the EXCHGID4_FLAG_SUPP_FENCE_OPS capability when it sends an EXCHANGE_ID operation.  The server SHOULD set this capability in the EXCHANGE_ID reply whether the client requests it or not.  If the client ID is created with this capability, then the following will occur:

o  The server will not reply to DESTROY_SESSION until all operations in progress are completed or aborted.

o  The server will not reply to subsequent EXCHANGE_ID invoked on the same Client Owner with a new verifier until all operations in progress on the Client ID's session are completed or aborted.

o  When DESTROY_CLIENTID is invoked, all sessions (both idle and non-idle), opens, locks, delegations, layouts, and/or wants (Section 18.49) associated with the client ID are removed.  Pending operations will be completed or aborted before the sessions, opens, locks, delegations, layouts, and/or wants are deleted.

o  The NFS server SHOULD support client ID trunking, and if it does and the EXCHGID4_FLAG_SUPP_FENCE_OPS capability is enabled, then a session ID created on one node of the storage cluster MUST be destroyable via DESTROY_SESSION.
   In addition, DESTROY_CLIENTID and an EXCHANGE_ID with a new verifier affect all sessions regardless of which node the sessions were created on.

10.7.  Operation 64: INITIALIZE

This operation can be used to initialize the structure imposed by an application onto a file and to punch a hole into a file.

The server has no concept of the structure imposed by the application.  It is only when the application writes to a section of the file that order is imposed.  In order to detect corruption even before the application utilizes the file, the application will want to initialize a range of ADBs.  It uses the INITIALIZE operation to do so.

10.7.1.  ARGUMENT

   /*
    * We use data_content4 in case we wish to
    * extend new types later.  Note that we
    * are explicitly disallowing data.
    */
   union initialize_arg4 switch (data_content4 content) {
   case NFS4_CONTENT_APP_BLOCK:
           app_data_block4 ia_adb;
   case NFS4_CONTENT_HOLE:
           hole_info4      ia_hole;
   default:
           void;
   };

   struct INITIALIZE4args {
           /* CURRENT_FH: file */
           stateid4        ia_stateid;
           stable_how4     ia_stable;
           initialize_arg4 ia_data<>;
   };

10.7.2.  RESULT

   struct INITIALIZE4resok {
           count4          ir_count;
           stable_how4     ir_committed;
           verifier4       ir_writeverf;
           data_content4   ir_sparse;
   };

   union INITIALIZE4res switch (nfsstat4 status) {
   case NFS4_OK:
           INITIALIZE4resok        resok4;
   default:
           void;
   };

10.7.3.  DESCRIPTION

When the client invokes the INITIALIZE operation, it has two desired results:

1.  The structure described by the app_data_block4 be imposed on the file.

2.  The contents described by the app_data_block4 be sparse.

If the server supports the INITIALIZE operation, it still might not support sparse files.
So if it receives the INITIALIZE operation, then it MUST populate the contents of the file with the initialized ADBs.  In other words, if the server supports INITIALIZE, then it supports the concept of ADBs.  [[Comment.8: Do we want to support an asynchronous INITIALIZE?  Do we have to?  --TH]]

If the data was already initialized, there are two interesting scenarios:

1.  The data blocks are allocated.

2.  Initializing in the middle of an existing ADB.

If the data blocks were already allocated, then the INITIALIZE is a hole punch operation.  If the server supports sparse files, then the data blocks are to be deallocated.  If not, then the data blocks are to be rewritten in the indicated ADB format.  [[Comment.9: Need to document interaction between space reservation and hole punching?  --TH]]

Since the server has no knowledge of ADBs, it should not report misaligned creation of ADBs.  Even while it can detect them, it cannot disallow them, as the application might be in the process of changing the size of the ADBs.  Thus the server must be prepared to handle an INITIALIZE into an existing ADB.

This document does not mandate the manner in which the server stores ADBs sparsely for a file.  It does assume that if ADBs are stored sparsely, then the server can detect when an INITIALIZE arrives that will force a new ADB to start inside an existing ADB.  For example, assume that ADBi has an adb_block_size of 4k and that an INITIALIZE starts 1k inside ADBi.  The server should [[Comment.10: Need to flesh this out.  --TH]]

10.7.3.1.  Hole punching

Whenever a client wishes to deallocate the blocks backing a particular region in the file, it calls the INITIALIZE operation with the current filehandle set to the filehandle of the file in question, and the start offset and length in bytes of the region set in ia_hole.hi_offset and ia_hole.hi_length respectively.
All further reads to this region MUST return zeros until overwritten.  The filehandle specified must be that of a regular file.

Situations may arise where ia_hole.hi_offset and/or ia_hole.hi_offset + ia_hole.hi_length will not be aligned to a boundary that the server does allocations/deallocations in.  For most filesystems, this is the block size of the file system.  In such a case, the server can deallocate as many bytes as it can in the region.  The blocks that cannot be deallocated MUST be zeroed.  Except for the block deallocation and maximum hole punching capability, an INITIALIZE operation is to be treated similarly to a write of zeroes.

The server is not required to complete deallocating the blocks specified in the operation before returning.  It is acceptable to have the deallocation be deferred.  In fact, INITIALIZE is merely a hint; it is valid for a server to return success without ever doing anything towards deallocating the blocks backing the region specified.  However, any future reads to the region MUST return zeroes.

If used to hole punch, INITIALIZE will result in the space_used attribute being decreased by the number of bytes that were deallocated.  The space_freed attribute may or may not decrease, depending on the support and whether the blocks backing the specified range were shared or not.  The size attribute will remain unchanged.

The INITIALIZE operation MUST NOT change the space reservation guarantee of the file.  While the server can deallocate the blocks specified by ia_hole.hi_offset and ia_hole.hi_length, future writes to this region MUST NOT fail with NFS4ERR_NOSPC.

The INITIALIZE operation may fail for the following reasons (this is a partial list):

NFS4ERR_NOTSUPP:  The hole punch operation is not supported by the NFS server receiving this request.

NFS4ERR_ISDIR:  The current filehandle is of type NF4DIR.
NFS4ERR_SYMLINK:  The current filehandle is of type NF4LNK.

NFS4ERR_WRONG_TYPE:  The current filehandle does not designate an ordinary file.

10.8.  Changes to Operation 51: LAYOUTRETURN

10.8.1.  Introduction

In the pNFS description provided in [2], the client is not enabled to relay an error code from the DS to the MDS.  In the specification of the Objects-Based Layout protocol [7], use is made of the opaque lrf_body field of the LAYOUTRETURN argument to do such a relaying of error codes.  In this section, we define a new data structure to enable the passing of error codes back to the MDS and provide some guidelines on what both the client and MDS should expect in such circumstances.

There are two broad classes of errors, transient and persistent.  The client SHOULD strive to only use this new mechanism to report persistent errors.  It MUST be able to deal with transient issues by itself.  Also, while the client might consider an issue to be persistent, it MUST be prepared for the MDS to consider such issues to be transient.  A prime example of this is if the MDS fences off a client from either a stateid or a filehandle.  The client will get an error from the DS and might relay either NFS4ERR_ACCESS or NFS4ERR_STALE_STATEID back to the MDS, with the belief that this is a hard error.  The MDS, on the other hand, is waiting for the client to report such an error.  For it, the mission is accomplished in that the client has returned a layout that the MDS had most likely recalled.

The existing LAYOUTRETURN operation is extended by introducing a new data structure to report errors, layoutreturn_device_error4.  Also, layoutreturn_error_report4 is introduced to enable an array of errors to be reported.

10.8.2.
ARGUMENT

The ARGUMENT specification of the LAYOUTRETURN operation in section 18.44.1 of [2] is augmented by the following XDR code [22]:

   struct layoutreturn_device_error4 {
           deviceid4       lrde_deviceid;
           nfsstat4        lrde_status;
           nfs_opnum4      lrde_opnum;
   };

   struct layoutreturn_error_report4 {
           layoutreturn_device_error4      lrer_errors<>;
   };

10.8.3.  RESULT

The RESULT of the LAYOUTRETURN operation is unchanged; see section 18.44.2 of [2].

10.8.4.  DESCRIPTION

The following text is added to the end of the LAYOUTRETURN operation DESCRIPTION in section 18.44.3 of [2].

When a client uses LAYOUTRETURN with a type of LAYOUTRETURN4_FILE, a NULL lrf_body field indicates to the MDS that the client experienced no errors.  If lrf_body is non-NULL, then the field references error information which is layout type specific.  I.e., the Objects-Based Layout protocol can continue to utilize lrf_body as specified in [7].  For Files-Based Layouts, the field references a layoutreturn_error_report4, which contains an array of layoutreturn_device_error4.

Each individual layoutreturn_device_error4 describes a single error associated with a DS, which is identified via lrde_deviceid.  The operation which returned the error is identified via lrde_opnum.  Finally, the NFS error value (nfsstat4) encountered is provided via lrde_status and may consist of the following error codes:

NFS4_OK:  No issues were found for this device.

NFS4ERR_NXIO:  The client was unable to establish any communication with the DS.

NFS4ERR_*:  The client was able to establish communication with the DS and is returning one of the allowed error codes for the operation denoted by lrde_opnum.

10.8.5.  IMPLEMENTATION

The following text is added to the end of the LAYOUTRETURN operation IMPLEMENTATION in section 18.44.4 of [2].
A client that expects to use pNFS for a mounted filesystem SHOULD check for pNFS support at mount time.  This check SHOULD be performed by sending a GETDEVICELIST operation, followed by layout-type-specific checks for accessibility of each storage device returned by GETDEVICELIST.  If the NFS server does not support pNFS, the GETDEVICELIST operation will be rejected with an NFS4ERR_NOTSUPP error; in this situation it is up to the client to determine whether it is acceptable to proceed with NFS-only access.

Clients are expected to tolerate transient storage device errors, and hence clients SHOULD NOT use the LAYOUTRETURN error handling for device access problems that may be transient.  The methods by which a client decides whether an access problem is transient vs. persistent are implementation-specific, but may include retrying I/Os to a data server under appropriate conditions.

When an I/O fails to a storage device, the client SHOULD retry the failed I/O via the MDS.  In this situation, before retrying the I/O, the client SHOULD return the layout, or the affected portion thereof, and SHOULD indicate which storage device or devices were problematic.  If the client does not do this, the MDS may issue a layout recall callback in order to perform the retried I/O.

The client needs to be cognizant that since this error handling is optional in the MDS, the MDS may silently ignore this functionality.  Also, as the MDS may consider some issues the client reports to be expected (see Section 10.8.1), the client might find it difficult to detect an MDS which has not implemented error handling via LAYOUTRETURN.

If an MDS is aware that a storage device is proving problematic to a client, the MDS SHOULD NOT include that storage device in any pNFS layouts sent to that client.
If the MDS is aware that a storage device is affecting many clients, then the MDS SHOULD NOT include that storage device in any pNFS layouts sent out.  Clients must still be aware that the MDS might not have any choice in using the storage device, i.e., there might only be one possible layout for the system.

Another interesting complication is that for existing files, the MDS might have no choice in which storage devices to hand out to clients.  The MDS might try to restripe a file across a different storage device, but clients need to be aware that not all implementations have restriping support.

An MDS SHOULD react to a client return of layouts with errors by not using the problematic storage devices in layouts for that client, but the MDS is not required to indefinitely retain per-client storage device error information.  An MDS is also not required to automatically reinstate use of a previously problematic storage device; administrative intervention may be required instead.

A client MAY perform I/O via the MDS even when the client holds a layout that covers the I/O; servers MUST support this client behavior, and MAY recall layouts as needed to complete I/Os.

10.9.  Operation 65: READ_PLUS

If the client sends a READ operation, it is explicitly stating that it is not supporting sparse files.  So if a READ occurs on a sparse ADB, then the server must expand such ADBs to be raw bytes.  If a READ occurs in the middle of an ADB, the server can only send back bytes starting from that offset.

Such an operation is inefficient for transfer of sparse sections of the file.  As such, READ is marked as OBSOLETE in NFSv4.2.  Instead, a client should issue READ_PLUS.  Note that as the client has no a priori knowledge of whether an ADB is present or not, it should always use READ_PLUS.

10.9.1.
ARGUMENT

<CODE BEGINS>

   struct READ_PLUS4args {
           /* CURRENT_FH: file */
           stateid4        rpa_stateid;
           offset4         rpa_offset;
           count4          rpa_count;
   };

<CODE ENDS>

10.9.2.  RESULT

<CODE BEGINS>

   union read_plus_content switch (data_content4 content) {
   case NFS4_CONTENT_DATA:
           opaque          rpc_data<>;
   case NFS4_CONTENT_APP_BLOCK:
           app_data_block4 rpc_block;
   case NFS4_CONTENT_HOLE:
           hole_info4      rpc_hole;
   default:
           void;
   };

   /*
    * Allow a return of an array of contents.
    */
   struct read_plus_res4 {
           bool                    rpr_eof;
           read_plus_content       rpr_contents<>;
   };

   union READ_PLUS4res switch (nfsstat4 status) {
   case NFS4_OK:
           read_plus_res4  resok4;
   default:
           void;
   };

<CODE ENDS>

10.9.3.  DESCRIPTION

Over the given range, READ_PLUS will return all data and ADBs found
as an array of read_plus_content.  It is possible to have consecutive
ADBs in the array, either because different definitions of ADBs are
present or because the guard pattern changes.

Edge cases exist for ADBs which either begin before the rpa_offset
requested by the READ_PLUS or end after the rpa_count requested.
Both may occur because not all applications which access the file are
aware of the main application imposing a format on the file contents,
e.g., tar, dd, cp, etc.  READ_PLUS MUST retrieve whole ADBs, but it
need not retrieve an entire sequence of ADBs.

The server MUST return a whole ADB because, if it does not, it must
expand that partial ADB to data before it sends it to the client.
E.g., if an ADB had a block size of 64k and the READ_PLUS was for
128k starting at an offset of 32k inside the ADB, then the first 32k
would be converted to data.

11.  NFSv4.2 Callback Operations

11.1.  Procedure 16: CB_ATTR_CHANGED - Notify Client that the File's
       Attributes Changed

11.1.1.
ARGUMENTS

<CODE BEGINS>

   struct CB_ATTR_CHANGED4args {
           nfs_fh4         acca_fh;
           bitmap4         acca_critical;
           bitmap4         acca_info;
   };

<CODE ENDS>

11.1.2.  RESULTS

<CODE BEGINS>

   struct CB_ATTR_CHANGED4res {
           nfsstat4        accr_status;
   };

<CODE ENDS>

11.1.3.  DESCRIPTION

The CB_ATTR_CHANGED callback operation is used by the server to
indicate to the client that the file's attributes have been modified
on the server.  The server does not convey how the attributes have
changed, just that they have been modified.  The server can inform
the client about both critical and informational attribute changes in
the bitmask arguments.  The client SHOULD query the server about all
attributes set in acca_critical.  For all changes reflected in
acca_info, the client can decide whether or not it wants to poll the
server.

A CB_ATTR_CHANGED callback with FATTR4_SEC_LABEL set in
acca_critical is the method used by the server to indicate that the
MAC label for the file referenced by acca_fh has changed.  The server
does not generally act on the result returned by the client.

11.2.  Operation 15: CB_COPY - Report Results of a Server-Side Copy

11.2.1.  ARGUMENT

<CODE BEGINS>

   union copy_info4 switch (nfsstat4 cca_status) {
   case NFS4_OK:
           void;
   default:
           length4         cca_bytes_copied;
   };

   struct CB_COPY4args {
           nfs_fh4         cca_fh;
           stateid4        cca_stateid;
           copy_info4      cca_copy_info;
   };

<CODE ENDS>

11.2.2.  RESULT

<CODE BEGINS>

   struct CB_COPY4res {
           nfsstat4        ccr_status;
   };

<CODE ENDS>

11.2.3.  DESCRIPTION

CB_COPY is used for both intra- and inter-server asynchronous copies.
The CB_COPY callback informs the client of the result of an
asynchronous server-side copy.  This operation is sent by the
destination server to the client in a CB_COMPOUND request.  The copy
is identified by the filehandle and stateid arguments.  The result is
indicated by the status field.
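A client receiving CB_COPY must match the callback to an outstanding
asynchronous copy and record its outcome.  The following sketch is
purely illustrative: the dispatch function, the pending-copy table,
and the string status values are all invented for this example and
are not defined by the protocol.

```python
# Hypothetical client-side handling of CB_COPY; names invented.
NFS4_OK = 0

def handle_cb_copy(pending, fh, stateid, status, bytes_copied=None):
    """Resolve the pending copy identified by (fh, stateid).

    `pending` maps (filehandle, stateid) to a mutable record for each
    outstanding asynchronous COPY.  On failure, `bytes_copied` carries
    cca_bytes_copied: a count of bytes copied, not which bytes.
    """
    key = (fh, stateid)
    if key not in pending:
        return "UNKNOWN_COPY"          # no such copy outstanding (assumed)
    record = pending.pop(key)
    if status == NFS4_OK:
        record["result"] = ("ok", None)
    else:
        record["result"] = ("failed", bytes_copied)
    return "NFS4_OK"
```

The key point the sketch encodes is that (cca_fh, cca_stateid) is the
sole identifier of the copy, so the client must keep state for every
asynchronous COPY until the callback (or stateid invalidation) arrives.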
If the copy failed, cca_bytes_copied
contains the number of bytes copied before the failure occurred.  The
cca_bytes_copied value indicates the number of bytes copied but not
which specific bytes have been copied.

In the absence of an established backchannel, the server cannot
signal the completion of the COPY via a CB_COPY callback.  The loss
of a callback channel is indicated by the server setting the
SEQ4_STATUS_CB_PATH_DOWN flag in the sr_status_flags field of the
SEQUENCE operation.  The client must re-establish the callback
channel to receive the status of the COPY operation.  Prolonged loss
of the callback channel could result in the server dropping the COPY
operation state and invalidating the copy stateid.

If the client supports the COPY operation, the client is REQUIRED to
support the CB_COPY operation.

The CB_COPY operation may fail for the following reasons (this is a
partial list):

NFS4ERR_NOTSUPP:  The copy offload operation is not supported by the
   NFS client receiving this request.

12.  IANA Considerations

This section uses terms that are defined in [23].

13.  References

13.1.  Normative References

[1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
      Levels", BCP 14, RFC 2119, March 1997.

[2]   Shepler, S., Eisler, M., and D. Noveck, "Network File System
      (NFS) Version 4 Minor Version 1 Protocol", RFC 5661,
      January 2010.

[3]   Haynes, T., "Network File System (NFS) Version 4 Minor Version
      2 External Data Representation Standard (XDR) Description",
      March 2011.

[4]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
      Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986,
      January 2005.

[5]   Haynes, T. and N. Williams, "Remote Procedure Call (RPC)
      Security Version 3", draft-williams-rpcsecgssv3 (Work In
      Progress), 2011.

[6]   Eisler, M., Chiu, A., and L.
Ling, "RPCSEC_GSS Protocol
      Specification", RFC 2203, September 1997.

[7]   Halevy, B., Welch, B., and J. Zelenka, "Object-Based Parallel
      NFS (pNFS) Operations", RFC 5664, January 2010.

[8]   Shepler, S., Eisler, M., and D. Noveck, "Network File System
      (NFS) Version 4 Minor Version 1 External Data Representation
      Standard (XDR) Description", RFC 5662, January 2010.

[9]   Black, D., Glasgow, J., and S. Fridella, "Parallel NFS (pNFS)
      Block/Volume Layout", RFC 5663, January 2010.

13.2.  Informative References

[10]  Haynes, T. and D. Noveck, "Network File System (NFS) version 4
      Protocol", draft-ietf-nfsv4-rfc3530bis-09 (Work In Progress),
      March 2011.

[11]  Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik,
      "NSDB Protocol for Federated Filesystems",
      draft-ietf-nfsv4-federated-fs-protocol (Work In Progress),
      2010.

[12]  Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik,
      "Administration Protocol for Federated Filesystems",
      draft-ietf-nfsv4-federated-fs-admin (Work In Progress), 2010.

[13]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
      Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol --
      HTTP/1.1", RFC 2616, June 1999.

[14]  Postel, J. and J. Reynolds, "File Transfer Protocol", STD 9,
      RFC 959, October 1985.

[15]  Simpson, W., "PPP Challenge Handshake Authentication Protocol
      (CHAP)", RFC 1994, August 1996.

[16]  Strohm, R., "Chapter 2, Data Blocks, Extents, and Segments, of
      Oracle Database Concepts 11g Release 1 (11.1)", January 2011.

[17]  Ashdown, L., "Chapter 15, Validating Database Files and
      Backups, of Oracle Database Backup and Recovery User's Guide
      11g Release 1 (11.1)", August 2008.

[18]  McDougall, R. and J. Mauro, "Section 11.4.3, Detecting Memory
      Corruption, of Solaris Internals", 2007.
[19]  Bairavasundaram, L., Goodson, G., Schroeder, B., Arpaci-
      Dusseau, A., and R. Arpaci-Dusseau, "An Analysis of Data
      Corruption in the Storage Stack", Proceedings of the 6th USENIX
      Symposium on File and Storage Technologies (FAST '08), 2008.

[20]  "Section 46.6, Multi-Level Security (MLS), of Deployment Guide:
      Deployment, Configuration and Administration of Red Hat
      Enterprise Linux 5, Edition 6", 2011.

[21]  Quigley, D. and J. Lu, "Registry Specification for MAC Security
      Label Formats", draft-quigley-label-format-registry (Work In
      Progress), 2011.

[22]  Eisler, M., "XDR: External Data Representation Standard",
      STD 67, RFC 4506, May 2006.

[23]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
      Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.

[24]  Nowicki, B., "NFS: Network File System Protocol specification",
      RFC 1094, March 1989.

[25]  Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3
      Protocol Specification", RFC 1813, June 1995.

[26]  Srinivasan, R., "Binding Protocols for ONC RPC Version 2",
      RFC 1833, August 1995.

[27]  Eisler, M., "NFS Version 2 and Version 3 Security Issues and
      the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5",
      RFC 2623, June 1999.

[28]  Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997.

[29]  Shepler, S., "NFS Version 4 Design Considerations", RFC 2624,
      June 1999.

[30]  Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by an On-
      line Database", RFC 3232, January 2002.

[31]  Linn, J., "The Kerberos Version 5 GSS-API Mechanism", RFC 1964,
      June 1996.

[32]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame,
      C., Eisler, M., and D. Noveck, "Network File System (NFS)
      version 4 Protocol", RFC 3530, April 2003.

Appendix A.
Acknowledgments

For the pNFS Access Permissions Check, the original draft was by
Sorin Faibish, David Black, Mike Eisler, and Jason Glasgow.  The work
was influenced by discussions with Benny Halevy and Bruce Fields.  A
review was done by Tom Haynes.

For the sharing of change attribute implementation details with
NFSv4 clients, the original draft was by Trond Myklebust.

For the NFS server-side copy, the original draft was by James
Lentini, Mike Eisler, Deepak Kenchammana, Anshul Madan, and Rahul
Iyer.  Talpey co-authored an unpublished version of that document.
It was also reviewed by a number of individuals: Pranoop Erasani,
Tom Haynes, Arthur Lent, Trond Myklebust, Dave Noveck, Theresa
Lingutla-Raj, Manjunath Shankararao, Satyam Vaghani, and Nico
Williams.

For the NFS space reservation operations, the original draft was by
Mike Eisler, James Lentini, Manjunath Shankararao, and Rahul Iyer.

For the sparse file support, the original draft was by Dean
Hildebrand and Marc Eshel.  Valuable input and advice were received
from Sorin Faibish, Bruce Fields, Benny Halevy, Trond Myklebust, and
Richard Scheffenegger.

For Labeled NFS, the original draft was by David Quigley, James
Morris, Jarret Lu, and Tom Haynes.  Peter Staubach, Trond Myklebust,
Sorin Faibish, Nico Williams, and David Black also contributed in
the final push to get this accepted.

Appendix B.  RFC Editor Notes

[RFC Editor: please remove this section prior to publishing this
document as an RFC]

[RFC Editor: prior to publishing this document as an RFC, please
replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the
RFC number of this document]

Author's Address

   Thomas Haynes
   NetApp
   9110 E 66th St
   Tulsa, OK 74133
   USA

   Phone: +1 918 307 1415
   Email: thomas@netapp.com
   URI:   http://www.tulsalabs.com