idnits 2.17.1 draft-ietf-nfsv4-minorversion2-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 5 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. == There are 5 instances of lines with private range IPv4 addresses in the document. If these are generic example addresses, they should be changed to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x, 198.51.100.x or 203.0.113.x. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: Furthermore, each DS MUST not report to a client either a sparse ADB or data which belongs to another DS. One implication of this requirement is that the app_data_block4's adb_block_size MUST be either be the stripe width or the stripe width must be an even multiple of it. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: When a data server chooses to return a READ_HOLE result, it has the option of returning hole information for the data stored on that data server (as defined by the data layout), but it MUST not return a nfs_readplusreshole structure with a byte range that includes data managed by another data server. == The document seems to contain a disclaimer for pre-RFC5378 work, but was first submitted on or after 10 November 2008. The disclaimer is usually necessary only for documents that revise or obsolete older RFCs, and that take significant amounts of text from those RFCs. If you can contact all authors of the source material and they are willing to grant the BCP78 rights to the IETF Trust, you can and should remove the disclaimer. Otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 09, 2011) is 4736 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: '0' is mentioned on line 1759, but not defined == Unused Reference: '7' is defined on line 3060, but no explicit reference was found in the text == Unused Reference: '8' is defined on line 3064, but no explicit reference was found in the text == Unused Reference: '9' is defined on line 3067, but no explicit reference was found in the text == Unused Reference: '22' is defined on line 3116, but no explicit reference was found in the text == Unused Reference: '23' is defined on line 3119, but no explicit reference was found in the text == Unused Reference: '24' is defined on line 3122, but no explicit reference was found in the text == Unused Reference: '25' is defined on line 3125, but no explicit reference was found in the text == Unused Reference: '26' is defined on line 3129, but no explicit reference was found in the text == Unused Reference: '27' is defined on line 3131, but no explicit reference was found in the text == Unused Reference: '28' is defined on line 3134, but no explicit reference was found in the text == Unused Reference: '29' is defined on line 3137, but no explicit reference was found in the text == Unused Reference: '30' is defined on line 3140, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. '1' ** Obsolete normative reference: RFC 5661 (ref. '2') (Obsoleted by RFC 8881) -- Possible downref: Non-RFC (?) normative reference: ref. '3' == Outdated reference: A later version (-35) exists of draft-ietf-nfsv4-rfc3530bis-09 -- Obsolete informational reference (is this intentional?): RFC 2616 (ref. '14') (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) -- Obsolete informational reference (is this intentional?): RFC 5226 (ref. '21') (Obsoleted by RFC 8126) -- Obsolete informational reference (is this intentional?): RFC 3530 (ref. '30') (Obsoleted by RFC 7530) Summary: 1 error (**), 0 flaws (~~), 20 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 NFSv4 T. Haynes 3 Internet-Draft Editor 4 Intended status: Standards Track May 09, 2011 5 Expires: November 10, 2011 7 NFS Version 4 Minor Version 2 8 draft-ietf-nfsv4-minorversion2-02.txt 10 Abstract 12 This Internet-Draft describes NFS version 4 minor version two, 13 focusing mainly on the protocol extensions made from NFS version 4 14 minor version 0 and NFS version 4 minor version 1. Major extensions 15 introduced in NFS version 4 minor version two include: Server-side 16 Copy, Space Reservations, and Support for Sparse Files. 18 Requirements Language 20 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 21 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 22 document are to be interpreted as described in RFC 2119 [1]. 24 Status of this Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. 
The list of current Internet- 32 Drafts is at http://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on November 10, 2011. 41 Copyright Notice 43 Copyright (c) 2011 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 This document may contain material from IETF Documents or IETF 57 Contributions published or made publicly available before November 58 10, 2008. The person(s) controlling the copyright in some of this 59 material may not have granted the IETF Trust the right to allow 60 modifications of such material outside the IETF Standards Process. 61 Without obtaining an adequate license from the person(s) controlling 62 the copyright in such materials, this document may not be modified 63 outside the IETF Standards Process, and derivative works of it may 64 not be created outside the IETF Standards Process, except to format 65 it for publication as an RFC or to translate it into languages other 66 than English. 68 Table of Contents 70 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 71 1.1. The NFS Version 4 Minor Version 2 Protocol . . . . . . . . 5 72 1.2. Scope of This Document . . . . . . . . . . . . . . . . . . 5 73 1.3. NFSv4.2 Goals . . . . . . . . . . . . . . . . . . . . . . 5 74 1.4. Overview of NFSv4.2 Features . . . . . . . . . . . . . . . 5 75 1.5. Differences from NFSv4.1 . . . . . . . . . . . . . . . . . 5 76 2. pNFS LAYOUTRETURN Error Handling . . . . . . . . . . . . . . . 5 77 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 5 78 2.2. Changes to Operation 51: LAYOUTRETURN . . . . . . . . . . 6 79 2.2.1. ARGUMENT . . . . . . . . . . . . . . . . . . . . . . . 6 80 2.2.2. RESULT . . . . . . . . . . . . . . . . . . . . . . . . 6 81 2.2.3. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . 6 82 2.2.4. IMPLEMENTATION . . . . . . . . . . . . . . . . . . . . 7 83 3. Sharing change attribute implementation details with NFSv4 84 clients . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 85 3.1. Abstract . . . . . . . . . . . . . . . . . . . . . . . . . 8 86 3.2. Introduction . . . . . . . . . . . . . . . . . . . . . . . 9 87 3.3. Definition of the 'change_attr_type' per-file system 88 attribute . . . . . . . . . . . . . . . . . . . . . . . . 9 89 4. NFS Server-side Copy . . . . . . . . . . . . . . . . . . . . . 10 90 4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 11 91 4.2. Protocol Overview . . . . . . . . . . . . . . . . . . . . 11 92 4.2.1. Intra-Server Copy . . . . . . . . . . . . . . . . . . 13 93 4.2.2. Inter-Server Copy . . . . . . . . . . . . . . . . . . 14 94 4.2.3. Server-to-Server Copy Protocol . . . . . . . . . . . . 17 95 4.3. 
Operations . . . . . . . . . . . . . . . . . . . . . . . . 19 96 4.3.1. netloc4 - Network Locations . . . . . . . . . . . . . 19 97 4.3.2. Operation 61: COPY_NOTIFY - Notify a source server 98 of a future copy . . . . . . . . . . . . . . . . . . . 20 99 4.3.3. Operation 62: COPY_REVOKE - Revoke a destination 100 server's copy privileges . . . . . . . . . . . . . . . 22 101 4.3.4. Operation 59: COPY - Initiate a server-side copy . . . 23 102 4.3.5. Operation 60: COPY_ABORT - Cancel a server-side 103 copy . . . . . . . . . . . . . . . . . . . . . . . . . 31 104 4.3.6. Operation 63: COPY_STATUS - Poll for status of a 105 server-side copy . . . . . . . . . . . . . . . . . . . 32 106 4.3.7. Operation 15: CB_COPY - Report results of a 107 server-side copy . . . . . . . . . . . . . . . . . . . 33 108 4.3.8. Copy Offload Stateids . . . . . . . . . . . . . . . . 35 109 4.4. Security Considerations . . . . . . . . . . . . . . . . . 35 110 4.4.1. Inter-Server Copy Security . . . . . . . . . . . . . . 35 111 5. Application Data Block Support . . . . . . . . . . . . . . . . 43 112 5.1. Generic Framework . . . . . . . . . . . . . . . . . . . . 44 113 5.1.1. Data Block Representation . . . . . . . . . . . . . . 45 114 5.1.2. Data Content . . . . . . . . . . . . . . . . . . . . . 45 115 5.2. Operation 64: INITIALIZE . . . . . . . . . . . . . . . . . 45 116 5.2.1. ARGUMENT . . . . . . . . . . . . . . . . . . . . . . . 46 117 5.2.2. RESULT . . . . . . . . . . . . . . . . . . . . . . . . 46 118 5.2.3. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . 47 119 5.3. Operation 65: READ_PLUS . . . . . . . . . . . . . . . . . 48 120 5.3.1. ARGUMENT . . . . . . . . . . . . . . . . . . . . . . . 48 121 5.3.2. RESULT . . . . . . . . . . . . . . . . . . . . . . . . 49 122 5.3.3. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . 49 123 5.4. pNFS Considerations . . . . . . . . . . . . . . . . . . . 50 124 5.5. An Example of Detecting Corruption . . . . . . . . . . . . 50 125 5.6. Example of READ_PLUS . . . . . . . . . . . . . . . . . . . 52 126 5.7. Zero Filled Holes . . . . . . . . . . . . . . . . . . . . 52 127 6. Space Reservation . . . . . . . . . . . . . . . . . . . . . . 52 128 6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 52 129 6.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 54 130 6.2.1. Space Reservation . . . . . . . . . . . . . . . . . . 54 131 6.2.2. Space freed on deletes . . . . . . . . . . . . . . . . 54 132 6.2.3. Operations and attributes . . . . . . . . . . . . . . 55 133 6.2.4. Attribute 77: space_reserved . . . . . . . . . . . . . 55 134 6.2.5. Attribute 78: space_freed . . . . . . . . . . . . . . 56 135 6.2.6. Attribute 79: max_hole_punch . . . . . . . . . . . . . 56 136 6.2.7. Operation 64: HOLE_PUNCH - Zero and deallocate 137 blocks backing the file in the specified range. . . . 56 138 7. Sparse Files . . . . . . . . . . . . . . . . . . . . . . . . . 57 139 7.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 57 140 7.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 58 141 7.3. Applications and Sparse Files . . . . . . . . . . . . . . 59 142 7.4. Overview of Sparse Files and NFSv4 . . . . . . . . . . . . 60 143 7.5. Operation 65: READ_PLUS . . . . . . . . . . . . . . . . . 61 144 7.5.1. ARGUMENT . . . . . . . . . . . . . . . . . . . . . . . 61 145 7.5.2. RESULT . . . . . . . . . . . . . . . . . . . . . . . . 62 146 7.5.3. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . 62 147 7.5.4. IMPLEMENTATION . . . . . . . . . . . . . . . . 
. . . . 64 148 7.5.5. READ_PLUS with Sparse Files Example . . . . . . . . . 65 149 7.6. Related Work . . . . . . . . . . . . . . . . . . . . . . . 66 150 7.7. Other Proposed Designs . . . . . . . . . . . . . . . . . . 66 151 7.7.1. Multi-Data Server Hole Information . . . . . . . . . . 66 152 7.7.2. Data Result Array . . . . . . . . . . . . . . . . . . 67 153 7.7.3. User-Defined Sparse Mask . . . . . . . . . . . . . . . 67 154 7.7.4. Allocated flag . . . . . . . . . . . . . . . . . . . . 67 155 7.7.5. Dense and Sparse pNFS File Layouts . . . . . . . . . . 68 156 8. Security Considerations . . . . . . . . . . . . . . . . . . . 68 157 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 68 158 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 68 159 10.1. Normative References . . . . . . . . . . . . . . . . . . . 68 160 10.2. Informative References . . . . . . . . . . . . . . . . . . 69 161 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . . 70 162 Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 71 163 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 71 165 1. Introduction 167 1.1. The NFS Version 4 Minor Version 2 Protocol 169 The NFS version 4 minor version 2 (NFSv4.2) protocol is the third 170 minor version of the NFS version 4 (NFSv4) protocol. The first minor 171 version, NFSv4.0, is described in [10] and the second minor version, 172 NFSv4.1, is described in [2]. NFSv4.2 follows the guidelines for minor 173 versioning that are listed in Section 11 of RFC 3530bis. 175 As a minor version, NFSv4.2 is consistent with the overall goals for 176 NFSv4, but extends the protocol so as to better meet those goals, 177 based on experiences with NFSv4.1. In addition, NFSv4.2 has adopted 178 some additional goals, which motivate some of the major extensions in 179 NFSv4.2. 181 1.2. Scope of This Document 183 This document describes the NFSv4.2 protocol. With respect to 184 NFSv4.0 and NFSv4.1, this document does not: 186 o describe the NFSv4.0 or NFSv4.1 protocols, except where needed to 187 contrast with NFSv4.2. 189 o modify the specification of the NFSv4.0 or NFSv4.1 protocols. 191 o clarify the NFSv4.0 or NFSv4.1 protocols. 193 The full XDR for NFSv4.2 is presented in [3]. 195 1.3. NFSv4.2 Goals 197 1.4. Overview of NFSv4.2 Features 199 1.5. Differences from NFSv4.1 201 2. pNFS LAYOUTRETURN Error Handling 203 2.1. Introduction 205 The pNFS description provided in [2] does not give the client a way to 206 relay an error code from the DS to the MDS. In the specification of 207 the Objects-Based Layout protocol [4], the opaque 208 lrf_body field of the LAYOUTRETURN argument is used to relay such 209 error codes. In this section, we define a new data structure to 210 enable the passing of error codes back to the MDS and provide some 211 guidelines on what both the client and MDS should expect in such 212 circumstances. 214 There are two broad classes of errors, transient and persistent. The 215 client SHOULD use this new mechanism only to report 216 persistent errors. It MUST be able to deal with transient issues by 217 itself. Also, while the client might consider an issue to be 218 persistent, it MUST be prepared for the MDS to consider such issues 219 to be expected. A prime example of this is if the MDS fences off a 220 client from either a stateid or a filehandle.
The client will get an 221 error from the DS and might relay either NFS4ERR_ACCESS or 222 NFS4ERR_STALE_STATEID back to the MDS, with the belief that this is a 223 hard error. The MDS, on the other hand, is waiting for the client to 224 report such an error. For it, the mission is accomplished in that 225 the client has returned a layout that the MDS had most likely 226 recalled. 228 2.2. Changes to Operation 51: LAYOUTRETURN 230 The existing LAYOUTRETURN operation is extended by introducing a new 231 data structure to report errors, layoutreturn_device_error4. Also, 232 layoutreturn_error_report4 is introduced to enable an array of such errors 233 to be reported. 235 2.2.1. ARGUMENT 237 The ARGUMENT specification of the LAYOUTRETURN operation in section 238 18.44.1 of [2] is augmented by the following XDR code [11]: 240 struct layoutreturn_device_error4 { 241 deviceid4 lrde_deviceid; 242 nfsstat4 lrde_status; 243 nfs_opnum4 lrde_opnum; 244 }; 246 struct layoutreturn_error_report4 { 247 layoutreturn_device_error4 lrer_errors<>; 248 }; 250 2.2.2. RESULT 252 The RESULT of the LAYOUTRETURN operation is unchanged; see section 253 18.44.2 of [2]. 255 2.2.3. DESCRIPTION 257 The following text is added to the end of the LAYOUTRETURN operation 258 DESCRIPTION in section 18.44.3 of [2]. 260 When a client uses LAYOUTRETURN with a type of LAYOUTRETURN4_FILE, 261 then if the lrf_body field is NULL, it indicates to the MDS that the 262 client experienced no errors. If lrf_body is non-NULL, then the 263 field references error information which is layout type specific. 264 That is, the Objects-Based Layout protocol can continue to utilize 265 lrf_body as specified in [4]. For Files-Based Layouts, the 266 field references a layoutreturn_error_report4, which contains an 267 array of layoutreturn_device_error4. 269 Each individual layoutreturn_device_error4 describes a single error 270 associated with a DS, which is identified via lrde_deviceid. The 271 operation which returned the error is identified via lrde_opnum. 272 Finally, the NFS error value (nfsstat4) encountered is provided via 273 lrde_status and may consist of the following error codes: 275 NFS4_OK: No issues were found for this device. 277 NFS4ERR_NXIO: The client was unable to establish any communication 278 with the DS. 280 NFS4ERR_*: The client was able to establish communication with the 281 DS and is returning one of the allowed error codes for the 282 operation denoted by lrde_opnum. 284 2.2.4. IMPLEMENTATION 286 The following text is added to the end of the LAYOUTRETURN operation 287 IMPLEMENTATION in section 18.44.4 of [2]. 289 A client that expects to use pNFS for a mounted filesystem SHOULD 290 check for pNFS support at mount time. This check SHOULD be performed 291 by sending a GETDEVICELIST operation, followed by layout-type- 292 specific checks for accessibility of each storage device returned by 293 GETDEVICELIST. If the NFS server does not support pNFS, the 294 GETDEVICELIST operation will be rejected with an NFS4ERR_NOTSUPP 295 error; in this situation it is up to the client to determine whether 296 it is acceptable to proceed with NFS-only access. 298 Clients are expected to tolerate transient storage device errors, and 299 hence clients SHOULD NOT use the LAYOUTRETURN error handling for 300 device access problems that may be transient. The methods by which a 301 client decides whether an access problem is transient vs.
persistent 302 are implementation-specific, but may include retrying I/Os to a data 303 server under appropriate conditions. 305 When an I/O fails to a storage device, the client SHOULD retry the 306 failed I/O via the MDS. In this situation, before retrying the I/O, 307 the client SHOULD return the layout, or the affected portion thereof, 308 and SHOULD indicate which storage device or devices were problematic. 309 If the client does not do this, the MDS may issue a layout recall 310 callback in order to perform the retried I/O. 312 The client needs to be cognizant that since this error handling is 313 optional in the MDS, the MDS may silently ignore this functionality. 314 Also, as the MDS may consider some issues the client reports to be 315 expected (see Section 2.1), the client might find it difficult to 316 detect an MDS which has not implemented error handling via 317 LAYOUTRETURN. 319 If an MDS is aware that a storage device is proving problematic to a 320 client, the MDS SHOULD NOT include that storage device in any pNFS 321 layouts sent to that client. If the MDS is aware that a storage 322 device is affecting many clients, then the MDS SHOULD NOT include 323 that storage device in any pNFS layouts sent out. Clients must still 324 be aware that the MDS might not have any choice in using the storage 325 device, i.e., there might only be one possible layout for the system. 327 Another interesting complication is that for existing files, the MDS 328 might have no choice in which storage devices to hand out to clients. 329 The MDS might try to restripe a file across a different storage 330 device, but clients need to be aware that not all implementations 331 have restriping support. 333 An MDS SHOULD react to a client return of layouts with errors by not 334 using the problematic storage devices in layouts for that client, but 335 the MDS is not required to indefinitely retain per-client storage 336 device error information. An MDS is also not required to 337 automatically reinstate use of a previously problematic storage 338 device; administrative intervention may be required instead. 340 A client MAY perform I/O via the MDS even when the client holds a 341 layout that covers the I/O; servers MUST support this client 342 behavior, and MAY recall layouts as needed to complete I/Os. 344 3. Sharing change attribute implementation details with NFSv4 clients 346 3.1. Abstract 348 This document describes an extension to the NFSv4 protocol that 349 allows the server to share information about the implementation of 350 its change attribute with the client. The aim is to improve the 351 client's ability to determine the order in which parallel updates to 352 the same file were processed. 354 3.2. Introduction 356 Although both the NFSv4 [10] and NFSv4.1 [2] protocols define the 357 change attribute as being mandatory to implement, there is little in 358 the way of guidance. The only feature that is mandated by the spec 359 is that the value must change whenever the file data or metadata 360 change. 362 While this allows for a wide range of implementations, it also leaves 363 the client with a conundrum: how does it determine which is the most 364 recent value for the change attribute in a case where several RPC 365 calls have been issued in parallel?
In other words if two COMPOUNDs, 366 both containing WRITE and GETATTR requests for the same file, have 367 been issued in parallel, how does the client determine which of the 368 two change attribute values returned in the replies to the GETATTR 369 requests corresponds to the most recent state of the file? In some 370 cases, the only recourse may be to send another COMPOUND containing a 371 third GETATTR that is fully serialised with the first two. 373 In order to avoid this kind of inefficiency, we propose a method to 374 allow the server to share details about how the change attribute is 375 expected to evolve, so that the client may immediately determine 376 which, out of the several change attribute values returned by the 377 server, is the most recent. 379 3.3. Definition of the 'change_attr_type' per-file system attribute 381 enum change_attr_typeinfo { 382 NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR = 0, 383 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER = 1, 384 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2, 385 NFS4_CHANGE_TYPE_IS_TIME_METADATA = 3, 386 NFS4_CHANGE_TYPE_IS_UNDEFINED = 4 387 }; 389 +------------------+----+---------------------------+-----+ 390 | Name | Id | Data Type | Acc | 391 +------------------+----+---------------------------+-----+ 392 | change_attr_type | XX | enum change_attr_typeinfo | R | 393 +------------------+----+---------------------------+-----+ 395 The proposed solution is to enable the NFS server to provide 396 additional information about how it expects the change attribute 397 value to evolve after the file data or metadata has changed. To do 398 so, we define a new recommended attribute, 'change_attr_type', which 399 may take values from enum change_attr_typeinfo as follows: 401 NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR: The change attribute value MUST 402 monotonically increase for every atomic change to the file 403 attributes, data or directory contents. 405 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER: The change attribute value MUST 406 be incremented by one unit for every atomic change to the file 407 attributes, data or directory contents. This property is 408 preserved when writing to pNFS data servers. 410 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS: The change attribute 411 value MUST be incremented by one unit for every atomic change to 412 the file attributes, data or directory contents. In the case 413 where the client is writing to pNFS data servers, the number of 414 increments is not guaranteed to exactly match the number of 415 writes. 417 NFS4_CHANGE_TYPE_IS_TIME_METADATA: The change attribute is 418 implemented as suggested in the NFSv4 spec [10] in terms of the 419 time_metadata attribute. 421 NFS4_CHANGE_TYPE_IS_UNDEFINED: The change attribute does not take 422 values that fit into any of these categories. 424 If either NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR, 425 NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, or 426 NFS4_CHANGE_TYPE_IS_TIME_METADATA are set, then the client knows at 427 the very least that the change attribute is monotonically increasing, 428 which is sufficient to resolve the question of which value is the 429 most recent. 431 If the client sees the value NFS4_CHANGE_TYPE_IS_TIME_METADATA, then 432 by inspecting the value of the 'time_delta' attribute it additionally 433 has the option of detecting rogue server implementations that use 434 time_metadata in violation of the spec. 
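The following non-normative sketch (Python; the helper name and the values shown are purely illustrative) shows how a client could apply the above rule to pick the newest of several change attribute values returned by parallel GETATTRs:

   # Non-normative sketch: ordering change attribute values with the
   # change_attr_type attribute.  The numeric values mirror
   # enum change_attr_typeinfo above.
   NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR         = 0
   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER        = 1
   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2
   NFS4_CHANGE_TYPE_IS_TIME_METADATA          = 3
   NFS4_CHANGE_TYPE_IS_UNDEFINED              = 4

   # Types for which the text above guarantees a monotonically
   # increasing change attribute.
   MONOTONIC = {NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR,
                NFS4_CHANGE_TYPE_IS_VERSION_COUNTER,
                NFS4_CHANGE_TYPE_IS_TIME_METADATA}

   def most_recent_change(change_attr_type, values):
       """Pick the newest of several change attribute values returned
       by parallel GETATTRs, or return None if the client must fall
       back to a fully serialised GETATTR."""
       if change_attr_type in MONOTONIC:
           return max(values)
       return None

   # Two GETATTR replies raced back; the larger value is the newer.
   assert most_recent_change(NFS4_CHANGE_TYPE_IS_VERSION_COUNTER,
                             [1021, 1022]) == 1022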
436 Finally, if the client sees NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, it 437 has the ability to predict what the resulting change attribute value 438 should be after a COMPOUND containing a SETATTR, WRITE, or CREATE. 439 This again allows it to detect changes made in parallel by another 440 client. The value NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS permits 441 the same, but only if the client is not doing pNFS WRITEs. 443 4. NFS Server-side Copy 444 4.1. Introduction 446 This document describes a server-side copy feature for the NFS 447 protocol. 449 The server-side copy feature provides a mechanism for the NFS client 450 to perform a file copy on the server without the data being 451 transmitted back and forth over the network. 453 Without this feature, an NFS client copies data from one location to 454 another by reading the data from the server over the network, and 455 then writing the data back over the network to the server. Using 456 this server-side copy operation, the client is able to instruct the 457 server to copy the data locally without the data being sent back and 458 forth over the network unnecessarily. 460 In general, this feature is useful whenever data is copied from one 461 location to another on the server. It is particularly useful when 462 copying the contents of a file from a backup. Backup-versions of a 463 file are copied for a number of reasons, including restoring and 464 cloning data. 466 If the source object and destination object are on different file 467 servers, the file servers will communicate with one another to 468 perform the copy operation. The server-to-server protocol by which 469 this is accomplished is not defined in this document. 471 4.2. Protocol Overview 473 The server-side copy offload operations support both intra-server and 474 inter-server file copies. An intra-server copy is a copy in which 475 the source file and destination file reside on the same server. In 476 an inter-server copy, the source file and destination file are on 477 different servers. In both cases, the copy may be performed 478 synchronously or asynchronously. 480 Throughout the rest of this document, we refer to the NFS server 481 containing the source file as the "source server" and the NFS server 482 to which the file is transferred as the "destination server". In the 483 case of an intra-server copy, the source server and destination 484 server are the same server. Therefore in the context of an intra- 485 server copy, the terms source server and destination server refer to 486 the single server performing the copy. 488 The operations described below are designed to copy files. Other 489 file system objects can be copied by building on these operations or 490 using other techniques. For example if the user wishes to copy a 491 directory, the client can synthesize a directory copy by first 492 creating the destination directory and then copying the source 493 directory's files to the new destination directory. If the user 494 wishes to copy a namespace junction [12] [13], the client can use the 495 ONC RPC Federated Filesystem protocol [13] to perform the copy. 496 Specifically the client can determine the source junction's 497 attributes using the FEDFS_LOOKUP_FSN procedure and create a 498 duplicate junction using the FEDFS_CREATE_JUNCTION procedure. 500 For the inter-server copy protocol, the operations are defined to be 501 compatible with a server-to-server copy protocol in which the 502 destination server reads the file data from the source server. 
This 503 model in which the file data is pulled from the source by the 504 destination has a number of advantages over a model in which the 505 source pushes the file data to the destination. The advantages of 506 the pull model include: 508 o The pull model only requires a remote server (i.e. the destination 509 server) to be granted read access. A push model requires a remote 510 server (i.e. the source server) to be granted write access, which 511 is more privileged. 513 o The pull model allows the destination server to stop reading if it 514 has run out of space. In a push model, the destination server 515 must flow control the source server in this situation. 517 o The pull model allows the destination server to easily flow 518 control the data stream by adjusting the size of its read 519 operations. In a push model, the destination server does not have 520 this ability. The source server in a push model is capable of 521 writing chunks larger than the destination server has requested in 522 attributes and session parameters. In theory, the destination 523 server could perform a "short" write in this situation, but this 524 approach is known to behave poorly in practice. 526 The following operations are provided to support server-side copy: 528 COPY_NOTIFY: For inter-server copies, the client sends this 529 operation to the source server to notify it of a future file copy 530 from a given destination server for the given user. 532 COPY_REVOKE: Also for inter-server copies, the client sends this 533 operation to the source server to revoke permission to copy a file 534 for the given user. 536 COPY: Used by the client to request a file copy. 538 COPY_ABORT: Used by the client to abort an asynchronous file copy. 540 COPY_STATUS: Used by the client to poll the status of an 541 asynchronous file copy. 543 CB_COPY: Used by the destination server to report the results of an 544 asynchronous file copy to the client. 546 These operations are described in detail in Section 4.3. This 547 section provides an overview of how these operations are used to 548 perform server-side copies. 550 4.2.1. Intra-Server Copy 552 To copy a file on a single server, the client uses a COPY operation. 553 The server may respond to the copy operation with the final results 554 of the copy or it may perform the copy asynchronously and deliver the 555 results using a CB_COPY operation callback. If the copy is performed 556 asynchronously, the client may poll the status of the copy using 557 COPY_STATUS or cancel the copy using COPY_ABORT. 559 A synchronous intra-server copy is shown in Figure 1. In this 560 example, the NFS server chooses to perform the copy synchronously. 561 The copy operation is completed, either successfully or 562 unsuccessfully, before the server replies to the client's request. 563 The server's reply contains the final result of the operation. 565 Client Server 566 + + 567 | | 568 |--- COPY ---------------------------->| Client requests 569 |<------------------------------------/| a file copy 570 | | 571 | | 573 Figure 1: A synchronous intra-server copy. 575 An asynchronous intra-server copy is shown in Figure 2. In this 576 example, the NFS server performs the copy asynchronously. The 577 server's reply to the copy request indicates that the copy operation 578 was initiated and the final result will be delivered at a later time. 579 The server's reply also contains a copy stateid. 
The client may use 580 this copy stateid to poll for status information (as shown) or to 581 cancel the copy using a COPY_ABORT. When the server completes the 582 copy, the server performs a callback to the client and reports the 583 results. 585 Client Server 586 + + 587 | | 588 |--- COPY ---------------------------->| Client requests 589 |<------------------------------------/| a file copy 590 | | 591 | | 592 |--- COPY_STATUS --------------------->| Client may poll 593 |<------------------------------------/| for status 594 | | 595 | . | Multiple COPY_STATUS 596 | . | operations may be sent. 597 | . | 598 | | 599 |<-- CB_COPY --------------------------| Server reports results 600 |\------------------------------------>| 601 | | 603 Figure 2: An asynchronous intra-server copy. 605 4.2.2. Inter-Server Copy 607 A copy may also be performed between two servers. The copy protocol 608 is designed to accommodate a variety of network topologies. As shown 609 in Figure 3, the client and servers may be connected by multiple 610 networks. In particular, the servers may be connected by a 611 specialized, high speed network (network 192.168.33.0/24 in the 612 diagram) that does not include the client. The protocol allows the 613 client to setup the copy between the servers (over network 614 10.11.78.0/24 in the diagram) and for the servers to communicate on 615 the high speed network if they choose to do so. 617 192.168.33.0/24 618 +-------------------------------------+ 619 | | 620 | | 621 | 192.168.33.18 | 192.168.33.56 622 +-------+------+ +------+------+ 623 | Source | | Destination | 624 +-------+------+ +------+------+ 625 | 10.11.78.18 | 10.11.78.56 626 | | 627 | | 628 | 10.11.78.0/24 | 629 +------------------+------------------+ 630 | 631 | 632 | 10.11.78.243 633 +-----+-----+ 634 | Client | 635 +-----------+ 637 Figure 3: An example inter-server network topology. 639 For an inter-server copy, the client notifies the source server that 640 a file will be copied by the destination server using a COPY_NOTIFY 641 operation. The client then initiates the copy by sending the COPY 642 operation to the destination server. The destination server may 643 perform the copy synchronously or asynchronously. 645 A synchronous inter-server copy is shown in Figure 4. In this case, 646 the destination server chooses to perform the copy before responding 647 to the client's COPY request. 649 An asynchronous copy is shown in Figure 5. In this case, the 650 destination server chooses to respond to the client's COPY request 651 immediately and then perform the copy asynchronously. 653 Client Source Destination 654 + + + 655 | | | 656 |--- COPY_NOTIFY --->| | 657 |<------------------/| | 658 | | | 659 | | | 660 |--- COPY ---------------------------->| 661 | | | 662 | | | 663 | |<----- read -----| 664 | |\--------------->| 665 | | | 666 | | . | Multiple reads may 667 | | . | be necessary 668 | | . | 669 | | | 670 | | | 671 |<------------------------------------/| Destination replies 672 | | | to COPY 674 Figure 4: A synchronous inter-server copy. 676 Client Source Destination 677 + + + 678 | | | 679 |--- COPY_NOTIFY --->| | 680 |<------------------/| | 681 | | | 682 | | | 683 |--- COPY ---------------------------->| 684 |<------------------------------------/| 685 | | | 686 | | | 687 | |<----- read -----| 688 | |\--------------->| 689 | | | 690 | | . | Multiple reads may 691 | | . | be necessary 692 | | . 
| 693 | | | 694 | | | 695 |--- COPY_STATUS --------------------->| Client may poll 696 |<------------------------------------/| for status 697 | | | 698 | | . | Multiple COPY_STATUS 699 | | . | operations may be sent 700 | | . | 701 | | | 702 | | | 703 | | | 704 |<-- CB_COPY --------------------------| Destination reports 705 |\------------------------------------>| results 706 | | | 708 Figure 5: An asynchronous inter-server copy. 710 4.2.3. Server-to-Server Copy Protocol 712 During an inter-server copy, the destination server reads the file 713 data from the source server. The source server and destination 714 server are not required to use a specific protocol to transfer the 715 file data. The choice of what protocol to use is ultimately the 716 destination server's decision. 718 4.2.3.1. Using NFSv4.x as a Server-to-Server Copy Protocol 720 The destination server MAY use standard NFSv4.x (where x >= 1) to 721 read the data from the source server. If NFSv4.x is used for the 722 server-to-server copy protocol, the destination server can use the 723 filehandle contained in the COPY request with standard NFSv4.x 724 operations to read data from the source server. Specifically, the 725 destination server may use the NFSv4.x OPEN operation's CLAIM_FH 726 facility to open the file being copied and obtain an open stateid. 727 Using the stateid, the destination server may then use NFSv4.x READ 728 operations to read the file. 730 4.2.3.2. Using an alternative Server-to-Server Copy Protocol 732 In a homogeneous environment, the source and destination servers 733 might be able to perform the file copy extremely efficiently using 734 specialized protocols. For example the source and destination 735 servers might be two nodes sharing a common file system format for 736 the source and destination file systems. Thus the source and 737 destination are in an ideal position to efficiently render the image 738 of the source file to the destination file by replicating the file 739 system formats at the block level. Another possibility is that the 740 source and destination might be two nodes sharing a common storage 741 area network, and thus there is no need to copy any data at all, and 742 instead ownership of the file and its contents might simply be re- 743 assigned to the destination. To allow for these possibilities, the 744 destination server is allowed to use a server-to-server copy protocol 745 of its choice. 747 In a heterogeneous environment, using a protocol other than NFSv4.x 748 (e.g. HTTP [14] or FTP [15]) presents some challenges. In 749 particular, the destination server is presented with the challenge of 750 accessing the source file given only an NFSv4.x filehandle. 752 One option for protocols that identify source files with path names 753 is to use an ASCII hexadecimal representation of the source 754 filehandle as the file name. 756 Another option for the source server is to use URLs to direct the 757 destination server to a specialized service. For example, the 758 response to COPY_NOTIFY could include the URL 759 ftp://s1.example.com:9999/_FH/0x12345, where 0x12345 is the ASCII 760 hexadecimal representation of the source filehandle. When the 761 destination server receives the source server's URL, it would use 762 "_FH/0x12345" as the file name to pass to the FTP server listening on 763 port 9999 of s1.example.com. 
On port 9999 there would be a special 764 instance of the FTP service that understands how to convert NFS 765 filehandles to an open file descriptor (in many operating systems, 766 this would require a new system call, one which is the inverse of the 767 makefh() function that the pre-NFSv4 MOUNT service needs). 769 Authenticating and identifying the destination server to the source 770 server is also a challenge. Recommendations for how to accomplish 771 this are given in Section 4.4.1.2.4 and Section 4.4.1.4. 773 4.3. Operations 775 In the sections that follow, several operations are defined that 776 together provide the server-side copy feature. These operations are 777 intended to be OPTIONAL operations as defined in section 17 of [2]. 778 The COPY_NOTIFY, COPY_REVOKE, COPY, COPY_ABORT, and COPY_STATUS 779 operations are designed to be sent within an NFSv4 COMPOUND 780 procedure. The CB_COPY operation is designed to be sent within an 781 NFSv4 CB_COMPOUND procedure. 783 Each operation is performed in the context of the user identified by 784 the ONC RPC credential of its containing COMPOUND or CB_COMPOUND 785 request. For example, a COPY_ABORT operation issued by a given user 786 indicates that a specified COPY operation initiated by the same user 787 be canceled. Therefore a COPY_ABORT MUST NOT interfere with a copy 788 of the same file initiated by another user. 790 An NFS server MAY allow an administrative user to monitor or cancel 791 copy operations using an implementation specific interface. 793 4.3.1. netloc4 - Network Locations 795 The server-side copy operations specify network locations using the 796 netloc4 data type shown below: 798 enum netloc_type4 { 799 NL4_NAME = 0, 800 NL4_URL = 1, 801 NL4_NETADDR = 2 802 }; 803 union netloc4 switch (netloc_type4 nl_type) { 804 case NL4_NAME: utf8str_cis nl_name; 805 case NL4_URL: utf8str_cis nl_url; 806 case NL4_NETADDR: netaddr4 nl_addr; 807 }; 809 If the netloc4 is of type NL4_NAME, the nl_name field MUST be 810 specified as a UTF-8 string. The nl_name is expected to be resolved 811 to a network address via DNS, LDAP, NIS, /etc/hosts, or some other 812 means. If the netloc4 is of type NL4_URL, a server URL [5] 813 appropriate for the server-to-server copy operation is specified as a 814 UTF-8 string. If the netloc4 is of type NL4_NETADDR, the nl_addr 815 field MUST contain a valid netaddr4 as defined in Section 3.3.9 of 816 [2]. 818 When netloc4 values are used for an inter-server copy as shown in 819 Figure 3, their values may be evaluated on the source server, 820 destination server, and client. The network environment in which 821 these systems operate should be configured so that the netloc4 values 822 are interpreted as intended on each system. 824 4.3.2. Operation 61: COPY_NOTIFY - Notify a source server of a future 825 copy 827 4.3.2.1. ARGUMENT 829 struct COPY_NOTIFY4args { 830 /* CURRENT_FH: source file */ 831 netloc4 cna_destination_server; 832 }; 834 4.3.2.2. RESULT 836 struct COPY_NOTIFY4resok { 837 nfstime4 cnr_lease_time; 838 netloc4 cnr_source_server<>; 839 }; 841 union COPY_NOTIFY4res switch (nfsstat4 cnr_status) { 842 case NFS4_OK: 843 COPY_NOTIFY4resok resok4; 844 default: 845 void; 846 }; 848 4.3.2.3. DESCRIPTION 850 This operation is used for an inter-server copy. A client sends this 851 operation in a COMPOUND request to the source server to authorize a 852 destination server identified by cna_destination_server to read the 853 file specified by CURRENT_FH on behalf of the given user. 
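As a purely illustrative, non-normative example (the filehandle and operation argument shown are placeholders), the COMPOUND sent to the source server could contain the sub-sequence

   PUTFH        source-fh
   COPY_NOTIFY  cna_destination_server

A successful reply's cnr_lease_time and cnr_source_server values are then used by the client when it sends the COPY operation to the destination server (see Section 4.3.4).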
855 The cna_destination_server MUST be specified using the netloc4 856 network location format. The server is not required to resolve the 857 cna_destination_server address before completing this operation. 859 If this operation succeeds, the source server will allow the 860 cna_destination_server to copy the specified file on behalf of the 861 given user. If COPY_NOTIFY succeeds, the destination server is 862 granted permission to read the file as long as both of the following 863 conditions are met: 865 o The destination server begins reading the source file before the 866 cnr_lease_time expires. If the cnr_lease_time expires while the 867 destination server is still reading the source file, the 868 destination server is allowed to finish reading the file. 870 o The client has not issued a COPY_REVOKE for the same combination 871 of user, filehandle, and destination server. 873 The cnr_lease_time is chosen by the source server. A cnr_lease_time 874 of 0 (zero) indicates an infinite lease. To renew the copy lease 875 time the client should resend the same copy notification request to 876 the source server. 878 To avoid the need for synchronized clocks, copy lease times are 879 granted by the server as a time delta. However, there is a 880 requirement that the client and server clocks do not drift 881 excessively over the duration of the lease. There is also the issue 882 of propagation delay across the network which could easily be several 883 hundred milliseconds as well as the possibility that requests will be 884 lost and need to be retransmitted. 886 To take propagation delay into account, the client should subtract it 887 from copy lease times (e.g. if the client estimates the one-way 888 propagation delay as 200 milliseconds, then it can assume that the 889 lease is already 200 milliseconds old when it gets it). In addition, 890 it will take another 200 milliseconds to get a response back to the 891 server. So the client must send a lease renewal or send the copy 892 offload request to the cna_destination_server at least 400 893 milliseconds before the copy lease would expire. If the propagation 894 delay varies over the life of the lease (e.g. the client is on a 895 mobile host), the client will need to continuously subtract the 896 increase in propagation delay from the copy lease times. 898 The server's copy lease period configuration should take into account 899 the network distance of the clients that will be accessing the 900 server's resources. It is expected that the lease period will take 901 into account the network propagation delays and other network delay 902 factors for the client population. Since the protocol does not allow 903 for an automatic method to determine an appropriate copy lease 904 period, the server's administrator may have to tune the copy lease 905 period. 907 A successful response will also contain a list of names, addresses, 908 and URLs called cnr_source_server, on which the source is willing to 909 accept connections from the destination. These might not be 910 reachable from the client and might be located on networks to which 911 the client has no connection. 913 If the client wishes to perform an inter-server copy, the client MUST 914 send a COPY_NOTIFY to the source server. Therefore, the source 915 server MUST support COPY_NOTIFY. 917 For a copy only involving one server (the source and destination are 918 on the same server), this operation is unnecessary. 
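The lease timing arithmetic described above can be summarised in the following non-normative sketch (Python; the function and variable names are illustrative only, and all times are in seconds):

   # Non-normative sketch of copy lease timing.  one_way_delay is the
   # client's estimate of the propagation delay to the source server.
   def renewal_deadline(time_reply_received, cnr_lease_time, one_way_delay):
       """Latest local time at which the client should send a lease
       renewal (or the copy offload request itself) so that it reaches
       the source server before the copy lease expires.  A
       cnr_lease_time of 0 indicates an infinite lease, which never
       needs renewal."""
       if cnr_lease_time == 0:
           return None
       # The lease is already one_way_delay old when the reply arrives,
       # and the renewal takes another one_way_delay to reach the
       # server, hence the factor of two (400 ms in the example above).
       return time_reply_received + cnr_lease_time - 2 * one_way_delay

   # A 90 second lease with an estimated 200 ms one-way delay:
   print(renewal_deadline(1000.0, 90.0, 0.2))   # -> 1089.6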
920 The COPY_NOTIFY operation may fail for the following reasons (this is 921 a partial list): 923 NFS4ERR_MOVED: The file system which contains the source file is not 924 present on the source server. The client can determine the 925 correct location and reissue the operation with the correct 926 location. 928 NFS4ERR_NOTSUPP: The copy offload operation is not supported by the 929 NFS server receiving this request. 931 NFS4ERR_WRONGSEC: The security mechanism being used by the client 932 does not match the server's security policy. 934 4.3.3. Operation 62: COPY_REVOKE - Revoke a destination server's copy 935 privileges 937 4.3.3.1. ARGUMENT 939 struct COPY_REVOKE4args { 940 /* CURRENT_FH: source file */ 941 netloc4 cra_destination_server; 942 }; 944 4.3.3.2. RESULT 946 struct COPY_REVOKE4res { 947 nfsstat4 crr_status; 948 }; 950 4.3.3.3. DESCRIPTION 952 This operation is used for an inter-server copy. A client sends this 953 operation in a COMPOUND request to the source server to revoke the 954 authorization of a destination server identified by 955 cra_destination_server from reading the file specified by CURRENT_FH 956 on behalf of given user. If the cra_destination_server has already 957 begun copying the file, a successful return from this operation 958 indicates that further access will be prevented. 960 The cra_destination_server MUST be specified using the netloc4 961 network location format. The server is not required to resolve the 962 cra_destination_server address before completing this operation. 964 The COPY_REVOKE operation is useful in situations in which the source 965 server granted a very long or infinite lease on the destination 966 server's ability to read the source file and all copy operations on 967 the source file have been completed. 969 For a copy only involving one server (the source and destination are 970 on the same server), this operation is unnecessary. 972 If the server supports COPY_NOTIFY, the server is REQUIRED to support 973 the COPY_REVOKE operation. 975 The COPY_REVOKE operation may fail for the following reasons (this is 976 a partial list): 978 NFS4ERR_MOVED: The file system which contains the source file is not 979 present on the source server. The client can determine the 980 correct location and reissue the operation with the correct 981 location. 983 NFS4ERR_NOTSUPP: The copy offload operation is not supported by the 984 NFS server receiving this request. 986 4.3.4. Operation 59: COPY - Initiate a server-side copy 988 4.3.4.1. ARGUMENT 990 const COPY4_GUARDED = 0x00000001; 991 const COPY4_METADATA = 0x00000002; 993 struct COPY4args { 994 /* SAVED_FH: source file */ 995 /* CURRENT_FH: destination file or */ 996 /* directory */ 997 offset4 ca_src_offset; 998 offset4 ca_dst_offset; 999 length4 ca_count; 1000 uint32_t ca_flags; 1001 component4 ca_destination; 1002 netloc4 ca_source_server<>; 1003 }; 1005 4.3.4.2. RESULT 1007 union COPY4res switch (nfsstat4 cr_status) { 1008 case NFS4_OK: 1009 stateid4 cr_callback_id<1>; 1010 default: 1011 length4 cr_bytes_copied; 1012 }; 1014 4.3.4.3. DESCRIPTION 1016 The COPY operation is used for both intra- and inter-server copies. 1017 In both cases, the COPY is always sent from the client to the 1018 destination server of the file copy. The COPY operation requests 1019 that a file be copied from the location specified by the SAVED_FH 1020 value to the location specified by the combination of CURRENT_FH and 1021 ca_destination. 1023 The SAVED_FH must be a regular file. 
If SAVED_FH is not a regular 1024 file, the operation MUST fail and return NFS4ERR_WRONG_TYPE. 1026 In order to set SAVED_FH to the source file handle, the compound 1027 procedure requesting the COPY will include a sub-sequence of 1028 operations such as 1030 PUTFH source-fh 1031 SAVEFH 1033 If the request is for a server-to-server copy, the source-fh is a 1034 filehandle from the source server and the compound procedure is being 1035 executed on the destination server. In this case, the source-fh is a 1036 foreign filehandle on the server receiving the COPY request. If 1037 either PUTFH or SAVEFH checked the validity of the filehandle, the 1038 operation would likely fail and return NFS4ERR_STALE. 1040 In order to avoid this problem, the minor version incorporating the 1041 COPY operations will need to make a few small changes in the handling 1042 of existing operations. If a server supports the server-to-server 1043 COPY feature, a PUTFH followed by a SAVEFH MUST NOT return 1044 NFS4ERR_STALE for either operation. These restrictions do not pose 1045 substantial difficulties for servers. The CURRENT_FH and SAVED_FH 1046 may be validated in the context of the operation referencing them and 1047 an NFS4ERR_STALE error returned for an invalid file handle at that 1048 point. 1050 The CURRENT_FH and ca_destination together specify the destination of 1051 the copy operation. If ca_destination is of 0 (zero) length, then 1052 CURRENT_FH specifies the target file. In this case, CURRENT_FH MUST 1053 be a regular file and not a directory. If ca_destination is not of 0 1054 (zero) length, the ca_destination argument specifies the file name to 1055 which the data will be copied within the directory identified by 1056 CURRENT_FH. In this case, CURRENT_FH MUST be a directory and not a 1057 regular file. 1059 If the file named by ca_destination does not exist and the operation 1060 completes successfully, the file will be visible in the file system 1061 namespace. If the file does not exist and the operation fails, the 1062 file MAY be visible in the file system namespace depending on when 1063 the failure occurs and on the implementation of the NFS server 1064 receiving the COPY operation. If the ca_destination name cannot be 1065 created in the destination file system (due to file name 1066 restrictions, such as case or length), the operation MUST fail. 1068 The ca_src_offset is the offset within the source file from which the 1069 data will be read, the ca_dst_offset is the offset within the 1070 destination file to which the data will be written, and the ca_count 1071 is the number of bytes that will be copied. An offset of 0 (zero) 1072 specifies the start of the file. A count of 0 (zero) requests that 1073 all bytes from ca_src_offset through EOF be copied to the 1074 destination. If concurrent modifications to the source file overlap 1075 with the source file region being copied, the data copied may include 1076 all, some, or none of the modifications. The client can use standard 1077 NFS operations (e.g. OPEN with OPEN4_SHARE_DENY_WRITE or mandatory 1078 byte range locks) to protect against concurrent modifications if the 1079 client is concerned about this. If the source file's end of file is 1080 being modified in parallel with a copy that specifies a count of 0 1081 (zero) bytes, the amount of data copied is implementation dependent 1082 (clients may guard against this case by specifying a non-zero count 1083 value or preventing modification of the source file as mentioned 1084 above). 
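As a non-normative illustration of the offset and count semantics above (Python; names are illustrative only):

   # Non-normative sketch: number of source bytes a COPY will cover.
   # src_size_snapshot is the source file's size as observed by the
   # client; if the file is being extended or truncated in parallel,
   # the amount actually copied is implementation dependent.
   def effective_copy_count(ca_src_offset, ca_count, src_size_snapshot):
       """A ca_count of zero asks for everything from ca_src_offset
       through the end of the source file."""
       if ca_count == 0:
           return src_size_snapshot - ca_src_offset
       return ca_count

   print(effective_copy_count(0, 0, 1 << 20))        # whole 1 MiB file
   print(effective_copy_count(8192, 4096, 1 << 20))  # 4 KiB at offset 8192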
1086 If the source offset or the source offset plus count is greater than 1087 or equal to the size of the source file, the operation will fail with 1088 NFS4ERR_INVAL. The destination offset or destination offset plus 1089 count may be greater than the size of the destination file. This 1090 allows for the client to issue parallel copies to implement 1091 operations such as "cat file1 file2 file3 file4 > dest". 1093 If the destination file is created as a result of this command, the 1094 destination file's size will be equal to the number of bytes 1095 successfully copied. If the destination file already existed, the 1096 destination file's size may increase as a result of this operation 1097 (e.g. if ca_dst_offset plus ca_count is greater than the 1098 destination's initial size). 1100 If the ca_source_server list is specified, then this is an inter- 1101 server copy operation and the source file is on a remote server. The 1102 client is expected to have previously issued a successful COPY_NOTIFY 1103 request to the remote source server. The ca_source_server list 1104 SHOULD be the same as the COPY_NOTIFY response's cnr_source_server 1105 list. If the client includes the entries from the COPY_NOTIFY 1106 response's cnr_source_server list in the ca_source_server list, the 1107 source server can indicate a specific copy protocol for the 1108 destination server to use by returning a URL, which specifies both a 1109 protocol service and server name. Server-to-server copy protocol 1110 considerations are described in Section 4.2.3 and Section 4.4.1. 1112 The ca_flags argument allows the copy operation to be customized in 1113 the following ways using the guarded flag (COPY4_GUARDED) and the 1114 metadata flag (COPY4_METADATA). 1116 [NOTE: Earlier versions of this document defined a 1117 COPY4_SPACE_RESERVED flag for controlling space reservations on the 1118 destination file. This flag has been removed with the expectation 1119 that the space_reserve attribute defined in XXX_TDH_XXX will be 1120 adopted.] 1122 If the guarded flag is set and the destination exists on the server, 1123 this operation will fail with NFS4ERR_EXIST. 1125 If the guarded flag is not set and the destination exists on the 1126 server, the behavior is implementation dependent. 1128 If the metadata flag is set and the client is requesting a whole file 1129 copy (i.e. ca_count is 0 (zero)), a subset of the destination file's 1130 attributes MUST be the same as the source file's corresponding 1131 attributes and a subset of the destination file's attributes SHOULD 1132 be the same as the source file's corresponding attributes. The 1133 attributes in the MUST and SHOULD copy subsets will be defined for 1134 each NFS version. 1136 For NFSv4.1, Table 1 and Table 2 list the REQUIRED and RECOMMENDED 1137 attributes respectively. A "MUST" in the "Copy to destination file?" 1138 column indicates that the attribute is part of the MUST copy set. A 1139 "SHOULD" in the "Copy to destination file?" column indicates that the 1140 attribute is part of the SHOULD copy set. 1142 +--------------------+----+---------------------------+ 1143 | Name | Id | Copy to destination file? 
| 1144 +--------------------+----+---------------------------+ 1145 | supported_attrs | 0 | no | 1146 | type | 1 | MUST | 1147 | fh_expire_type | 2 | no | 1148 | change | 3 | SHOULD | 1149 | size | 4 | MUST | 1150 | link_support | 5 | no | 1151 | symlink_support | 6 | no | 1152 | named_attr | 7 | no | 1153 | fsid | 8 | no | 1154 | unique_handles | 9 | no | 1155 | lease_time | 10 | no | 1156 | rdattr_error | 11 | no | 1157 | filehandle | 19 | no | 1158 | suppattr_exclcreat | 75 | no | 1159 +--------------------+----+---------------------------+ 1161 Table 1 1163 +--------------------+----+---------------------------+ 1164 | Name | Id | Copy to destination file? | 1165 +--------------------+----+---------------------------+ 1166 | acl | 12 | MUST | 1167 | aclsupport | 13 | no | 1168 | archive | 14 | no | 1169 | cansettime | 15 | no | 1170 | case_insensitive | 16 | no | 1171 | case_preserving | 17 | no | 1172 | change_policy | 60 | no | 1173 | chown_restricted | 18 | MUST | 1174 | dacl | 58 | MUST | 1175 | dir_notif_delay | 56 | no | 1176 | dirent_notif_delay | 57 | no | 1177 | fileid | 20 | no | 1178 | files_avail | 21 | no | 1179 | files_free | 22 | no | 1180 | files_total | 23 | no | 1181 | fs_charset_cap | 76 | no | 1182 | fs_layout_type | 62 | no | 1183 | fs_locations | 24 | no | 1184 | fs_locations_info | 67 | no | 1185 | fs_status | 61 | no | 1186 | hidden | 25 | MUST | 1187 | homogeneous | 26 | no | 1188 | layout_alignment | 66 | no | 1189 | layout_blksize | 65 | no | 1190 | layout_hint | 63 | no | 1191 | layout_type | 64 | no | 1192 | maxfilesize | 27 | no | 1193 | maxlink | 28 | no | 1194 | maxname | 29 | no | 1195 | maxread | 30 | no | 1196 | maxwrite | 31 | no | 1197 | mdsthreshold | 68 | no | 1198 | mimetype | 32 | MUST | 1199 | mode | 33 | MUST | 1200 | mode_set_masked | 74 | no | 1201 | mounted_on_fileid | 55 | no | 1202 | no_trunc | 34 | no | 1203 | numlinks | 35 | no | 1204 | owner | 36 | MUST | 1205 | owner_group | 37 | MUST | 1206 | quota_avail_hard | 38 | no | 1207 | quota_avail_soft | 39 | no | 1208 | quota_used | 40 | no | 1209 | rawdev | 41 | no | 1210 | retentevt_get | 71 | MUST | 1211 | retentevt_set | 72 | no | 1212 | retention_get | 69 | MUST | 1213 | retention_hold | 73 | MUST | 1214 | retention_set | 70 | no | 1215 | sacl | 59 | MUST | 1216 | space_avail | 42 | no | 1217 | space_free | 43 | no | 1218 | space_total | 44 | no | 1219 | space_used | 45 | no | 1220 | system | 46 | MUST | 1221 | time_access | 47 | MUST | 1222 | time_access_set | 48 | no | 1223 | time_backup | 49 | no | 1224 | time_create | 50 | MUST | 1225 | time_delta | 51 | no | 1226 | time_metadata | 52 | SHOULD | 1227 | time_modify | 53 | MUST | 1228 | time_modify_set | 54 | no | 1229 +--------------------+----+---------------------------+ 1231 Table 2 1233 [NOTE: The space_reserve attribute XXX_TDH_XXX will be in the MUST 1234 set.] 1236 [NOTE: The source file's attribute values will take precedence over 1237 any attribute values inherited by the destination file.] 1238 In the case of an inter-server copy or an intra-server copy between 1239 file systems, the attributes supported for the source file and 1240 destination file could be different. By definition,the REQUIRED 1241 attributes will be supported in all cases. If the metadata flag is 1242 set and the source file has a RECOMMENDED attribute that is not 1243 supported for the destination file, the copy MUST fail with 1244 NFS4ERR_ATTRNOTSUPP. 
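As a non-normative illustration of the metadata flag handling just described, the following C fragment sketches the per-attribute decision a destination server might make for a whole file copy; the enum, the function, and its arguments are hypothetical, and only the NFS4ERR_ATTRNOTSUPP outcome is mandated by the text above (the nfsstat4 values shown are those defined in the NFSv4.1 XDR [2]).

   /*
    * Illustrative sketch only: per-attribute handling when
    * COPY4_METADATA is set on a whole file copy.  The MUST and
    * SHOULD copy sets are those listed in Table 1 and Table 2.
    */
   enum { NFS4_OK = 0, NFS4ERR_ATTRNOTSUPP = 10032 };

   enum copy_set { COPY_SET_NO, COPY_SET_SHOULD, COPY_SET_MUST };

   static int copy_one_attribute(enum copy_set set,
                                 int set_on_source,
                                 int supported_on_destination)
   {
       if (set == COPY_SET_NO || !set_on_source)
           return NFS4_OK;              /* nothing to propagate */

       if (!supported_on_destination)
           /* A RECOMMENDED attribute is set on the source but is
            * not supported for the destination file: the copy
            * MUST fail with NFS4ERR_ATTRNOTSUPP. */
           return NFS4ERR_ATTRNOTSUPP;

       /* ... copy the source value to the destination file ... */
       return NFS4_OK;
   }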
1246 Any attribute supported by the destination server that is not set on 1247 the source file SHOULD be left unset. 1249 Metadata attributes not exposed via the NFS protocol SHOULD be copied 1250 to the destination file where appropriate. 1252 The destination file's named attributes are not duplicated from the 1253 source file. After the copy process completes, the client MAY 1254 attempt to duplicate named attributes using standard NFSv4 1255 operations. However, the destination file's named attribute 1256 capabilities MAY be different from the source file's named attribute 1257 capabilities. 1259 If the metadata flag is not set and the client is requesting a whole 1260 file copy (i.e. ca_count is 0 (zero)), the destination file's 1261 metadata is implementation dependent. 1263 If the client is requesting a partial file copy (i.e. ca_count is not 1264 0 (zero)), the client SHOULD NOT set the metadata flag and the server 1265 MUST ignore the metadata flag. 1267 If the operation does not result in an immediate failure, the server 1268 will return NFS4_OK, and the CURRENT_FH will remain the destination's 1269 filehandle. 1271 If an immediate failure does occur, cr_bytes_copied will be set to 1272 the number of bytes copied to the destination file before the error 1273 occurred. The cr_bytes_copied value indicates the number of bytes 1274 copied but not which specific bytes have been copied. 1276 A return of NFS4_OK indicates that either the operation is complete 1277 or the operation was initiated and a callback will be used to deliver 1278 the final status of the operation. 1280 If the cr_callback_id is returned, this indicates that the operation 1281 was initiated and a CB_COPY callback will deliver the final results 1282 of the operation. The cr_callback_id stateid is termed a copy 1283 stateid in this context. The server is given the option of returning 1284 the results in a callback because the data may require a relatively 1285 long period of time to copy. 1287 If no cr_callback_id is returned, the operation completed 1288 synchronously and no callback will be issued by the server. The 1289 completion status of the operation is indicated by cr_status. 1291 If the copy completes successfully, either synchronously or 1292 asynchronously, the data copied from the source file to the 1293 destination file MUST appear identical to the NFS client. However, 1294 the NFS server's on disk representation of the data in the source 1295 file and destination file MAY differ. For example, the NFS server 1296 might encrypt, compress, deduplicate, or otherwise represent the on 1297 disk data in the source and destination file differently. 1299 In the event of a failure the state of the destination file is 1300 implementation dependent. The COPY operation may fail for the 1301 following reasons (this is a partial list). 1303 NFS4ERR_MOVED: The file system which contains the source file, or 1304 the destination file or directory is not present. The client can 1305 determine the correct location and reissue the operation with the 1306 correct location. 1308 NFS4ERR_NOTSUPP: The copy offload operation is not supported by the 1309 NFS server receiving this request. 1311 NFS4ERR_PARTNER_NOTSUPP: The remote server does not support the 1312 server-to-server copy offload protocol. 1314 NFS4ERR_PARTNER_NO_AUTH: The remote server does not authorize a 1315 server-to-server copy offload operation. 
This may be due to the 1316 client's failure to send the COPY_NOTIFY operation to the remote 1317 server, the remote server receiving a server-to-server copy 1318 offload request after the copy lease time expired, or for some 1319 other permission problem. 1321 NFS4ERR_FBIG: The copy operation would have caused the file to grow 1322 beyond the server's limit. 1324 NFS4ERR_NOTDIR: The CURRENT_FH is a file and ca_destination has non- 1325 zero length. 1327 NFS4ERR_WRONG_TYPE: The SAVED_FH is not a regular file. 1329 NFS4ERR_ISDIR: The CURRENT_FH is a directory and ca_destination has 1330 zero length. 1332 NFS4ERR_INVAL: The source offset or offset plus count are greater 1333 than or equal to the size of the source file. 1335 NFS4ERR_DELAY: The server does not have the resources to perform the 1336 copy operation at the current time. The client should retry the 1337 operation sometime in the future. 1339 NFS4ERR_METADATA_NOTSUPP: The destination file cannot support the 1340 same metadata as the source file. 1342 NFS4ERR_WRONGSEC: The security mechanism being used by the client 1343 does not match the server's security policy. 1345 4.3.5. Operation 60: COPY_ABORT - Cancel a server-side copy 1347 4.3.5.1. ARGUMENT 1349 struct COPY_ABORT4args { 1350 /* CURRENT_FH: desination file */ 1351 stateid4 caa_stateid; 1352 }; 1354 4.3.5.2. RESULT 1356 struct COPY_ABORT4res { 1357 nfsstat4 car_status; 1358 }; 1360 4.3.5.3. DESCRIPTION 1362 COPY_ABORT is used for both intra- and inter-server asynchronous 1363 copies. The COPY_ABORT operation allows the client to cancel a 1364 server-side copy operation that it initiated. This operation is sent 1365 in a COMPOUND request from the client to the destination server. 1366 This operation may be used to cancel a copy when the application that 1367 requested the copy exits before the operation is completed or for 1368 some other reason. 1370 The request contains the filehandle and copy stateid cookies that act 1371 as the context for the previously initiated copy operation. 1373 The result's car_status field indicates whether the cancel was 1374 successful or not. A value of NFS4_OK indicates that the copy 1375 operation was canceled and no callback will be issued by the server. 1376 A copy operation that is successfully canceled may result in none, 1377 some, or all of the data copied. 1379 If the server supports asynchronous copies, the server is REQUIRED to 1380 support the COPY_ABORT operation. 1382 The COPY_ABORT operation may fail for the following reasons (this is 1383 a partial list): 1385 NFS4ERR_NOTSUPP: The abort operation is not supported by the NFS 1386 server receiving this request. 1388 NFS4ERR_RETRY: The abort failed, but a retry at some time in the 1389 future MAY succeed. 1391 NFS4ERR_COMPLETE_ALREADY: The abort failed, and a callback will 1392 deliver the results of the copy operation. 1394 NFS4ERR_SERVERFAULT: An error occurred on the server that does not 1395 map to a specific error code. 1397 4.3.6. Operation 63: COPY_STATUS - Poll for status of a server-side 1398 copy 1400 4.3.6.1. ARGUMENT 1402 struct COPY_STATUS4args { 1403 /* CURRENT_FH: destination file */ 1404 stateid4 csa_stateid; 1405 }; 1407 4.3.6.2. RESULT 1409 struct COPY_STATUS4resok { 1410 length4 csr_bytes_copied; 1411 nfsstat4 csr_complete<1>; 1412 }; 1414 union COPY_STATUS4res switch (nfsstat4 csr_status) { 1415 case NFS4_OK: 1416 COPY_STATUS4resok resok4; 1417 default: 1418 void; 1419 }; 1421 4.3.6.3. 
DESCRIPTION 1423 COPY_STATUS is used for both intra- and inter-server asynchronous 1424 copies. The COPY_STATUS operation allows the client to poll the 1425 server to determine the status of an asynchronous copy operation. 1426 This operation is sent by the client to the destination server. 1428 If this operation is successful, the number of bytes copied are 1429 returned to the client in the csr_bytes_copied field. The 1430 csr_bytes_copied value indicates the number of bytes copied but not 1431 which specific bytes have been copied. 1433 If the optional csr_complete field is present, the copy has 1434 completed. In this case the status value indicates the result of the 1435 asynchronous copy operation. In all cases, the server will also 1436 deliver the final results of the asynchronous copy in a CB_COPY 1437 operation. 1439 The failure of this operation does not indicate the result of the 1440 asynchronous copy in any way. 1442 If the server supports asynchronous copies, the server is REQUIRED to 1443 support the COPY_STATUS operation. 1445 The COPY_STATUS operation may fail for the following reasons (this is 1446 a partial list): 1448 NFS4ERR_NOTSUPP: The copy status operation is not supported by the 1449 NFS server receiving this request. 1451 NFS4ERR_BAD_STATEID: The stateid is not valid (see Section 4.3.8 1452 below). 1454 NFS4ERR_EXPIRED: The stateid has expired (see Copy Offload Stateid 1455 section below). 1457 4.3.7. Operation 15: CB_COPY - Report results of a server-side copy 1458 4.3.7.1. ARGUMENT 1460 union copy_info4 switch (nfsstat4 cca_status) { 1461 case NFS4_OK: 1462 void; 1463 default: 1464 length4 cca_bytes_copied; 1465 }; 1467 struct CB_COPY4args { 1468 nfs_fh4 cca_fh; 1469 stateid4 cca_stateid; 1470 copy_info4 cca_copy_info; 1471 }; 1473 4.3.7.2. RESULT 1475 struct CB_COPY4res { 1476 nfsstat4 ccr_status; 1477 }; 1479 4.3.7.3. DESCRIPTION 1481 CB_COPY is used for both intra- and inter-server asynchronous copies. 1482 The CB_COPY callback informs the client of the result of an 1483 asynchronous server-side copy. This operation is sent by the 1484 destination server to the client in a CB_COMPOUND request. The copy 1485 is identified by the filehandle and stateid arguments. The result is 1486 indicated by the status field. If the copy failed, cca_bytes_copied 1487 contains the number of bytes copied before the failure occurred. The 1488 cca_bytes_copied value indicates the number of bytes copied but not 1489 which specific bytes have been copied. 1491 In the absence of an established backchannel, the server cannot 1492 signal the completion of the COPY via a CB_COPY callback. The loss 1493 of a callback channel would be indicated by the server setting the 1494 SEQ4_STATUS_CB_PATH_DOWN flag in the sr_status_flags field of the 1495 SEQUENCE operation. The client must re-establish the callback 1496 channel to receive the status of the COPY operation. Prolonged loss 1497 of the callback channel could result in the server dropping the COPY 1498 operation state and invalidating the copy stateid. 1500 If the client supports the COPY operation, the client is REQUIRED to 1501 support the CB_COPY operation. 1503 The CB_COPY operation may fail for the following reasons (this is a 1504 partial list): 1506 NFS4ERR_NOTSUPP: The copy offload operation is not supported by the 1507 NFS client receiving this request. 1509 4.3.8. Copy Offload Stateids 1511 A server may perform a copy offload operation asynchronously. An 1512 asynchronous copy is tracked using a copy offload stateid. 
Copy 1513 offload stateids are included in the COPY, COPY_ABORT, COPY_STATUS, 1514 and CB_COPY operations. 1516 Section 8.2.4 of [2] specifies that stateids are valid until either 1517 (A) the client or server restart or (B) the client returns the 1518 resource. 1520 A copy offload stateid will be valid until either (A) the client or 1521 server restart or (B) the client returns the resource by issuing a 1522 COPY_ABORT operation or the client replies to a CB_COPY operation. 1524 A copy offload stateid's seqid MUST NOT be 0 (zero). In the context 1525 of a copy offload operation, it is ambiguous to indicate the most 1526 recent copy offload operation using a stateid with seqid of 0 (zero). 1527 Therefore a copy offload stateid with seqid of 0 (zero) MUST be 1528 considered invalid. 1530 4.4. Security Considerations 1532 The security considerations pertaining to NFSv4 [10] apply to this 1533 document. 1535 The standard security mechanisms provided by NFSv4 [10] may be used to 1536 secure the protocol described in this document. 1538 NFSv4 clients and servers supporting the inter-server copy 1539 operations described in this document are REQUIRED to implement [6], 1540 including the RPCSEC_GSSv3 privileges copy_from_auth and 1541 copy_to_auth. If the server-to-server copy protocol is ONC RPC 1542 based, the servers are also REQUIRED to implement the RPCSEC_GSSv3 1543 privilege copy_confirm_auth. These requirements to implement are not 1544 requirements to use. NFSv4 clients and servers are RECOMMENDED to 1545 use [6] to secure server-side copy operations. 1547 4.4.1. Inter-Server Copy Security 1549 4.4.1.1. Requirements for Secure Inter-Server Copy 1551 Inter-server copy is driven by several requirements: 1553 o The specification MUST NOT mandate an inter-server copy protocol. 1554 There are many ways to copy data. Some will be more optimal than 1555 others depending on the identities of the source server and 1556 destination server. For example, the source and destination 1557 servers might be two nodes sharing a common file system format for 1558 the source and destination file systems. Thus the source and 1559 destination are in an ideal position to efficiently render the 1560 image of the source file to the destination file by replicating 1561 the file system formats at the block level. In other cases, the 1562 source and destination might be two nodes sharing a common storage 1563 area network, and thus there is no need to copy any data at all, 1564 and instead ownership of the file and its contents simply gets re- 1565 assigned to the destination. 1567 o The specification MUST provide guidance for using NFSv4.x as a 1568 copy protocol. For those source and destination servers willing 1569 to use NFSv4.x there are specific security considerations that 1570 this specification can and does address. 1572 o The specification MUST NOT mandate pre-configuration between the 1573 source and destination server. Requiring that the source and 1574 destination first have a "copying relationship" increases the 1575 administrative burden. However, the specification MUST NOT 1576 preclude implementations that require pre-configuration. 1578 o The specification MUST NOT mandate a trust relationship between 1579 the source and destination server. The NFSv4 security model 1580 requires mutual authentication between a principal on an NFS 1581 client and a principal on an NFS server. This model MUST continue 1582 with the introduction of COPY. 1584 4.4.1.2.
Inter-Server Copy with RPCSEC_GSSv3 1586 When the client sends a COPY_NOTIFY to the source server to indicate 1587 that the destination server will attempt to copy data from the source server, it is 1588 expected that this copy is being done on behalf of the principal 1589 (called the "user principal") that sent the RPC request that encloses 1590 the COMPOUND procedure that contains the COPY_NOTIFY operation. The 1591 user principal is identified by the RPC credentials. A mechanism 1592 is necessary that allows the user principal to authorize the destination server to 1593 perform the copy, that lets the source server properly 1594 authenticate the destination's copy operation, and that does not allow the 1595 destination server to exceed its authorization. 1597 An approach that sends delegated credentials of the client's user 1598 principal to the destination server is not used for the following 1599 reasons. If the client's user principal delegated its credentials, the 1600 destination would authenticate as the user principal. If the 1601 destination were using the NFSv4 protocol to perform the copy, then 1602 the source server would authenticate the destination server as the 1603 user principal, and the file copy would securely proceed. However, 1604 this approach would allow the destination server to copy other files. 1605 The user principal would have to trust the destination server to not 1606 do so. This is counter to the requirements, and therefore is not 1607 considered. Instead, an approach using RPCSEC_GSSv3 [6] privileges is 1608 proposed. 1610 One of the stated applications of the proposed RPCSEC_GSSv3 protocol 1611 is compound client host and user authentication [+ privilege 1612 assertion]. For inter-server file copy, we require compound NFS 1613 server host and user authentication [+ privilege assertion]. The 1614 distinction between the two is one without meaning. 1616 RPCSEC_GSSv3 introduces the notion of privileges. We define three 1617 privileges: 1619 copy_from_auth: A user principal is authorizing a source principal 1620 ("nfs@<source>") to allow a destination principal ("nfs@ 1621 <destination>") to copy a file from the source to the destination. 1622 This privilege is established on the source server before the user 1623 principal sends a COPY_NOTIFY operation to the source server. 1625 struct copy_from_auth_priv { 1626 secret4 cfap_shared_secret; 1627 netloc4 cfap_destination; 1628 /* the NFSv4 user name that the user principal maps to */ 1629 utf8str_mixed cfap_username; 1630 /* equal to seq_num of rpc_gss_cred_vers_3_t */ 1631 unsigned int cfap_seq_num; 1632 }; 1634 cfap_shared_secret is a secret value the user principal generates. 1636 copy_to_auth: A user principal is authorizing a destination 1637 principal ("nfs@<destination>") to allow it to copy a file from 1638 the source to the destination. This privilege is established on 1639 the destination server before the user principal sends a COPY 1640 operation to the destination server. 1642 struct copy_to_auth_priv { 1643 /* equal to cfap_shared_secret */ 1644 secret4 ctap_shared_secret; 1645 netloc4 ctap_source; 1646 /* the NFSv4 user name that the user principal maps to */ 1647 utf8str_mixed ctap_username; 1648 /* equal to seq_num of rpc_gss_cred_vers_3_t */ 1649 unsigned int ctap_seq_num; 1650 }; 1652 ctap_shared_secret is the secret value the user principal generated 1653 and used to establish the copy_from_auth privilege with the 1654 source principal.
1656 copy_confirm_auth: A destination principal is confirming with the 1657 source principal that it is authorized to copy data from the 1658 source on behalf of the user principal. When the inter-server 1659 copy protocol is NFSv4, or for that matter, any protocol capable 1660 of being secured via RPCSEC_GSSv3 (i.e. any ONC RPC protocol), 1661 this privilege is established before the file is copied from the 1662 source to the destination. 1664 struct copy_confirm_auth_priv { 1665 /* equal to GSS_GetMIC() of cfap_shared_secret */ 1666 opaque ccap_shared_secret_mic<>; 1667 /* the NFSv4 user name that the user principal maps to */ 1668 utf8str_mixed ccap_username; 1669 /* equal to seq_num of rpc_gss_cred_vers_3_t */ 1670 unsigned int ccap_seq_num; 1671 }; 1673 4.4.1.2.1. Establishing a Security Context 1675 When the user principal wants to COPY a file between two servers, if 1676 it has not established copy_from_auth and copy_to_auth privileges on 1677 the servers, it establishes them: 1679 o The user principal generates a secret it will share with the two 1680 servers. This shared secret will be placed in the 1681 cfap_shared_secret and ctap_shared_secret fields of the 1682 appropriate privilege data types, copy_from_auth_priv and 1683 copy_to_auth_priv. 1685 o An instance of copy_from_auth_priv is filled in with the shared 1686 secret, the destination server, and the NFSv4 user id of the user 1687 principal. It will be sent with an RPCSEC_GSS3_CREATE procedure, 1688 and so cfap_seq_num is set to the seq_num of the credential of the 1689 RPCSEC_GSS3_CREATE procedure. Because cfap_shared_secret is a 1690 secret, after XDR encoding copy_from_auth_priv, GSS_Wrap() (with 1691 privacy) is invoked on copy_from_auth_priv. The 1692 RPCSEC_GSS3_CREATE procedure's arguments are: 1694 struct { 1695 rpc_gss3_gss_binding *compound_binding; 1696 rpc_gss3_chan_binding *chan_binding_mic; 1697 rpc_gss3_assertion assertions<>; 1698 rpc_gss3_extension extensions<>; 1699 } rpc_gss3_create_args; 1701 The string "copy_from_auth" is placed in assertions[0].privs. The 1702 output of GSS_Wrap() is placed in extensions[0].data. The field 1703 extensions[0].critical is set to TRUE. The source server calls 1704 GSS_Unwrap() on the privilege, and verifies that the seq_num 1705 matches the credential. It then verifies that the NFSv4 user id 1706 being asserted matches the source server's mapping of the user 1707 principal. If it does, the privilege is established on the source 1708 server as: <"copy_from_auth", user id, destination>. The 1709 successful reply to RPCSEC_GSS3_CREATE has: 1711 struct { 1712 opaque handle<>; 1713 rpc_gss3_chan_binding *chan_binding_mic; 1714 rpc_gss3_assertion granted_assertions<>; 1715 rpc_gss3_assertion server_assertions<>; 1716 rpc_gss3_extension extensions<>; 1717 } rpc_gss3_create_res; 1719 The field "handle" is the RPCSEC_GSSv3 handle that the client will 1720 use on COPY_NOTIFY requests involving the source and destination 1721 server. granted_assertions[0].privs will be equal to 1722 "copy_from_auth". The server will return a GSS_Wrap() of 1723 copy_to_auth_priv. 1725 o An instance of copy_to_auth_priv is filled in with the shared 1726 secret, the source server, and the NFSv4 user id. It will be sent 1727 with an RPCSEC_GSS3_CREATE procedure, and so ctap_seq_num is set 1728 to the seq_num of the credential of the RPCSEC_GSS3_CREATE 1729 procedure. Because ctap_shared_secret is a secret, after XDR 1730 encoding copy_to_auth_priv, GSS_Wrap() is invoked on 1731 copy_to_auth_priv. 
The RPCSEC_GSS3_CREATE procedure's arguments 1732 are: 1734 struct { 1735 rpc_gss3_gss_binding *compound_binding; 1736 rpc_gss3_chan_binding *chan_binding_mic; 1737 rpc_gss3_assertion assertions<>; 1738 rpc_gss3_extension extensions<>; 1739 } rpc_gss3_create_args; 1741 The string "copy_to_auth" is placed in assertions[0].privs. The 1742 output of GSS_Wrap() is placed in extensions[0].data. The field 1743 extensions[0].critical is set to TRUE. After unwrapping, 1744 verifying the seq_num, and the user principal to NFSv4 user ID 1745 mapping, the destination establishes a privilege of 1746 <"copy_to_auth", user id, source>. The successful reply to 1747 RPCSEC_GSS3_CREATE has: 1749 struct { 1750 opaque handle<>; 1751 rpc_gss3_chan_binding *chan_binding_mic; 1752 rpc_gss3_assertion granted_assertions<>; 1753 rpc_gss3_assertion server_assertions<>; 1754 rpc_gss3_extension extensions<>; 1755 } rpc_gss3_create_res; 1757 The field "handle" is the RPCSEC_GSSv3 handle that the client will 1758 use on COPY requests involving the source and destination server. 1759 The field granted_assertions[0].privs will be equal to 1760 "copy_to_auth". The server will return a GSS_Wrap() of 1761 copy_to_auth_priv. 1763 4.4.1.2.2. Starting a Secure Inter-Server Copy 1765 When the client sends a COPY_NOTIFY request to the source server, it 1766 uses the privileged "copy_from_auth" RPCSEC_GSSv3 handle. 1767 cna_destination_server in COPY_NOTIFY MUST be the same as the name of 1768 the destination server specified in copy_from_auth_priv. Otherwise, 1769 COPY_NOTIFY will fail with NFS4ERR_ACCESS. The source server 1770 verifies that the privilege <"copy_from_auth", user id, destination> 1771 exists, and annotates it with the source filehandle, if the user 1772 principal has read access to the source file, and if administrative 1773 policies give the user principal and the NFS client read access to 1774 the source file (i.e. if the ACCESS operation would grant read 1775 access). Otherwise, COPY_NOTIFY will fail with NFS4ERR_ACCESS. 1777 When the client sends a COPY request to the destination server, it 1778 uses the privileged "copy_to_auth" RPCSEC_GSSv3 handle. 1779 ca_source_server in COPY MUST be the same as the name of the source 1780 server specified in copy_to_auth_priv. Otherwise, COPY will fail 1781 with NFS4ERR_ACCESS. The destination server verifies that the 1782 privilege <"copy_to_auth", user id, source> exists, and annotates it 1783 with the source and destination filehandles. If the client has 1784 failed to establish the "copy_to_auth" policy it will reject the 1785 request with NFS4ERR_PARTNER_NO_AUTH. 1787 If the client sends a COPY_REVOKE to the source server to rescind the 1788 destination server's copy privilege, it uses the privileged 1789 "copy_from_auth" RPCSEC_GSSv3 handle and the cra_destination_server 1790 in COPY_REVOKE MUST be the same as the name of the destination server 1791 specified in copy_from_auth_priv. The source server will then delete 1792 the <"copy_from_auth", user id, destination> privilege and fail any 1793 subsequent copy requests sent under the auspices of this privilege 1794 from the destination server. 1796 4.4.1.2.3. 
Securing ONC RPC Server-to-Server Copy Protocols 1798 After a destination server has a "copy_to_auth" privilege established 1799 on it, and it receives a COPY request, if it knows it will use an ONC 1800 RPC protocol to copy data, it will establish a "copy_confirm_auth" 1801 privilege on the source server, using nfs@<destination> as the 1802 initiator principal, and nfs@<source> as the target principal. 1804 The value of the field ccap_shared_secret_mic is a GSS_VerifyMIC() of 1805 the shared secret passed in the copy_to_auth privilege. The field 1806 ccap_username is the mapping of the user principal to an NFSv4 user 1807 name ("user"@"domain" form), and MUST be the same as ctap_username 1808 and cfap_username. The field ccap_seq_num is the seq_num of the 1809 RPCSEC_GSSv3 credential used for the RPCSEC_GSS3_CREATE procedure the 1810 destination will send to the source server to establish the 1811 privilege. 1813 The source server verifies the privilege, and establishes a 1814 <"copy_confirm_auth", user id, destination> privilege. If the source 1815 server fails to verify the privilege, the COPY operation will be 1816 rejected with NFS4ERR_PARTNER_NO_AUTH. All subsequent ONC RPC 1817 requests sent from the destination to copy data from the source to 1818 the destination will use the RPCSEC_GSSv3 handle returned by the 1819 source's RPCSEC_GSS3_CREATE response. 1821 Note that the use of the "copy_confirm_auth" privilege accomplishes 1822 the following: 1824 o if a protocol like NFS is being used, with export policies, the export 1825 policies can be overridden in case the destination server as-an- 1826 NFS-client is not authorized. 1828 o manual configuration to allow a copy relationship between the 1829 source and destination is not needed. 1831 If the attempt to establish a "copy_confirm_auth" privilege fails, 1832 then when the user principal sends a COPY request to the destination, the 1833 destination server will reject it with NFS4ERR_PARTNER_NO_AUTH. 1835 4.4.1.2.4. Securing Non ONC RPC Server-to-Server Copy Protocols 1837 If the destination will not be using ONC RPC to copy the data, then the 1838 source and destination are using an unspecified copy protocol. The 1839 destination could use the shared secret and the NFSv4 user id to 1840 prove to the source server that the user principal has authorized the 1841 copy. 1843 For protocols that authenticate user names with passwords (e.g. HTTP 1844 [14] and FTP [15]), the NFSv4 user id could be used as the user name, 1845 and an ASCII hexadecimal representation of the RPCSEC_GSSv3 shared 1846 secret could be used as the user password or as input into non- 1847 password authentication methods like CHAP [16]. 1849 4.4.1.3. Inter-Server Copy via ONC RPC but without RPCSEC_GSSv3 1851 ONC RPC security flavors other than RPCSEC_GSSv3 MAY be used with the 1852 server-side copy offload operations described in this document. In 1853 particular, host-based ONC RPC security flavors such as AUTH_NONE and 1854 AUTH_SYS MAY be used. If a host-based security flavor is used, a 1855 minimal level of protection for the server-to-server copy protocol is 1856 possible. 1858 In the absence of strong security mechanisms such as RPCSEC_GSSv3, 1859 the challenge is how the source server and destination server 1860 identify themselves to each other, especially in the presence of 1861 multi-homed source and destination servers.
In a multi-homed 1862 environment, the destination server might not contact the source 1863 server from the same network address specified by the client in the 1864 COPY_NOTIFY. This can be overcome using the procedure described 1865 below. 1867 When the client sends the source server the COPY_NOTIFY operation, 1868 the source server may reply to the client with a list of target 1869 addresses, names, and/or URLs and assign them to the unique triple: 1870 <source fh, user ID, destination address>. If the destination uses 1871 one of these target netlocs to contact the source server, the source 1872 server will be able to uniquely identify the destination server, even 1873 if the destination server does not connect from the address specified 1874 by the client in COPY_NOTIFY. 1876 For example, suppose the network topology is as shown in Figure 3. 1877 If the source filehandle is 0x12345, the source server may respond to 1878 a COPY_NOTIFY for destination 10.11.78.56 with the URLs: 1880 nfs://10.11.78.18//_COPY/10.11.78.56/_FH/0x12345 1882 nfs://192.168.33.18//_COPY/10.11.78.56/_FH/0x12345 1884 The client will then send these URLs to the destination server in the 1885 COPY operation. Suppose that the 192.168.33.0/24 network is a high 1886 speed network and the destination server decides to transfer the file 1887 over this network. If the destination contacts the source server 1888 from 192.168.33.56 over this network using NFSv4.1, it does the 1889 following: 1891 COMPOUND { PUTROOTFH, LOOKUP "_COPY" ; LOOKUP "10.11.78.56"; LOOKUP 1892 "_FH" ; OPEN "0x12345" ; GETFH } 1894 The source server will therefore know that these NFSv4.1 operations 1895 are being issued by the destination server identified in the 1896 COPY_NOTIFY. 1898 4.4.1.4. Inter-Server Copy without ONC RPC and RPCSEC_GSSv3 1900 The same techniques as Section 4.4.1.3, using unique URLs for each 1901 destination server, can be used for other protocols (e.g. HTTP [14] 1902 and FTP [15]) as well. 1904 5. Application Data Block Support 1906 At the OS level, files are stored in disk blocks. Applications 1907 are also free to impose structure on the data contained in a file and 1908 we can define an Application Data Block (ADB) to be such a structure. 1909 From the application's viewpoint, it only wants to handle ADBs and 1910 not raw bytes (see [17]). An ADB is typically composed of two 1911 sections: a header and data. The header describes the 1912 characteristics of the block and can provide a means to detect 1913 corruption in the data payload. The data section is typically 1914 initialized to all zeros. 1916 The format of the header is application specific, but there are two 1917 main components typically encountered: 1919 1. An ADB Number (ADBN), which allows the application to determine 1920 which data block is being referenced. The ADBN is a logical 1921 block number and is useful when the client is not storing the 1922 blocks in contiguous memory. 1924 2. Fields to describe the state of the ADB and a means to detect 1925 block corruption. For both pieces of data, a useful property is 1926 that the allowed values be unique in the sense that, if they are passed across the 1927 network, corruption due to translation between big and little 1928 endian architectures is detectable. For example, 0xF0DEDEF0 has 1929 the same bit pattern in both architectures. 1931 Applications already impose structures on files [17] and detect 1932 corruption in data blocks [18]. What they are not able to do is 1933 efficiently transfer and store ADBs.
To initialize a file with ADBs, 1934 the client must send the full ADB to the server and that must be 1935 stored on the server. When the application is initializing a file to 1936 have the ADB structure, it could compress the ADBs to just the 1937 information necessary to later reconstruct the header portion of 1938 the ADB when the contents are read back. Using sparse file 1939 techniques, the disk blocks described by the ADB would not be allocated. 1940 Unlike sparse file techniques, there would be a small cost to store 1941 the compressed header data. 1943 In this section, we are going to define a generic framework for an 1944 ADB, present one approach to detecting corruption in a given ADB 1945 implementation, and describe the model for how the client and server 1946 can support efficient initialization of ADBs, reading of ADB holes, 1947 punching holes in ADBs, and space reservation. Further, we need to 1948 be able to extend this model to applications which do not support 1949 ADBs, but wish to be able to handle sparse files, hole punching, and 1950 space reservation. 1952 5.1. Generic Framework 1954 We want the representation of the ADB to be flexible enough to 1955 support many different applications. The most basic approach is no 1956 imposition of a block at all, which means we are working with the raw 1957 bytes. Such an approach would be useful for storing holes, punching 1958 holes, etc. In more complex deployments, a server might be 1959 supporting multiple applications, each with its own definition of 1960 the ADB. One might store the ADBN at the start of the block and then 1961 have a guard pattern to detect corruption [19]. Another might store 1962 the ADBN at an offset of 100 bytes within the block and have no guard 1963 pattern at all. The point is that existing applications might 1964 already have well defined formats for their data blocks. 1966 The guard pattern can be used to represent the state of the block, to 1967 protect against corruption, or both. Again, it needs to be able to 1968 be placed anywhere within the ADB. 1970 We need to be able to represent the starting offset of the block and 1971 the size of the block. Note that nothing prevents the application 1972 from defining different sized blocks in a file. 1974 5.1.1. Data Block Representation 1976 struct app_data_block4 { 1977 offset4 adb_offset; 1978 length4 adb_block_size; 1979 length4 adb_block_count; 1980 length4 adb_reloff_blocknum; 1981 count4 adb_block_num; 1982 length4 adb_reloff_pattern; 1983 opaque adb_pattern<>; 1984 }; 1986 The app_data_block4 structure captures the abstraction presented for 1987 the ADB. The additional fields present are to allow the transmission 1988 of adb_block_count ADBs at one time. We also use adb_block_num to 1989 convey the ADBN of the first block in the sequence. Each ADB will 1990 contain the same adb_pattern string. 1992 As both adb_block_num and adb_pattern are optional, if either 1993 adb_reloff_pattern or adb_reloff_blocknum is set to NFS4_UINT64_MAX, 1994 then the corresponding field is not set in any of the ADBs. 1996 5.1.2. Data Content 1998 /* 1999 * Use an enum such that we can extend new types. 2000 */ 2001 enum data_content4 { 2002 NFS4_CONTENT_DATA = 0, 2003 NFS4_CONTENT_APP_BLOCK = 1, 2004 NFS4_CONTENT_HOLE = 2 2005 }; 2007 New operations might need to differentiate between wanting to access 2008 data versus an ADB. Also, future minor versions might want to 2009 introduce new data formats. This enumeration allows that to occur. 2011 5.2.
Operation 64: INITIALIZE 2013 The server has no concept of the structure imposed by the 2014 application. It is only when the application writes to a section of 2015 the file does order get imposed. In order to detect corruption even 2016 before the application utilizes the file, the application will want 2017 to initialize a range of ADBs. It uses the INITIALIZE operation to 2018 do so. 2020 5.2.1. ARGUMENT 2022 /* 2023 * We use data_content4 in case we wish to 2024 * extend new types later. Note that we 2025 * are explicitly disallowing data. 2026 */ 2027 union initialize_arg4 switch (data_content4 content) { 2028 case NFS4_CONTENT_APP_BLOCK: 2029 app_data_block4 ia_adb; 2030 case NFS4_CONTENT_HOLE: 2031 length4 ia_hole_length; 2032 default: 2033 void; 2034 }; 2036 struct INITIALIZE4args { 2037 /* CURRENT_FH: file */ 2038 stateid4 ia_stateid; 2039 stable_how4 ia_stable; 2040 offset4 ia_offset; 2041 initialize_arg4 ia_data<>; 2042 }; 2044 5.2.2. RESULT 2046 struct INITIALIZE4resok { 2047 count4 ir_count; 2048 stable_how4 ir_committed; 2049 verifier4 ir_writeverf; 2050 data_content4 ir_sparse; 2051 }; 2053 union INITIALIZE4res switch (nfsstat4 status) { 2054 case NFS4_OK: 2055 INITIALIZE4resok resok4; 2056 default: 2057 void; 2058 }; 2060 5.2.3. DESCRIPTION 2062 When the client invokes the INITIALIZE operation, it has two desired 2063 results: 2065 1. The structure described by the app_data_block4 be imposed on the 2066 file. 2068 2. The contents described by the app_data_block4 be sparse. 2070 If the server supports the INITIALIZE operation, it still might not 2071 support sparse files. So if it receives the INITIALIZE operation, 2072 then it MUST populate the contents of the file with the initialized 2073 ADBs. In other words, if the server supports INITIALIZE, then it 2074 supports the concept of ADBs. [[Comment.1: Do we want to support an 2075 asynchronous INITIALIZE? Do we have to? --TH]] 2077 If the data was already initialized, There are two interesting 2078 scenarios: 2080 1. The data blocks are allocated. 2082 2. Initializing in the middle of an existing ADB. 2084 If the data blocks were already allocated, then the INITIALIZE is a 2085 hole punch operation. If INITIALIZE supports sparse files, then the 2086 data blocks are to be deallocated. If not, then the data blocks are 2087 to be rewritten in the indicated ADB format. [[Comment.2: Need to 2088 document interaction between space reservation and hole punching? 2089 --TH]] 2091 Since the server has no knowledge of ADBs, it should not report 2092 misaligned creation of ADBs. Even while it can detect them, it 2093 cannot disallow them, as the application might be in the process of 2094 changing the size of the ADBs. Thus the server must be prepared to 2095 handle an INITIALIZE into an existing ADB. 2097 This document does not mandate the manner in which the server stores 2098 ADBs sparsely for a file. It does assume that if ADBs are stored 2099 sparsely, then the server can detect when an INITIALIZE arrives that 2100 will force a new ADB to start inside an existing ADB. For example, 2101 assume that ADBi has a adb_block_size of 4k and that an INITIALIZE 2102 starts 1k inside ADBi. The server should [[Comment.3: Need to flesh 2103 this out. --TH]] 2105 5.3. Operation 65: READ_PLUS 2107 If the client sends a READ operation, it is explicitly stating that 2108 it is not supporting sparse files. So if a READ occurs on a sparse 2109 ADB, then the server must expand such ADBs to be raw bytes. 
If a 2110 READ occurs in the middle of an ADB, the server can only send back 2111 bytes starting from that offset. 2113 Such an operation is inefficient for transfer of sparse sections of 2114 the file. As such, READ is marked as OBSOLETE in NFSv4.2. Instead, 2115 a client should issue READ_PLUS. Note that as the client has no a 2116 priori knowledge of whether an ADB is present or not, it should 2117 always use READ_PLUS. 2119 5.3.1. ARGUMENT 2121 struct READ_PLUS4args { 2122 /* CURRENT_FH: file */ 2123 stateid4 rpa_stateid; 2124 offset4 rpa_offset; 2125 count4 rpa_count; 2126 }; 2128 5.3.2. RESULT 2130 union read_plus_content switch (data_content4 content) { 2131 case NFS4_CONTENT_DATA: 2132 opaque rpc_data<>; 2133 case NFS4_CONTENT_APP_BLOCK: 2134 app_data_block4 rpc_block; 2135 case NFS4_CONTENT_HOLE: 2136 length4 rpc_hole_length; 2137 default: 2138 void; 2139 }; 2141 /* 2142 * Allow a return of an array of contents. 2143 */ 2144 struct read_plus_res4 { 2145 bool rpr_eof; 2146 read_plus_content rpr_contents<>; 2147 }; 2149 union READ_PLUS4res switch (nfsstat4 status) { 2150 case NFS4_OK: 2151 read_plus_res4 resok4; 2152 default: 2153 void; 2154 }; 2156 5.3.3. DESCRIPTION 2158 Over the given range, READ_PLUS will return all data and ADBs found 2159 as an array of read_plus_content. It is possible to have consecutive 2160 ADBs in the array as either different definitions of ADBs are present 2161 or as the guard pattern changes. 2163 Edge cases exist for ADBs which either begin before the rpa_offset 2164 requested by the READ_PLUS or end after the rpa_count requested - 2165 both of which may occur as not all applications which access the file 2166 are aware of the main application imposing a format on the file 2167 contents, i.e., tar, dd, cp, etc. READ_PLUS MUST retrieve whole 2168 ADBs, but it need not retrieve an entire sequence of ADBs. 2170 The server MUST return a whole ADB; if it cannot, it must 2171 expand that partial ADB into data before it sends it to the client. E.g., if 2172 an ADB had a block size of 64k and the READ_PLUS was for 128k 2173 starting at an offset of 32k inside the ADB, then the first 32k would 2174 be converted to data. 2176 5.4. pNFS Considerations 2178 While this document does not mandate how sparse ADBs are recorded on 2179 the server, it does make the assumption that such information is not 2180 in the file. I.e., the information is metadata. As such, the 2181 INITIALIZE operation is defined not to be supported by the DS - it 2182 must be issued to the MDS. But since the client must not assume a 2183 priori whether a read is sparse or not, the READ_PLUS operation MUST 2184 be supported by both the DS and the MDS. I.e., the client might 2185 impose on the MDS to asynchronously read the data from the DS. 2187 Furthermore, each DS MUST NOT report to a client either a sparse ADB 2188 or data which belongs to another DS. One implication of this 2189 requirement is that the app_data_block4's adb_block_size MUST 2190 either be the stripe width or the stripe width must be an even 2191 multiple of it. 2193 The second implication here is that the DS must be able to use the 2194 Control Protocol to determine from the MDS where the sparse ADBs 2195 occur. [[Comment.4: Need to discuss what happens if after the file 2196 is being written to and an INITIALIZE occurs? --TH]] Perhaps instead 2197 of the DS pulling from the MDS, the MDS pushes to the DS? Thus an 2198 INITIALIZE causes a new push?
[[Comment.5: Still need to consider 2199 race cases of the DS getting a WRITE and the MDS getting an 2200 INITIALIZE. --TH]] 2202 5.5. An Example of Detecting Corruption 2204 In this section, we define an ADB format in which corruption can be 2205 detected. Note that this is just one possible format and means to 2206 detect corruption. 2208 Consider a very basic implementation of an operating system's disk 2209 blocks. A block is either data or it is an indirect block which 2210 allows for files to be larger than one block. It is desired to be 2211 able to initialize a block. Lastly, to quickly unlink a file, a 2212 block can be marked invalid. The contents remain intact - which 2213 would enable this OS application to undelete a file. 2215 The application defines 4k sized data blocks, with an 8 byte block 2216 counter occurring at offset 0 in the block, and with the guard 2217 pattern occurring at offset 8 inside the block. Furthermore, the 2218 guard pattern can take one of four states: 2220 0xfeedface - This is the FREE state and indicates that the ADB 2221 format has been applied. 2223 0xcafedead - This is the DATA state and indicates that real data 2224 has been written to this block. 2226 0xe4e5c001 - This is the INDIRECT state and indicates that the 2227 block contains block counter numbers that are chained off of this 2228 block. 2230 0xba1ed4a3 - This is the INVALID state and indicates that the block 2231 contains data whose contents are garbage. 2233 Finally, it also defines an 8 byte checksum [20] starting at byte 16 2234 which applies to the remaining contents of the block. If the state 2235 is FREE, then that checksum is trivially zero. As such, the 2236 application has no need to transfer the checksum implicitly inside 2237 the ADB - it need not make the transfer layer aware of the fact that 2238 there is a checksum (see [18] for an example of checksums used to 2239 detect corruption in application data blocks). 2241 Corruption in each ADB can be detected thusly: 2243 o If the guard pattern is anything other than one of the allowed 2244 values, including all zeros. 2246 o If the guard pattern is FREE and any other byte in the remainder 2247 of the ADB is anything other than zero. 2249 o If the guard pattern is anything other than FREE, then if the 2250 stored checksum does not match the computed checksum. 2252 o If the guard pattern is INDIRECT and one of the stored indirect 2253 block numbers has a value greater than the number of ADBs in the 2254 file. 2256 o If the guard pattern is INDIRECT and one of the stored indirect 2257 block numbers is a duplicate of another stored indirect block 2258 number. 2260 As can be seen, the application can detect errors based on the 2261 combination of the guard pattern state and the checksum. But also, 2262 the application can detect corruption based on the state and the 2263 contents of the ADB. This last point is important in validating the 2264 minimum amount of data we incorporated into our generic framework. 2265 I.e., the guard pattern is sufficient in allowing applications to 2266 design their own corruption detection. 2268 Finally, it is important to note that none of these corruption checks 2269 occur in the transport layer. The server and client components are 2270 totally unaware of the file format and might report everything as 2271 being transferred correctly even in the case the application detects 2272 corruption. 2274 5.6. 
Example of READ_PLUS 2276 The hypothetical application presented in Section 5.5 can be used to 2277 illustrate how READ_PLUS would return an array of results. A file is 2278 created and initialized with 100 4k ADBs in the FREE state: 2280 INITIALIZE {0, 4k, 100, 0, 0, 8, 0xfeedface} 2282 Further, assume the application writes a single ADB at 16k, changing 2283 the guard pattern to 0xcafedead, we would then have in memory: 2285 0 -> (16k - 1) : 4k, 4, 0, 0, 8, 0xfeedface 2286 16k -> (20k - 1) : 00 00 00 05 ca fe de ad XX XX ... XX XX 2287 20k -> 400k : 4k, 95, 0, 6, 0xfeedface 2289 And when the client did a READ_PLUS of 64k at the start of the file, 2290 it would get back a result of an ADB, some data, and a final ADB: 2292 ADB {0, 4, 0, 0, 8, 0xfeedface} 2293 data 4k 2294 ADB {20k, 4k, 59, 0, 6, 0xfeedface} 2296 5.7. Zero Filled Holes 2298 As applications are free to define the structure of an ADB, it is 2299 trivial to define an ADB which supports zero filled holes. Such a 2300 case would encompass the traditional definitions of a sparse file and 2301 hole punching. For example, to punch a 64k hole, starting at 100M, 2302 into an existing file which has no ADB structure: 2304 INITIALIZE {100M, 64k, 1, NFS4_UINT64_MAX, 2305 0, NFS4_UINT64_MAX, 0x0} 2307 6. Space Reservation 2309 6.1. Introduction 2311 This section describes a set of operations that allow applications 2312 such as hypervisors to reserve space for a file, report the amount of 2313 actual disk space a file occupies and freeup the backing space of a 2314 file when it is not required. 2316 In virtualized environments, virtual disk files are often stored on 2317 NFS mounted volumes. Since virtual disk files represent the hard 2318 disks of virtual machines, hypervisors often have to guarantee 2319 certain properties for the file. 2321 One such example is space reservation. When a hypervisor creates a 2322 virtual disk file, it often tries to preallocate the space for the 2323 file so that there are no future allocation related errors during the 2324 operation of the virtual machine. Such errors prevent a virtual 2325 machine from continuing execution and result in downtime. 2327 Another useful feature would be the ability to report the number of 2328 blocks that would be freed when a file is deleted. Currently, NFS 2329 reports two size attributes: 2331 size The logical file size of the file. 2333 space_used The size in bytes that the file occupies on disk 2335 While these attributes are sufficient for space accounting in 2336 traditional filesystems, they prove to be inadequate in modern 2337 filesystems that support block sharing. Having a way to tell the 2338 number of blocks that would be freed if the file was deleted would be 2339 useful to applications that wish to migrate files when a volume is 2340 low on space. 2342 Since virtual disks represent a hard drive in a virtual machine, a 2343 virtual disk can be viewed as a filesystem within a file. Since not 2344 all blocks within a filesystem are in use, there is an opportunity to 2345 reclaim blocks that are no longer in use. A call to deallocate 2346 blocks could result in better space efficiency. Lesser space MAY be 2347 consumed for backups after block deallocation. 2349 We propose the following operations and attributes for the 2350 aforementioned use cases: 2352 space_reserved This attribute specifies whether the blocks backing 2353 the file have been preallocated. 
2355 space_freed This attribute specifies the space freed when a file is 2356 deleted, taking block sharing into consideration. 2358 max_hole_punch This attribute specifies the maximum-sized hole that 2359 can be punched on the filesystem. 2361 HOLE_PUNCH This operation zeroes and/or deallocates the blocks 2362 backing a region of the file. 2364 6.2. Use Cases 2366 6.2.1. Space Reservation 2368 Some applications require that once a file of a certain size is 2369 created, writes to that file never fail with an out-of-space 2370 condition. One such example is that of a hypervisor writing to a 2371 virtual disk. An out-of-space condition while writing to virtual 2372 disks would mean that the virtual machine would need to be frozen. 2374 Currently, in order to achieve such a guarantee, applications zero 2375 the entire file. The initial zeroing allocates the backing blocks 2376 and all subsequent writes are overwrites of already allocated blocks. 2377 This approach is not only inefficient in terms of the amount of I/O 2378 done, it is also not guaranteed to work on filesystems that are log 2379 structured or deduplicated. An efficient way of guaranteeing space 2380 reservation would be beneficial to such applications. 2382 If the space_reserved attribute is set on a file, it is guaranteed 2383 that writes that do not grow the file will not fail with 2384 NFS4ERR_NOSPC. 2386 6.2.2. Space freed on deletes 2388 Currently, files in NFS have two size attributes: 2390 size The logical file size of the file. 2392 space_used The size in bytes that the file occupies on disk. 2394 While these attributes are sufficient for space accounting in 2395 traditional filesystems, they prove to be inadequate in modern 2396 filesystems that support block sharing. In such filesystems, 2397 multiple inodes can point to a single block with a block reference 2398 count to guard against premature freeing. 2400 If space_used of a file is interpreted to mean the size in bytes of 2401 all disk blocks pointed to by the inode of the file, then shared 2402 blocks get double counted, over-reporting the space utilization. 2403 This also has the adverse effect that the deletion of a file with 2404 shared blocks frees up less than space_used bytes. 2406 On the other hand, if space_used is interpreted to mean the size in 2407 bytes of those disk blocks unique to the inode of the file, then 2408 shared blocks are not counted in any file, resulting in under- 2409 reporting of the space utilization. 2411 For example, two files A and B have 10 blocks each. Let 6 of these 2412 blocks be shared between them. Thus, the combined space utilized by 2413 the two files is 14 * BLOCK_SIZE bytes. In the former case, the 2414 combined space utilization of the two files would be reported as 20 * 2415 BLOCK_SIZE. However, deleting either would only result in 4 * 2416 BLOCK_SIZE being freed. Conversely, the latter interpretation would 2417 report that the space utilization is only 8 * BLOCK_SIZE. 2419 Adding another size attribute, space_freed, is helpful in solving 2420 this problem. space_freed is the number of blocks that are allocated 2421 to the given file that would be freed on its deletion. In the 2422 example, both A and B would report space_freed as 4 * BLOCK_SIZE and 2423 space_used as 10 * BLOCK_SIZE. If A is deleted, B will report 2424 space_freed as 10 * BLOCK_SIZE as the deletion of B would result in 2425 the deallocation of all 10 blocks. 2427 The addition of this attribute does not solve the problem of space being 2428 over-reported.
However, over-reporting is better than under- 2429 reporting. 2431 6.2.3. Operations and attributes 2433 In the sections that follow, one operation and three attributes are 2434 defined that together provide the space management facilities 2435 outlined earlier in the document. The operation is intended to be 2436 OPTIONAL and the attributes RECOMMENDED as defined in Section 17 of 2437 [2]. 2439 6.2.4. Attribute 77: space_reserved 2441 The space_reserved attribute is a read/write attribute of type 2442 boolean. It is a per file attribute. When the space_reserved 2443 attribute is set via SETATTR, the server must ensure that there is 2444 disk space to accommodate every byte in the file before it can return 2445 success. If the server cannot guarantee this, it must return 2446 NFS4ERR_NOSPC. 2448 If the client tries to grow a file which has the space_reserved 2449 attribute set, the server must guarantee that there is disk space to 2450 accommodate every byte in the file with the new size before it can 2451 return success. If the server cannot guarantee this, it must return 2452 NFS4ERR_NOSPC. 2454 It is not required that the server allocate the space to the file 2455 before returning success. The allocation can be deferred; however, 2456 it must be guaranteed that it will not fail for lack of space. 2458 The value of space_reserved can be obtained at any time through 2459 GETATTR. 2461 In order to avoid ambiguity, the space_reserved bit cannot be set 2462 along with the size bit in SETATTR. Increasing the size of a file 2463 with space_reserved set will fail if space reservation cannot be 2464 guaranteed for the new size. If the file size is decreased, space 2465 reservation is only guaranteed for the new size and the extra blocks 2466 backing the file can be released. 2468 6.2.5. Attribute 78: space_freed 2470 space_freed gives the number of bytes freed if the file is deleted. 2471 This attribute is read only and is of type length4. It is a per file 2472 attribute. 2474 6.2.6. Attribute 79: max_hole_punch 2476 max_hole_punch specifies the maximum size of a hole that the 2477 HOLE_PUNCH operation can handle. This attribute is read only and of 2478 type length4. It is a per filesystem attribute. This attribute MUST 2479 be implemented if HOLE_PUNCH is implemented. 2481 6.2.7. Operation 64: HOLE_PUNCH - Zero and deallocate blocks backing 2482 the file in the specified range. 2484 WARNING: Most of this section is now obsolete. Parts of it need to 2485 be scavenged for the ADB discussion, but for the most part, it cannot 2486 be trusted. 2488 6.2.7.1. DESCRIPTION 2490 Whenever a client wishes to deallocate the blocks backing a 2491 particular region in the file, it calls the HOLE_PUNCH operation with 2492 the current filehandle set to the filehandle of the file in question, 2493 and the start offset and length in bytes of the region set in hpa_offset and 2494 hpa_count, respectively. All further reads to this region MUST return 2495 zeros until overwritten. The filehandle specified must be that of a 2496 regular file. 2498 Situations may arise where hpa_offset and/or hpa_offset + hpa_count 2499 will not be aligned to a boundary at which the server does allocations/ 2500 deallocations. For most filesystems, this is the block size of 2501 the file system. In such a case, the server can deallocate as many 2502 bytes as it can in the region. The blocks that cannot be deallocated 2503 MUST be zeroed.
2507 The server is not required to complete deallocating the blocks 2508 specified in the operation before returning. It is acceptable to 2509 have the deallocation be deferred. In fact, HOLE_PUNCH is merely a 2510 hint; it is valid for a server to return success without ever doing 2511 anything towards deallocating the blocks backing the region 2512 specified. However, any future reads to the region MUST return 2513 zeroes. 2515 HOLE_PUNCH will result in the space_used attribute being decreased by 2516 the number of bytes that were deallocated. The space_freed attribute 2517 may or may not decrease, depending on the support for the attribute and whether the 2518 blocks backing the specified range were shared or not. The size 2519 attribute will remain unchanged. 2521 The HOLE_PUNCH operation MUST NOT change the space reservation 2522 guarantee of the file. While the server can deallocate the blocks 2523 specified by hpa_offset and hpa_count, future writes to this region 2524 MUST NOT fail with NFS4ERR_NOSPC. 2526 The HOLE_PUNCH operation may fail for the following reasons (this is 2527 a partial list): 2529 NFS4ERR_NOTSUPP The HOLE_PUNCH operation is not supported by the 2530 NFS server receiving this request. 2532 NFS4ERR_ISDIR The current filehandle is of type NF4DIR. 2534 NFS4ERR_SYMLINK The current filehandle is of type NF4LNK. 2536 NFS4ERR_WRONG_TYPE The current filehandle does not designate an 2537 ordinary file. 2539 7. Sparse Files 2541 WARNING: Most of this section needs to be reworked because of the 2542 work going on in the ADB section. 2544 7.1. Introduction 2546 A sparse file is a common way of representing a large file without 2547 having to utilize all of the disk space for it. Consequently, a 2548 sparse file uses less physical space than its size indicates. This 2549 means the file contains 'holes', byte ranges within the file that 2550 contain no data. Most modern file systems support sparse files, 2551 including most UNIX file systems and NTFS, but notably not Apple's 2552 HFS+. Common examples of sparse files include Virtual Machine (VM) 2553 OS/disk images, database files, log files, and even checkpoint 2554 recovery files most commonly used by the HPC community. 2556 If an application reads a hole in a sparse file, the file system must 2557 return all zeros to the application. For local data access there is 2558 little penalty, but with NFS these zeroes must be transferred back to 2559 the client. If an application uses the NFS client to read data into 2560 memory, this wastes time and bandwidth as the application waits for 2561 the zeroes to be transferred. 2563 A sparse file is typically created by initializing the file to be all 2564 zeros - nothing is written to the data in the file; instead, the hole 2565 is recorded in the metadata for the file. So an 8G disk image might 2566 be represented initially by a couple hundred bits in the inode and 2567 nothing on the disk. If the VM then writes 100M to a file in the 2568 middle of the image, there would now be two holes represented in the 2569 metadata and 100M in the data. 2571 Other applications want to initialize a file to patterns other than 2572 zero. The problem with initializing to zero is that it is often 2573 difficult to distinguish a byte-range initialized to all zeroes 2574 from data corruption, since a pattern of zeroes is a probable pattern 2575 for corruption.
Instead, some applications, such as database 2576 management systems, use patterns consisting of bytes or words of non- 2577 zero values. 2579 Besides reading sparse files and initializing them, applications 2580 might want to hole punch, which is the deallocation of the data 2581 blocks which back a region of the file. At such time, the affected 2582 blocks are reinitialized to a pattern. 2584 This section introduces a new operation to read patterns from a file, 2585 READ_PLUS, and a new operation to both initialize patterns and to 2586 punch pattern holes into a file, WRITE_PLUS. READ_PLUS supports all 2587 the features of READ but includes an extension to support sparse 2588 pattern files. READ_PLUS is guaranteed to perform no worse than 2589 READ, and can dramatically improve performance with sparse files. 2590 READ_PLUS does not depend on pNFS protocol features, but can be used 2591 by pNFS to support sparse files. 2593 7.2. Terminology 2595 Regular file: An object of file type NF4REG or NF4NAMEDATTR. 2597 Sparse file: A Regular file that contains one or more Holes. 2599 Hole: A byte range within a Sparse file that contains regions of all 2600 zeroes. For block-based file systems, this could also be an 2601 unallocated region of the file. 2603 Hole Threshold: The minimum length of a Hole as determined by the 2604 server. If a server chooses to define a Hole Threshold, then it 2605 would not return hole information (nfs_readplusreshole) with a 2606 hole_offset and hole_length that specify a range shorter than the 2607 Hole Threshold. 2609 7.3. Applications and Sparse Files 2611 Applications may cause an NFS client to read holes in a file for 2612 several reasons. This section describes three different application 2613 workloads that cause the NFS client to transfer data unnecessarily. 2614 These workloads are simply examples, and there are probably many more 2615 workloads that are negatively impacted by sparse files. 2617 The first workload that can cause holes to be read is sequential 2618 reads within a sparse file. When this happens, the NFS client may 2619 perform read requests ("readahead") into sections of the file not 2620 explicitly requested by the application. Since the NFS client cannot 2621 differentiate between holes and non-holes, the NFS client may 2622 prefetch empty sections of the file. 2624 This workload is exemplified by Virtual Machines and their associated 2625 file system images, e.g., VMware .vmdk files, which are large sparse 2626 files encapsulating an entire operating system. If a VM reads files 2627 within the file system image, this will translate to sequential NFS 2628 read requests into the much larger file system image file. Since NFS 2629 does not understand the internals of the file system image, it ends 2630 up performing readahead of file holes. 2632 The second workload is generated by copying a file from a directory 2633 in NFS either to the same NFS server, to another file system, e.g., 2634 another NFS or Samba server, to a local ext3 file system, or even to a 2635 network socket. In this case, bandwidth and server resources are 2636 wasted as the entire file is transferred from the NFS server to the 2637 NFS client. Once a byte range of the file has been transferred to 2638 the client, it is up to the client application, e.g., rsync, cp, or scp, 2639 how it writes the data to the target location. For example, cp 2640 supports sparse files and will not write all zero regions, whereas 2641 scp does not support sparse files and will transfer every byte of the 2642 file.
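The sparse-aware behavior attributed to cp above can be illustrated with the following non-normative sketch of a copy loop that seeks over all-zero blocks instead of writing them; the transfer size is arbitrary and error handling is omitted for brevity.

      /*
       * Non-normative sketch: copy loop that preserves sparseness by
       * seeking over all-zero blocks instead of writing them.
       */
      #include <string.h>
      #include <unistd.h>

      #define COPY_BLOCK 65536               /* arbitrary transfer size */

      static int all_zero(const char *buf, size_t len)
      {
          return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
      }

      void sparse_copy(int src_fd, int dst_fd)
      {
          char    buf[COPY_BLOCK];
          ssize_t n;

          while ((n = read(src_fd, buf, sizeof(buf))) > 0) {
              if (all_zero(buf, (size_t)n))
                  lseek(dst_fd, n, SEEK_CUR);    /* leave a hole behind */
              else
                  write(dst_fd, buf, (size_t)n);
          }
          /* A trailing hole still requires the destination file size to
           * be extended, e.g., via ftruncate(); omitted for brevity. */
      }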
2644 The third workload is generated by applications that do not utilize 2645 the NFS client cache, but instead use direct I/O and manage cached 2646 data independently, e.g., databases. These applications may perform 2647 whole file caching with sparse files, which would mean that even the 2648 holes will be transferred to the clients and cached. 2650 7.4. Overview of Sparse Files and NFSv4 2652 This proposal seeks to provide sparse file support to the largest 2653 number of NFS client and server implementations, and as such proposes 2654 to add a new return code to the new READ_PLUS operation 2655 instead of proposing additions or extensions of new or existing 2656 optional features (such as pNFS). 2658 As well, this document seeks to ensure that the proposed extensions 2659 are simple and do not transfer data between the client and server 2660 unnecessarily. For example, one possible way to implement sparse 2661 file read support would be to have the client, on the first hole 2662 encountered or at OPEN time, request a Data Region Map from the 2663 server. A Data Region Map would specify all zero and non-zero 2664 regions in a file. While this option seems simple, it is less useful 2665 and can become inefficient and cumbersome for several reasons: 2667 o Data Region Maps can be large, and transferring them can reduce 2668 overall read performance. For example, VMware's .vmdk files can 2669 have a file size of over 100 GBs and have a map well over several 2670 MBs. 2672 o Data Region Maps can change frequently, and become invalidated on 2673 every write to the file. NFSv4 has a single change attribute, 2674 which means any change to any region of a file will invalidate all 2675 Data Region Maps. This can result in the map being transferred 2676 multiple times with each update to the file. For example, a VM 2677 that updates a config file in its file system image would 2678 invalidate the Data Region Map not only for itself, but for all 2679 other clients accessing the same file system image. 2681 o Data Region Maps do not handle all zero-filled sections of the 2682 file, reducing the effectiveness of the solution. While it may be 2683 possible to modify the maps to handle zero-filled sections (at 2684 possibly great effort to the server), it is almost impossible with 2685 pNFS. With pNFS, the owner of the Data Region Map is the metadata 2686 server, which is not in the data path and has no knowledge of the 2687 contents of a data region. 2689 Another way to handle holes is compression, but this is not ideal since 2690 it requires all implementations to agree on a single compression 2691 algorithm and requires a fair amount of computational overhead. 2693 Note that supporting writing to a sparse file does not require 2694 changes to the protocol. Applications and/or NFS implementations can 2695 choose to ignore WRITE requests of all zeroes to the NFS server 2696 without consequence. 2698 7.5. Operation 65: READ_PLUS 2700 This section introduces a new read operation, named READ_PLUS, which 2701 allows NFS clients to avoid reading holes in a sparse file. 2702 READ_PLUS is guaranteed to perform no worse than READ, and can 2703 dramatically improve performance with sparse files.
2705 READ_PLUS supports all the features of the existing NFSv4.1 READ 2706 operation [2] and adds a simple yet significant extension to the 2707 format of its response. The change allows the server to avoid 2708 returning all zeroes for a file hole, which would waste computational and 2709 network resources and reduce performance. READ_PLUS uses a new 2710 result structure that tells the client that the result is all zeroes 2711 AND the byte-range of the hole in which the request was made. 2712 Returning the hole's byte-range, and only upon request, avoids 2713 transferring large Data Region Maps that may be soon invalidated and 2714 contain information about a file that may not even be read in its 2715 entirety. 2717 A new read operation is required due to NFSv4.1 minor versioning 2718 rules that do not allow modification of an existing operation's 2719 arguments or results. READ_PLUS is designed in such a way as to allow 2720 future extensions to the result structure. The same approach could 2721 be taken to extend the argument structure, but a good use case is 2722 first required to make such a change. 2724 7.5.1. ARGUMENT 2726 struct READ_PLUS4args { 2727 /* CURRENT_FH: file */ 2728 stateid4 rpa_stateid; 2729 offset4 rpa_offset; 2730 count4 rpa_count; 2731 }; 2733 7.5.2. RESULT 2735 union read_plus_content switch (data_content4 content) { 2736 case NFS4_CONTENT_DATA: 2737 opaque rpc_data<>; 2738 case NFS4_CONTENT_APP_BLOCK: 2739 app_data_block4 rpc_block; 2740 case NFS4_CONTENT_HOLE: 2741 length4 rpc_hole_length; 2742 default: 2743 void; 2744 }; 2746 /* 2747 * Allow a return of an array of contents. 2748 */ 2749 struct read_plus_res4 { 2750 bool rpr_eof; 2751 read_plus_content rpr_contents<>; 2752 }; 2754 union READ_PLUS4res switch (nfsstat4 status) { 2755 case NFS4_OK: 2756 read_plus_res4 resok4; 2757 default: 2758 void; 2759 }; 2761 7.5.3. DESCRIPTION 2763 The READ_PLUS operation is based upon the NFSv4.1 READ operation [2], 2764 and similarly reads data from the regular file identified by the 2765 current filehandle. 2767 The client provides an offset of where the READ_PLUS is to start and 2768 a count of how many bytes are to be read. An offset of zero means to 2769 read data starting at the beginning of the file. If offset is 2770 greater than or equal to the size of the file, the status NFS4_OK is 2771 returned with nfs_readplusrestype4 set to READ_OK, data length set to 2772 zero, and eof set to TRUE. The READ_PLUS is subject to access 2773 permissions checking. 2775 If the client specifies a count value of zero, the READ_PLUS succeeds 2776 and returns zero bytes of data, again subject to access permissions 2777 checking. In all situations, the server may choose to return fewer 2778 bytes than specified by the client. The client needs to check for 2779 this condition and handle the condition appropriately. 2781 If the client specifies an offset and count value that is entirely 2782 contained within a hole of the file, the status NFS4_OK is returned 2783 with nfs_readplusresok4 set to READ_HOLE, and if information is 2784 available regarding the hole, a nfs_readplusreshole structure 2785 containing the offset and range of the entire hole. The 2786 nfs_readplusreshole structure is considered valid until the file is 2787 changed (detected via the change attribute). The server MUST provide 2788 the same semantics for nfs_readplusreshole as if the client read the 2789 region and received zeroes; the implied hole contents' lifetime MUST 2790 be exactly the same as that of any other read data.
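The following non-normative C sketch shows how a client might flatten the array of contents defined in the RESULT above into a single buffer, substituting zeroes for hole segments. The structures are hand-written stand-ins for the decoded XDR types, not generated code, and the NFS4_CONTENT_APP_BLOCK arm is omitted for brevity.

      /*
       * Non-normative sketch: turning a decoded read_plus_res4 into a
       * flat buffer.  The structs below are C stand-ins for the XDR
       * types in the RESULT above; the APP_BLOCK arm is omitted.
       */
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      enum data_content { CONTENT_DATA, CONTENT_HOLE };

      struct read_plus_content {
          enum data_content  type;
          const void        *data;      /* valid for CONTENT_DATA       */
          uint64_t           length;    /* data or hole length in bytes */
      };

      struct read_plus_res {
          int                              eof;
          unsigned int                     nr_contents;
          const struct read_plus_content  *contents;
      };

      /* Copy the reply into 'buf', zero-filling holes; returns how many
       * bytes of the request were satisfied. */
      size_t materialize(const struct read_plus_res *res,
                         char *buf, size_t buflen)
      {
          size_t done = 0;

          for (unsigned int i = 0;
               i < res->nr_contents && done < buflen; i++) {
              const struct read_plus_content *c = &res->contents[i];
              uint64_t len = c->length;

              if (len > buflen - done)
                  len = buflen - done;
              if (c->type == CONTENT_DATA)
                  memcpy(buf + done, c->data, (size_t)len);  /* file data */
              else
                  memset(buf + done, 0, (size_t)len);  /* hole: zeroes */
              done += (size_t)len;
          }
          return done;
      }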
2792 If the client specifies an offset and count value that begins in a 2793 non-hole of the file but extends into a hole, the server should return a 2794 short read with status NFS4_OK, nfs_readplusresok4 set to READ_OK, 2795 and data length set to the number of bytes returned. The client will 2796 then issue another READ_PLUS for the remaining bytes, to which the 2797 server will respond with information about the hole in the file. 2799 If the server knows that the requested byte range falls within a hole of 2800 the file, but has no further information regarding the hole, it 2801 returns a nfs_readplusreshole structure with holeres4 set to 2802 HOLE_NOINFO. 2804 If hole information is available and can be returned to the client, 2805 the server returns a nfs_readplusreshole structure with the value of 2806 holeres4 set to HOLE_INFO. The values of hole_offset and hole_length 2807 define the byte-range for the current hole in the file. These values 2808 represent the information known to the server and may describe a 2809 byte-range smaller than the true size of the hole. 2811 Except when special stateids are used, the stateid value for a 2812 READ_PLUS request represents a value returned from a previous byte- 2813 range lock or share reservation request or the stateid associated 2814 with a delegation. The stateid identifies the associated owners, if 2815 any, and is used by the server to verify that the associated locks are 2816 still valid (e.g., have not been revoked). 2818 If the read ended at the end-of-file (formally, in a correctly formed 2819 READ_PLUS operation, if offset + count is equal to the size of the 2820 file), or the READ_PLUS operation extends beyond the size of the file 2821 (if offset + count is greater than the size of the file), eof is 2822 returned as TRUE; otherwise, it is FALSE. A successful READ_PLUS of 2823 an empty file will always return eof as TRUE. 2825 If the current filehandle is not an ordinary file, an error will be 2826 returned to the client. In the case that the current filehandle 2827 represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. If 2828 the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is 2829 returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. 2831 For a READ_PLUS with a stateid value of all bits equal to zero, the 2832 server MAY allow the READ_PLUS to be serviced subject to mandatory 2833 byte-range locks or the current share deny modes for the file. For a 2834 READ_PLUS with a stateid value of all bits equal to one, the server 2835 MAY allow READ_PLUS operations to bypass locking checks at the 2836 server. 2838 On success, the current filehandle retains its value. 2840 7.5.4. IMPLEMENTATION 2842 If the server returns a "short read" (i.e., less data than requested 2843 and eof is set to FALSE), the client should send another READ_PLUS to 2844 get the remaining data. A server may return less data than requested 2845 under several circumstances. The file may have been truncated by 2846 another client or perhaps on the server itself, changing the file 2847 size from what the requesting client believes to be the case. This 2848 would reduce the actual amount of data available to the client. It 2849 is possible that the server has reduced the transfer size and so returns a 2850 short read result. Server resource exhaustion may also result in a 2851 short read.
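As a non-normative illustration of the short-read handling described above, the sketch below keeps issuing READ_PLUS requests until the caller's byte range has been satisfied or the server reports eof. The read_plus_once() helper is hypothetical; it stands for code that sends a single READ_PLUS and fills the buffer, zero-filling any hole segments it is told about.

      /*
       * Non-normative sketch: client loop that handles short READ_PLUS
       * replies.  read_plus_once() is a hypothetical wrapper that sends
       * one READ_PLUS, fills 'buf' (zero-filling hole segments), and
       * reports how many bytes it produced and whether eof was set.
       */
      #include <stddef.h>
      #include <stdint.h>

      extern size_t read_plus_once(int fh, uint64_t offset,
                                   char *buf, size_t count, int *eof);

      size_t read_plus_range(int fh, uint64_t offset,
                             char *buf, size_t count)
      {
          size_t done = 0;
          int    eof  = 0;

          while (done < count && !eof) {
              size_t n = read_plus_once(fh, offset + done,
                                        buf + done, count - done, &eof);
              if (n == 0 && !eof)
                  break;          /* defensive: avoid looping forever */
              done += n;
          }
          return done;            /* may be < count if eof was reached */
      }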
2853 If mandatory byte-range locking is in effect for the file, and if the 2854 byte-range corresponding to the data to be read from the file is 2855 WRITE_LT locked by an owner not associated with the stateid, the 2856 server will return the NFS4ERR_LOCKED error. The client should try 2857 to get the appropriate READ_LT via the LOCK operation before re- 2858 attempting the READ_PLUS. When the READ_PLUS completes, the client 2859 should release the byte-range lock via LOCKU. In addition, the 2860 server MUST return a nfs_readplusreshole structure with values of 2861 hole_offset and hole_length that are within the owner's locked byte 2862 range. 2864 If another client has an OPEN_DELEGATE_WRITE delegation for the file 2865 being read, the delegation must be recalled, and the operation cannot 2866 proceed until that delegation is returned or revoked. Except where 2867 this happens very quickly, one or more NFS4ERR_DELAY errors will be 2868 returned to requests made while the delegation remains outstanding. 2869 Normally, delegations will not be recalled as a result of a READ_PLUS 2870 operation since the recall will occur as a result of an earlier OPEN. 2871 However, since it is possible for a READ_PLUS to be done with a 2872 special stateid, the server needs to check for this case even though 2873 the client should have done an OPEN previously. 2875 7.5.4.1. Additional pNFS Implementation Information 2877 With pNFS, the semantics of using READ_PLUS remains the same. Any 2878 data server MAY return a READ_HOLE result for a READ_PLUS request 2879 that it receives. 2881 When a data server chooses to return a READ_HOLE result, it has the 2882 option of returning hole information for the data stored on that data 2883 server (as defined by the data layout), but it MUST NOT return a 2884 nfs_readplusreshole structure with a byte range that includes data 2885 managed by another data server. 2887 1. Data servers that cannot determine hole information SHOULD return 2888 HOLE_NOINFO. 2890 2. Data servers that can obtain hole information for the parts of 2891 the file stored on that data server SHOULD 2892 return HOLE_INFO and the byte range of the hole stored on that 2893 data server. 2895 A data server should do its best to return as much information about 2896 a hole as is feasible without having to contact the metadata server. 2897 If communication with the metadata server is required, then every 2898 attempt should be made to minimize the number of requests. 2900 If mandatory locking is enforced, then the data server must also 2901 ensure that it returns only information for a Hole that is within the 2902 owner's locked byte range. 2904 7.5.5. READ_PLUS with Sparse Files Example 2906 To see how the return value READ_HOLE will work, the following table 2907 describes a sparse file. For each byte range, the file contains 2908 either non-zero data or a hole. In addition, the server in this 2909 example uses a hole threshold of 32K. 2911 +-------------+----------+ 2912 | Byte-Range | Contents | 2913 +-------------+----------+ 2914 | 0-15999 | Hole | 2915 | 16K-31999 | Non-Zero | 2916 | 32K-255999 | Hole | 2917 | 256K-287999 | Non-Zero | 2918 | 288K-353999 | Hole | 2919 | 354K-417999 | Non-Zero | 2920 +-------------+----------+ 2922 Table 3 2924 Under the given circumstances, if a client were to read the file from 2925 beginning to end with a max read size of 64K, the following will be 2926 the result.
This assumes the client has already opened the file and 2927 acquired a valid stateid and just needs to issue READ_PLUS requests. 2929 1. READ_PLUS(s, 0, 64K) --> NFS_OK, readplusrestype4 = READ_OK, eof 2930 = false, data<>[32K]. Return a short read, as the last half of 2931 the request was all zeroes. Note that the first hole is read 2932 back as all zeros as it is below the hole threshold. 2934 2. READ_PLUS(s, 32K, 64K) --> NFS_OK, readplusrestype4 = READ_HOLE, 2935 nfs_readplusreshole(HOLE_INFO)(32K, 224K). The requested range 2936 was all zeros, and the current hole begins at offset 32K and is 2937 224K in length. 2939 3. READ_PLUS(s, 256K, 64K) --> NFS_OK, readplusrestype4 = READ_OK, 2940 eof = false, data<>[32K]. Return a short read, as the last half 2941 of the request was all zeroes. 2943 4. READ_PLUS(s, 288K, 64K) --> NFS_OK, readplusrestype4 = READ_HOLE, 2944 nfs_readplusreshole(HOLE_INFO)(288K, 66K). 2946 5. READ_PLUS(s, 354K, 64K) --> NFS_OK, readplusrestype4 = READ_OK, 2947 eof = true, data<>[64K]. 2949 7.6. Related Work 2951 Solaris and ZFS support an extension to lseek(2) that allows 2952 applications to discover holes in a file. The values, SEEK_HOLE and 2953 SEEK_DATA, allow clients to seek to the next hole or beginning of 2954 data, respectively. 2956 XFS supports the XFS_IOC_GETBMAP ioctl, which returns 2957 the Data Region Map for a file. Clients can then use this 2958 information to avoid reading holes in a file. 2960 NTFS and CIFS support the FSCTL_SET_SPARSE control code, which allows 2961 applications to control whether empty regions of the file are 2962 preallocated and filled in with zeros or simply left unallocated. 2964 7.7. Other Proposed Designs 2966 7.7.1. Multi-Data Server Hole Information 2968 The current design prohibits pNFS data servers from returning hole 2969 information for regions of a file that are not stored on that data 2970 server. Having data servers return information regarding other data 2971 servers changes the fundamental principle that all metadata 2972 information comes from the metadata server. 2974 Here is a brief description of how multi-data 2975 server hole information could be supported: 2977 A data server that can obtain hole information for the entire 2978 file without severe performance impact MAY return HOLE_INFO and 2979 the byte range of the entire file hole. When a pNFS client receives 2980 a READ_HOLE result and a non-empty nfs_readplusreshole structure, it 2981 MAY use this information in conjunction with a valid layout for the 2982 file to determine the next data server for the next region of data 2983 that is not in a hole. 2985 7.7.2. Data Result Array 2987 If a single read request contains one or more Holes with a length 2988 greater than the Sparse Threshold, the current design would return 2989 results indicating a short read to the client. A client would then 2990 send a series of read requests to the server to retrieve information 2991 for the Holes and the remaining data. To avoid turning a single read 2992 request into several exchanges between the client and server, the 2993 server may need to choose a relatively large Sparse Threshold in 2994 order to decrease the number of short reads it creates. A large 2995 Sparse Threshold may miss many smaller holes, which in turn may 2996 negate the benefits of sparse read support. 2998 To avoid this situation, one option is to have the READ_PLUS 2999 operation return information for multiple holes in a single return 3000 value.
This would allow several small holes to be described in a 3001 single read response without requiring multiple exchanges between 3002 the client and server. 3004 One important item to consider with returning an array of data chunks 3005 is its impact on RDMA, which may use different block sizes on the 3006 client and server (among other things). 3008 7.7.3. User-Defined Sparse Mask 3010 Add a mask (instead of just zeroes). Specified by the server or the client? 3012 7.7.4. Allocated flag 3014 A Hole on the server may be an allocated byte-range consisting of all 3015 zeroes or may not be allocated at all. To ensure this information is 3016 properly communicated to the client, it may be beneficial to add an 3017 'alloc' flag to the HOLE_INFO section of nfs_readplusreshole. This 3018 would allow an NFS client to copy a file from one file system to 3019 another and have it more closely resemble the original. 3021 7.7.5. Dense and Sparse pNFS File Layouts 3023 The hole information returned from a data server must be understood 3024 by pNFS clients using both the Dense and Sparse file layout types. Does 3025 the current READ_PLUS return value work for both layout types? Does 3026 the data server know if it is using dense or sparse so that it can 3027 return the correct hole_offset and hole_length values? 3029 8. Security Considerations 3031 9. IANA Considerations 3033 This section uses terms that are defined in [21]. 3035 10. References 3037 10.1. Normative References 3039 [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement 3040 Levels", March 1997. 3042 [2] Shepler, S., Eisler, M., and D. Noveck, "Network File System 3043 (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, 3044 January 2010. 3046 [3] Haynes, T., "Network File System (NFS) Version 4 Minor Version 3047 2 External Data Representation Standard (XDR) Description", 3048 March 2011. 3050 [4] Halevy, B., Welch, B., and J. Zelenka, "Object-Based Parallel 3051 NFS (pNFS) Operations", RFC 5664, January 2010. 3053 [5] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 3054 Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, 3055 January 2005. 3057 [6] Williams, N., "Remote Procedure Call (RPC) Security Version 3", 3058 draft-williams-rpcsecgssv3 (work in progress), 2008. 3060 [7] Shepler, S., Eisler, M., and D. Noveck, "Network File System 3061 (NFS) Version 4 Minor Version 1 External Data Representation 3062 Standard (XDR) Description", RFC 5662, January 2010. 3064 [8] Black, D., Glasgow, J., and S. Fridella, "Parallel NFS (pNFS) 3065 Block/Volume Layout", RFC 5663, January 2010. 3067 [9] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol 3068 Specification", RFC 2203, September 1997. 3070 10.2. Informative References 3072 [10] Haynes, T. and D. Noveck, "Network File System (NFS) version 4 3073 Protocol", draft-ietf-nfsv4-rfc3530bis-09 (Work In Progress), 3074 March 2011. 3076 [11] Eisler, M., "XDR: External Data Representation Standard", 3077 RFC 4506, May 2006. 3079 [12] Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik, 3080 "NSDB Protocol for Federated Filesystems", 3081 draft-ietf-nfsv4-federated-fs-protocol (Work In Progress), 3082 2010. 3084 [13] Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik, 3085 "Administration Protocol for Federated Filesystems", 3086 draft-ietf-nfsv4-federated-fs-admin (Work In Progress), 2010. 3088 [14] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., 3089 Leach, P., and T.
Berners-Lee, "Hypertext Transfer Protocol -- 3090 HTTP/1.1", RFC 2616, June 1999. 3092 [15] Postel, J. and J. Reynolds, "File Transfer Protocol", STD 9, 3093 RFC 959, October 1985. 3095 [16] Simpson, W., "PPP Challenge Handshake Authentication Protocol 3096 (CHAP)", RFC 1994, August 1996. 3098 [17] Strohm, R., "Chapter 2, Data Blocks, Extents, and Segments, of 3099 Oracle Database Concepts 11g Release 1 (11.1)", January 2011. 3101 [18] Ashdown, L., "Chapter 15, Validating Database Files and 3102 Backups, of Oracle Database Backup and Recovery User's Guide 3103 11g Release 1 (11.1)", August 2008. 3105 [19] McDougall, R. and J. Mauro, "Section 11.4.3, Detecting Memory 3106 Corruption of Solaris Internals", 2007. 3108 [20] Bairavasundaram, L., Goodson, G., Schroeder, B., Arpaci- 3109 Dusseau, A., and R. Arpaci-Dusseau, "An Analysis of Data 3110 Corruption in the Storage Stack", Proceedings of the 6th USENIX 3111 Symposium on File and Storage Technologies (FAST '08), 2008. 3113 [21] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA 3114 Considerations Section in RFCs", BCP 26, RFC 5226, May 2008. 3116 [22] Nowicki, B., "NFS: Network File System Protocol specification", 3117 RFC 1094, March 1989. 3119 [23] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 3120 Protocol Specification", RFC 1813, June 1995. 3122 [24] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", 3123 RFC 1833, August 1995. 3125 [25] Eisler, M., "NFS Version 2 and Version 3 Security Issues and 3126 the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5", 3127 RFC 2623, June 1999. 3129 [26] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997. 3131 [27] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, 3132 June 1999. 3134 [28] Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by an On- 3135 line Database", RFC 3232, January 2002. 3137 [29] Linn, J., "The Kerberos Version 5 GSS-API Mechanism", RFC 1964, 3138 June 1996. 3140 [30] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, 3141 C., Eisler, M., and D. Noveck, "Network File System (NFS) 3142 version 4 Protocol", RFC 3530, April 2003. 3144 Appendix A. Acknowledgments 3146 For the pNFS Access Permissions Check, the original draft was by 3147 Sorin Faibish, David Black, Mike Eisler, and Jason Glasgow. The work 3148 was influenced by discussions with Benny Halevy and Bruce Fields. A 3149 review was done by Tom Haynes. 3151 For the Sharing change attribute implementation details with NFSv4 3152 clients, the original draft was by Trond Myklebust. 3154 For the NFS Server-side Copy, the original draft was by James 3155 Lentini, Mike Eisler, Deepak Kenchammana, Anshul Madan, and Rahul 3156 Iyer. Talpey co-authored an unpublished version of that document. 3157 It was also reviewed by a number of individuals: Pranoop Erasani, 3158 Tom Haynes, Arthur Lent, Trond Myklebust, Dave Noveck, Theresa 3159 Lingutla-Raj, Manjunath Shankararao, Satyam Vaghani, and Nico 3160 Williams. 3162 For the NFS space reservation operations, the original draft was by 3163 Mike Eisler, James Lentini, Manjunath Shankararao, and Rahul Iyer. 3165 For the sparse file support, the original draft was by Dean 3166 Hildebrand and Marc Eshel. Valuable input and advice were received 3167 from Sorin Faibish, Bruce Fields, Benny Halevy, Trond Myklebust, and 3168 Richard Scheffenegger. 3170 Appendix B.
RFC Editor Notes 3172 [RFC Editor: please remove this section prior to publishing this 3173 document as an RFC] 3175 [RFC Editor: prior to publishing this document as an RFC, please 3176 replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the 3177 RFC number of this document] 3179 Author's Address 3181 Thomas Haynes 3182 NetApp 3183 9110 E 66th St 3184 Tulsa, OK 74133 3185 USA 3187 Phone: +1 918 307 1415 3188 Email: thomas@netapp.com 3189 URI: http://www.tulsalabs.com