idnits 2.17.1

draft-haynes-nfsv4-minorversion2-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009.  (See
     https://trustee.ietf.org/license-info/)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 5 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they
     should be changed.

  == There are 5 instances of lines with private range IPv4 addresses in
     the document.  If these are generic example addresses, they should
     be changed to use any of the ranges defined in RFC 6890 (or
     successor): 192.0.2.x, 198.51.100.x or 203.0.113.x.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year.

  == The document seems to contain a disclaimer for pre-RFC5378 work, but
     was first submitted on or after 10 November 2008.  The disclaimer is
     usually necessary only for documents that revise or obsolete older
     RFCs, and that take significant amounts of text from those RFCs.  If
     you can contact all authors of the source material and they are
     willing to grant the BCP78 rights to the IETF Trust, you can and
     should remove the disclaimer.  Otherwise, the disclaimer is needed
     and you can ignore this comment.  (See the Legal Provisions document
     at https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 07, 2011) is 4771 days in the past.  Is
     this intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>'
     and '<CODE ENDS>' lines.
  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  == Missing Reference: '0' is mentioned on line 1933, but not defined

  == Unused Reference: '7' is defined on line 2763, but no explicit
     reference was found in the text

  == Unused Reference: '8' is defined on line 2767, but no explicit
     reference was found in the text

  == Unused Reference: '9' is defined on line 2771, but no explicit
     reference was found in the text

  == Unused Reference: '18' is defined on line 2805, but no explicit
     reference was found in the text

  == Unused Reference: '19' is defined on line 2808, but no explicit
     reference was found in the text

  == Unused Reference: '20' is defined on line 2811, but no explicit
     reference was found in the text

  == Unused Reference: '21' is defined on line 2814, but no explicit
     reference was found in the text

  == Unused Reference: '22' is defined on line 2818, but no explicit
     reference was found in the text

  == Unused Reference: '23' is defined on line 2820, but no explicit
     reference was found in the text

  == Unused Reference: '24' is defined on line 2823, but no explicit
     reference was found in the text

  == Unused Reference: '25' is defined on line 2826, but no explicit
     reference was found in the text

  == Unused Reference: '26' is defined on line 2829, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 5661 (ref. '2') (Obsoleted by
     RFC 8881)

  -- Possible downref: Non-RFC (?) normative reference: ref. '8'

  == Outdated reference: A later version (-35) exists of
     draft-ietf-nfsv4-rfc3530bis-09

  -- Obsolete informational reference (is this intentional?): RFC 2616
     (ref. '14') (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233,
     RFC 7234, RFC 7235)

  -- Obsolete informational reference (is this intentional?): RFC 5226
     (ref. '17') (Obsoleted by RFC 8126)

  -- Obsolete informational reference (is this intentional?): RFC 3530
     (ref. '26') (Obsoleted by RFC 7530)

     Summary: 2 errors (**), 0 flaws (~~), 18 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

--------------------------------------------------------------------------------

NFSv4                                                          T. Haynes
Internet-Draft                                                    Editor
Intended status: Standards Track                          March 07, 2011
Expires: September 8, 2011


                     NFS Version 4 Minor Version 2
                draft-haynes-nfsv4-minorversion2-00.txt

Abstract

   This Internet-Draft describes NFS version 4 minor version 2,
   focusing mainly on the protocol extensions made from NFS version 4
   minor version 0 and NFS version 4 minor version 1.  Major extensions
   introduced in NFS version 4 minor version 2 include: Server-side
   Copy, Space Reservations, and Support for Sparse Files.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.
   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 8, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
     1.1.  The NFS Version 4 Minor Version 2 Protocol
     1.2.  Scope of This Document
     1.3.  NFSv4.2 Goals
     1.4.  Overview of NFSv4.2 Features
     1.5.  Differences from NFSv4.1
   2.  pNFS Access Permissions Check
     2.1.  Introduction
     2.2.  Changes to Operation 51: LAYOUTRETURN (RFC 5661)
       2.2.1.  ARGUMENT (18.44.1)
       2.2.2.  RESULT (18.44.2)
       2.2.3.  DESCRIPTION (18.44.3)
       2.2.4.  IMPLEMENTATION (18.44.4)
     2.3.  Change to NFS4ERR_NXIO Usage
     2.4.  Security Considerations
     2.5.  IANA Considerations
   3.  Sharing change attribute implementation details with NFSv4
       clients
     3.1.  Abstract
     3.2.  Introduction
     3.3.  Definition of the 'change_attr_type' per-file system
           attribute
   4.  NFS Server-side Copy
     4.1.  Introduction
     4.2.  Protocol Overview
       4.2.1.  Intra-Server Copy
       4.2.2.  Inter-Server Copy
       4.2.3.  Server-to-Server Copy Protocol
     4.3.  Operations
       4.3.1.  netloc4 - Network Locations
       4.3.2.  Operation 61: COPY_NOTIFY - Notify a source server of
               a future copy
       4.3.3.  Operation 62: COPY_REVOKE - Revoke a destination
               server's copy privileges
       4.3.4.  Operation 59: COPY - Initiate a server-side copy
       4.3.5.  Operation 60: COPY_ABORT - Cancel a server-side copy
       4.3.6.  Operation 63: COPY_STATUS - Poll for status of a
               server-side copy
       4.3.7.  Operation 15: CB_COPY - Report results of a
               server-side copy
       4.3.8.  Copy Offload Stateids
     4.4.  Security Considerations
       4.4.1.  Inter-Server Copy Security
     4.5.  IANA Considerations
   5.  Space Reservation
     5.1.  Introduction
     5.2.  Use Cases
       5.2.1.  Space Reservation
       5.2.2.  Space freed on deletes
       5.2.3.  Operations and attributes
       5.2.4.  Attribute 77: space_reserve
       5.2.5.  Attribute 78: space_freed
       5.2.6.  Attribute 79: max_hole_punch
       5.2.7.  Operation 64: HOLE_PUNCH - Zero and deallocate blocks
               backing the file in the specified range.
     5.3.  Security Considerations
     5.4.  IANA Considerations
   6.  Simple and Efficient Read Support for Sparse Files
     6.1.  Introduction
     6.2.  Terminology
     6.3.  Applications and Sparse Files
     6.4.  Overview of Sparse Files and NFSv4
     6.5.  Operation 65: READPLUS
       6.5.1.  ARGUMENT
       6.5.2.  RESULT
       6.5.3.  DESCRIPTION
       6.5.4.  IMPLEMENTATION
       6.5.5.  READPLUS with Sparse Files Example
     6.6.  Related Work
     6.7.  Security Considerations
     6.8.  IANA Considerations
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Acknowledgments
   Appendix B.  RFC Editor Notes
   Author's Address

1.  Introduction

1.1.  The NFS Version 4 Minor Version 2 Protocol

   The NFS version 4 minor version 2 (NFSv4.2) protocol is the third
   minor version of the NFS version 4 (NFSv4) protocol.  The first
   minor version, NFSv4.0, is described in [10] and the second minor
   version, NFSv4.1, is described in [2].  NFSv4.2 follows the
   guidelines for minor versioning that are listed in Section 11 of
   RFC 3530bis.

   As a minor version, NFSv4.2 is consistent with the overall goals for
   NFSv4, but extends the protocol so as to better meet those goals,
   based on experiences with NFSv4.1.  In addition, NFSv4.2 has adopted
   some additional goals, which motivate some of the major extensions
   in NFSv4.2.

1.2.  Scope of This Document

   This document describes the NFSv4.2 protocol.  With respect to
   NFSv4.0 and NFSv4.1, this document does not:

   o  describe the NFSv4.0 or NFSv4.1 protocols, except where needed to
      contrast with NFSv4.2.

   o  modify the specification of the NFSv4.0 or NFSv4.1 protocols.

   o  clarify the NFSv4.0 or NFSv4.1 protocols.

1.3.  NFSv4.2 Goals

1.4.  Overview of NFSv4.2 Features

1.5.  Differences from NFSv4.1

2.  pNFS Access Permissions Check

2.1.  Introduction

   Figure 1 shows the overall architecture of a Parallel NFS (pNFS)
   system:

       +-----------+
       |+-----------+                            +-----------+
       ||+-----------+                           |           |
       |||           |      NFSv4.1 + pNFS       |           |
       +||  Clients  |<------------------------->|    MDS    |
        +|           |                           |           |
         +-----------+                           |           |
              |||                                +-----------+
              |||                                      |
              |||                                      |
              |||  Storage       +-----------+         |
              |||  Protocol      |+-----------+        |
              ||+----------------||+-----------+ Control|
              |+-----------------|||           |Protocol|
              +------------------+||  Storage  |--------+
                                  +|  Devices  |
                                   +-----------+

                      Figure 1: pNFS Architecture

   In this document, "storage device" is used as a general term for a
   data server and/or storage server for the file, block, or object
   pNFS layouts.

   The current pNFS protocol [2] assumes that a client can access every
   storage device (SD) included in a valid layout sent by the metadata
   server (MDS), and provides no means to communicate client access
   failures to the MDS.  Access failures can impair pNFS performance
   scaling and allow significant errors to go unreported.  If the MDS
   can access all the storage devices involved, but the client does
   not have sufficient access rights to some of them, the client may
   choose to fall back to accessing the file system using NFSv4.1
   without pNFS support; there are environments in which this behavior
   is undesirable, especially if it occurs silently.  An important
   example is the addition of a new storage device to which a large
   population of pNFS clients (e.g., thousands) lacks access
   permission.  Layouts granted that use this new device result in
   client errors, requiring that all I/Os to that new storage device
   be served by the MDS.  This creates a performance and scalability
   bottleneck that may be difficult to detect based on I/O behavior,
   because the other storage devices are functioning correctly.
   The preferable approach to this scenario is to report the access
   failures before any client attempts to issue any I/Os that can only
   be serviced by the MDS.  This makes the problem explicit, rather
   than forcing the MDS, or a system administrator, to diagnose the
   performance problem caused by client I/O using NFS instead of pNFS.
   There are limits to this approach because complex mount structures
   may prevent a client from detecting this situation at mount time,
   but at a minimum, access problems involving the root of the mount
   structure can be detected.

   The most suitable time for the client to report inability to access
   a storage device is at mount time, but this is not always possible.
   If the application declares its intention to use pNFS via a special
   tag or a switch to the mount command (e.g., -pnfs) and the
   corresponding syscall, the client can check for both pNFS support
   and device accessibility at that point.

   This document introduces an error reporting mechanism that is an
   extension to the return of a pNFS layout; a pNFS client MAY use
   this mechanism to inform the MDS that the layout is being returned
   because one or more data servers are not accessible to the client.
   Error reporting at I/O time is not affected because the result of
   an inaccessible data server may not be an I/O error if a subsequent
   retry of the operation via the MDS is successful.

   There is a related problem scenario involving an MDS that cannot
   access some storage devices and hence cannot perform I/Os on behalf
   of a client.  In the case of the block layout [3], if the MDS lacks
   access to a storage device (e.g., a LUN), MDS implementations
   generally do not export any filesystem using that storage device.
   In contrast to the block layout, MDSs for the file [2] and object
   [4] layouts may be unable to access the storage devices that store
   data for an exported filesystem.  This makes it possible for a file
   or object layout MDS to provide layouts that contain devices
   inaccessible to the client.  For the specific case of adding a new
   storage device to a filesystem, MDS issuance of test I/Os to the
   newly added device before using it in layouts avoids this problem
   scenario, but does not cover loss of access to existing storage
   devices at a later time.

   In addition, [2] states that a client can write through or read
   from the MDS, even if it has a layout; this assumes that the MDS
   can access all the storage devices.  This document makes that
   assumed access an explicit requirement.

2.2.  Changes to Operation 51: LAYOUTRETURN (RFC 5661)

   The existing LAYOUTRETURN operation is extended by introducing
   three new layout return types that correspond to the existing
   types:

   o  LAYOUT4_RET_REC_FILE_NO_ACCESS at file scope;

   o  LAYOUT4_RET_REC_FSID_NO_ACCESS at fsid scope; and

   o  LAYOUT4_RET_REC_ALL_NO_ACCESS at client scope.

   The first return type returns the layout for an individual file and
   informs the server that the reason for the return is a storage
   device connectivity problem.  The second return type performs that
   function for all layouts held by the client for the filesystem that
   corresponds to the current filehandle used for the LAYOUTRETURN
   operation.
   The third return type performs that function for all layouts held
   by the client; it is intended for situations in which a device is
   shared across all or most of the filesystems from a server for
   which the client has layouts.

2.2.1.  ARGUMENT (18.44.1)

   The ARGUMENT specification of the LAYOUTRETURN operation in section
   18.44.1 of [2] is replaced by the following XDR code [11]:

   /* Constants used for new LAYOUTRETURN and CB_LAYOUTRECALL */
   const LAYOUT4_RET_REC_FILE           = 1;
   const LAYOUT4_RET_REC_FSID           = 2;
   const LAYOUT4_RET_REC_ALL            = 3;
   const LAYOUT4_RET_REC_FILE_NO_ACCESS = 4;
   const LAYOUT4_RET_REC_FSID_NO_ACCESS = 5;
   const LAYOUT4_RET_REC_ALL_NO_ACCESS  = 6;

   enum layoutreturn_type4 {
           LAYOUTRETURN4_FILE           = LAYOUT4_RET_REC_FILE,
           LAYOUTRETURN4_FSID           = LAYOUT4_RET_REC_FSID,
           LAYOUTRETURN4_ALL            = LAYOUT4_RET_REC_ALL,
           LAYOUTRETURN4_FILE_NO_ACCESS = LAYOUT4_RET_REC_FILE_NO_ACCESS,
           LAYOUTRETURN4_FSID_NO_ACCESS = LAYOUT4_RET_REC_FSID_NO_ACCESS,
           LAYOUTRETURN4_ALL_NO_ACCESS  = LAYOUT4_RET_REC_ALL_NO_ACCESS
   };

   struct layoutreturn_file4 {
           offset4  lrf_offset;
           length4  lrf_length;
           stateid4 lrf_stateid;
           /* layouttype4 specific data */
           opaque   lrf_body<>;
   };

   struct layoutreturn_device_no_access4 {
           deviceid4 lrdna_deviceid;
           nfsstat4  lrdna_status;
   };

   struct layoutreturn_file_no_access4 {
           offset4   lrfna_offset;
           length4   lrfna_length;
           stateid4  lrfna_stateid;
           deviceid4 lrfna_deviceid;
           nfsstat4  lrfna_status;
           /* layouttype4 specific data */
           opaque    lrfna_body<>;
   };

   union layoutreturn4 switch (layoutreturn_type4 lr_returntype) {
   case LAYOUTRETURN4_FILE:
           layoutreturn_file4 lr_layout;
   case LAYOUTRETURN4_FILE_NO_ACCESS:
           layoutreturn_file_no_access4 lr_layout_na;
   case LAYOUTRETURN4_FSID_NO_ACCESS:
   case LAYOUTRETURN4_ALL_NO_ACCESS:
           layoutreturn_device_no_access4 lr_device<>;
   default:
           void;
   };

2.2.2.  RESULT (18.44.2)

   The RESULT of the LAYOUTRETURN operation is unchanged; see section
   18.44.2 of [2].

2.2.3.  DESCRIPTION (18.44.3)

   The following text is added to the end of the LAYOUTRETURN operation
   DESCRIPTION in section 18.44.3 of [2].

   There are three NO_ACCESS layoutreturn_type4 values that indicate a
   persistent lack of client ability to access storage device(s):
   LAYOUT4_RET_REC_FILE_NO_ACCESS, LAYOUT4_RET_REC_FSID_NO_ACCESS, and
   LAYOUT4_RET_REC_ALL_NO_ACCESS.  A client uses these return types to
   return a layout (or portion thereof) for a file, all layouts for an
   FSID, or all layouts from that server held by the client, and in
   all cases to inform the server that the reason for the return is
   the client's inability to access one or more storage devices.  The
   same stateid may be used, or the client MAY force use of a new
   stateid in order to report a new error.

   An NFS error value (nfsstat4) is included for each device for these
   three NO_ACCESS return types to provide additional information on
   the cause.  The allowed NFS errors are those that are valid for an
   NFS READ or WRITE operation, and NFS4ERR_NXIO is also allowed to
   report an inaccessible device.  The server SHOULD log the received
   NFS error value, but that error value does not affect server
   processing of the LAYOUTRETURN operation.  All uses of the
   NO_ACCESS layout return types that report NFS errors SHOULD be
   logged by the client.
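   As a non-normative illustration, a client built on the rpcgen output
   for the XDR above might populate the ARGUMENT body for a file-scope
   NO_ACCESS return as sketched below in C.  The stand-in typedefs and
   the helper function are hypothetical; only the lrfna_* field names
   come from this document:

   #include <stdint.h>
   #include <string.h>

   /* Minimal stand-in typedefs; a real client would use the types
    * generated from the full protocol XDR. */
   typedef uint64_t offset4;
   typedef uint64_t length4;
   typedef uint32_t nfsstat4;
   typedef struct { uint8_t data[16]; } deviceid4;
   typedef struct { uint32_t seqid; uint8_t other[12]; } stateid4;

   #define NFS4ERR_NXIO 6          /* per the NFSv4 error table */

   struct layoutreturn_file_no_access4 {
           offset4   lrfna_offset;
           length4   lrfna_length;
           stateid4  lrfna_stateid;
           deviceid4 lrfna_deviceid;
           nfsstat4  lrfna_status;
           /* lrfna_body omitted here; it carries layouttype4-specific
            * data and may be empty. */
   };

   /* Fill in a LAYOUTRETURN4_FILE_NO_ACCESS body covering the extent
    * served by the device the client could not reach. */
   static void
   fill_file_no_access(struct layoutreturn_file_no_access4 *arg,
                       const stateid4 *layout_stateid,
                       const deviceid4 *bad_device,
                       offset4 offset, length4 length)
   {
           memset(arg, 0, sizeof(*arg));
           arg->lrfna_offset   = offset;
           arg->lrfna_length   = length;
           arg->lrfna_stateid  = *layout_stateid;
           arg->lrfna_deviceid = *bad_device;
           arg->lrfna_status   = NFS4ERR_NXIO;  /* device unreachable */
   }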
   The client MAY use the new LAYOUT4_RET_REC_FILE_NO_ACCESS return
   type when only one file, or a small number of files, is affected.
   If the access problem affects multiple devices, the client may use
   multiple file layout return operations; each return operation
   SHOULD return a layout extent obtained from the device for which an
   error is being reported.  In contrast, both
   LAYOUT4_RET_REC_FSID_NO_ACCESS and LAYOUT4_RET_REC_ALL_NO_ACCESS
   include an array of (deviceid4, nfsstat4) pairs, enabling errors
   for multiple devices to be reported in a single operation.

2.2.4.  IMPLEMENTATION (18.44.4)

   The following text is added to the end of the LAYOUTRETURN operation
   IMPLEMENTATION in section 18.44.4 of [2].

   A client that expects to use pNFS for a mounted filesystem SHOULD
   check for pNFS support at mount time.  This check SHOULD be
   performed by sending a GETDEVICELIST operation, followed by layout-
   type-specific checks for accessibility of each storage device
   returned by GETDEVICELIST.  If the NFS server does not support
   pNFS, the GETDEVICELIST operation will be rejected with an
   NFS4ERR_NOTSUPP error; in this situation it is up to the client to
   determine whether it is acceptable to proceed with NFS-only access.

   Clients are expected to tolerate transient storage device errors,
   and hence clients SHOULD NOT use the NO_ACCESS layout return types
   for device access problems that may be transient.  The methods by
   which a client decides whether an access problem is transient vs.
   persistent are implementation-specific, but may include retrying
   I/Os to a data server under appropriate conditions.

   When an I/O fails because a storage device is inaccessible, the
   client SHOULD retry the failed I/O via the MDS.  In this situation,
   before retrying the I/O, the client SHOULD return the layout, or
   the inaccessible portion thereof, and SHOULD indicate which storage
   device or devices were inaccessible.  If the client does not do
   this, the MDS may issue a layout recall callback in order to
   perform the retried I/O.

   Backwards compatibility may require a client to perform two layout
   return operations to deal with servers that do not implement the
   NO_ACCESS layoutreturn_type4 values and hence respond to them with
   NFS4ERR_INVAL.  In this situation, the client SHOULD perform an
   ordinary layout return operation and remember that the new
   NO_ACCESS layout return types are not to be used with that server.

   The metadata server (MDS) SHOULD NOT use storage devices in pNFS
   layouts that are not accessible to the MDS.  At a minimum, the
   server SHOULD check its own storage device accessibility before
   exporting a filesystem that supports pNFS and when the device
   configuration for such an exported filesystem is changed (e.g., to
   add a storage device).

   If an MDS is aware that a storage device is inaccessible to a
   client, the MDS SHOULD NOT include that storage device in any pNFS
   layouts sent to that client.  An MDS SHOULD react to a client
   return of inaccessible layouts by not using the inaccessible
   storage devices in layouts for that client, but the MDS is not
   required to indefinitely retain per-client storage device
   inaccessibility information.  An MDS is also not required to
   automatically reinstate use of a previously inaccessible storage
   device; administrative intervention may be required instead.
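   The per-client filtering described above might look like the
   following C sketch on the MDS side.  The function and the way the
   per-client record is represented are hypothetical, since how an MDS
   tracks inaccessibility reports is implementation-specific:

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   typedef struct { uint8_t data[16]; } deviceid4;

   /* Keep only candidate devices that this client has not reported as
    * inaccessible; "bad" is the MDS's per-client record built from
    * NO_ACCESS layout returns. */
   static size_t
   filter_layout_devices(const deviceid4 *candidates, size_t n,
                         const deviceid4 *bad, size_t nbad,
                         deviceid4 *usable)
   {
           size_t out = 0;

           for (size_t i = 0; i < n; i++) {
                   bool inaccessible = false;

                   for (size_t j = 0; j < nbad && !inaccessible; j++)
                           if (memcmp(&candidates[i], &bad[j],
                                      sizeof(deviceid4)) == 0)
                                   inaccessible = true;
                   if (!inaccessible)
                           usable[out++] = candidates[i];
           }
           return out;  /* number of devices usable in the layout */
   }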
   A client MAY perform I/O via the MDS even when the client holds a
   layout that covers the I/O; servers MUST support this client
   behavior, and MAY recall layouts as needed to complete I/Os.

2.2.4.1.  Storage Device Error Mapping (18.44.4.1, new)

   The following text is added as new subsection 18.44.4.1 of [2].

   An NFS error value is sent for each device that the client reports
   as inaccessible via a NO_ACCESS layout return type.  In general:

   o  If the client is unable to access the storage device,
      NFS4ERR_NXIO SHOULD be used.

   o  If the client is able to access the storage device, but
      permission is denied, NFS4ERR_ACCESS SHOULD be used.

   Beyond these two rules, error code usage is layout-type specific:

   o  For the pNFS file layout, an indicative NFS error from a failed
      read or write operation on the inaccessible device SHOULD be
      used.

   o  For the pNFS block layout, other errors from the Storage
      Protocol SHOULD be mapped to NFS4ERR_IO.  In addition, the
      client SHOULD log information about the actual storage protocol
      error (e.g., SCSI status and sense data), but that information
      is not sent to the pNFS server.

   o  For the pNFS object layout, occurrences of the object error
      types specified in [4] SHOULD be mapped to the following NFS
      errors for use in LAYOUTRETURN:

      *  PNFS_OSD_ERR_EIO -> NFS4ERR_IO

      *  PNFS_OSD_ERR_NOT_FOUND -> NFS4ERR_STALE

      *  PNFS_OSD_ERR_NO_SPACE -> NFS4ERR_NOSPC

      *  PNFS_OSD_ERR_BAD_CRED -> NFS4ERR_INVAL

      *  PNFS_OSD_ERR_NO_ACCESS -> NFS4ERR_ACCESS

      *  PNFS_OSD_ERR_UNREACHABLE -> NFS4ERR_NXIO

      *  PNFS_OSD_ERR_RESOURCE -> NFS4ERR_SERVERFAULT

   The LAYOUTRETURN NO_ACCESS return types are used for persistent
   device errors; they do not replace other error reporting mechanisms
   that also apply to transient errors (e.g., as specified for the
   object layout in [4]).

2.3.  Change to NFS4ERR_NXIO Usage

   This document specifies that the NFS4ERR_NXIO error SHOULD be used
   to report an inaccessible storage device.  To enable that usage,
   this document updates [2] to allow use of the currently obsolete
   NFS4ERR_NXIO error in the ARGUMENT of LAYOUTRETURN; NFS4ERR_NXIO
   remains obsolete for all other uses of NFS errors.

2.4.  Security Considerations

   This section adds a small extension to the NFSv4 LAYOUTRETURN
   operation.  The NFS and pNFS security considerations in [2], [3],
   and [4] apply to the extended LAYOUTRETURN operation.

2.5.  IANA Considerations

   There are no additional IANA considerations in this section beyond
   the IANA considerations covered in [2].

3.  Sharing change attribute implementation details with NFSv4 clients

3.1.  Abstract

   This section describes an extension to the NFSv4 protocol that
   allows the server to share information about the implementation of
   its change attribute with the client.  The aim is to improve the
   client's ability to determine the order in which parallel updates
   to the same file were processed.

3.2.  Introduction

   Although both the NFSv4 [10] and NFSv4.1 [2] protocols define the
   change attribute as mandatory to implement, they offer little in
   the way of guidance.  The only feature that is mandated by the
   specifications is that the value must change whenever the file data
   or metadata change.
   While this allows for a wide range of implementations, it also
   leaves the client with a conundrum: how does it determine which is
   the most recent value for the change attribute in a case where
   several RPC calls have been issued in parallel?  In other words, if
   two COMPOUNDs, both containing WRITE and GETATTR requests for the
   same file, have been issued in parallel, how does the client
   determine which of the two change attribute values returned in the
   replies to the GETATTR requests corresponds to the most recent
   state of the file?  In some cases, the only recourse may be to send
   another COMPOUND containing a third GETATTR that is fully
   serialised with the first two.

   In order to avoid this kind of inefficiency, we propose a method to
   allow the server to share details about how the change attribute is
   expected to evolve, so that the client may immediately determine
   which, out of the several change attribute values returned by the
   server, is the most recent.

3.3.  Definition of the 'change_attr_type' per-file system attribute

   enum change_attr_typeinfo {
           NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR         = 0,
           NFS4_CHANGE_TYPE_IS_VERSION_COUNTER        = 1,
           NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2,
           NFS4_CHANGE_TYPE_IS_TIME_METADATA          = 3,
           NFS4_CHANGE_TYPE_IS_UNDEFINED              = 4
   };

   +------------------+----+---------------------------+-----+
   | Name             | Id | Data Type                 | Acc |
   +------------------+----+---------------------------+-----+
   | change_attr_type | XX | enum change_attr_typeinfo | R   |
   +------------------+----+---------------------------+-----+

   The proposed solution is to enable the NFS server to provide
   additional information about how it expects the change attribute
   value to evolve after the file data or metadata has changed.  To do
   so, we define a new recommended attribute, 'change_attr_type',
   which may take values from enum change_attr_typeinfo as follows:

   NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR:  The change attribute value
      MUST monotonically increase for every atomic change to the file
      attributes, data, or directory contents.

   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER:  The change attribute value
      MUST be incremented by one unit for every atomic change to the
      file attributes, data, or directory contents.  This property is
      preserved when writing to pNFS data servers.

   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS:  The change attribute
      value MUST be incremented by one unit for every atomic change to
      the file attributes, data, or directory contents.  In the case
      where the client is writing to pNFS data servers, the number of
      increments is not guaranteed to exactly match the number of
      writes.

   NFS4_CHANGE_TYPE_IS_TIME_METADATA:  The change attribute is
      implemented as suggested in the NFSv4 spec [10] in terms of the
      time_metadata attribute.

   NFS4_CHANGE_TYPE_IS_UNDEFINED:  The change attribute does not take
      values that fit into any of these categories.

   If NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR,
   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, or
   NFS4_CHANGE_TYPE_IS_TIME_METADATA is set, then the client knows at
   the very least that the change attribute is monotonically
   increasing, which is sufficient to resolve the question of which
   value is the most recent.
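   For instance, with any of these three monotonic change_attr_type
   values, a client comparing the replies from parallel GETATTRs can
   resolve recency with a simple comparison.  The C sketch below
   assumes only the change attribute's uint64 representation:

   #include <stdint.h>

   typedef uint64_t changeid4;   /* NFSv4 change attribute value */

   /* Under a monotonic change_attr_type, the larger of two change
    * attribute values returned by parallel GETATTRs reflects the
    * more recent state of the file. */
   static changeid4
   most_recent_change(changeid4 a, changeid4 b)
   {
           return (a > b) ? a : b;
   }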
   If the client sees the value NFS4_CHANGE_TYPE_IS_TIME_METADATA,
   then by inspecting the value of the 'time_delta' attribute it
   additionally has the option of detecting rogue server
   implementations that use time_metadata in violation of the spec.

   Finally, if the client sees NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, it
   has the ability to predict what the resulting change attribute
   value should be after a COMPOUND containing a SETATTR, WRITE, or
   CREATE.  This again allows it to detect changes made in parallel by
   another client.  The value
   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS permits the same, but
   only if the client is not doing pNFS WRITEs.

4.  NFS Server-side Copy

4.1.  Introduction

   This section describes a server-side copy feature for the NFS
   protocol.

   The server-side copy feature provides a mechanism for the NFS
   client to perform a file copy on the server without the data being
   transmitted back and forth over the network.

   Without this feature, an NFS client copies data from one location
   to another by reading the data from the server over the network,
   and then writing the data back over the network to the server.
   Using this server-side copy operation, the client is able to
   instruct the server to copy the data locally without the data being
   sent back and forth over the network unnecessarily.

   In general, this feature is useful whenever data is copied from one
   location to another on the server.  It is particularly useful when
   copying the contents of a file from a backup.  Backup versions of a
   file are copied for a number of reasons, including restoring and
   cloning data.

   If the source object and destination object are on different file
   servers, the file servers will communicate with one another to
   perform the copy operation.  The server-to-server protocol by which
   this is accomplished is not defined in this document.

4.2.  Protocol Overview

   The server-side copy offload operations support both intra-server
   and inter-server file copies.  An intra-server copy is a copy in
   which the source file and destination file reside on the same
   server.  In an inter-server copy, the source file and destination
   file are on different servers.  In both cases, the copy may be
   performed synchronously or asynchronously.

   Throughout the rest of this document, we refer to the NFS server
   containing the source file as the "source server" and the NFS
   server to which the file is transferred as the "destination
   server".  In the case of an intra-server copy, the source server
   and destination server are the same server.  Therefore, in the
   context of an intra-server copy, the terms source server and
   destination server refer to the single server performing the copy.

   The operations described below are designed to copy files.  Other
   file system objects can be copied by building on these operations
   or using other techniques.  For example, if the user wishes to copy
   a directory, the client can synthesize a directory copy by first
   creating the destination directory and then copying the source
   directory's files to the new destination directory.  If the user
   wishes to copy a namespace junction [12] [13], the client can use
   the ONC RPC Federated Filesystem protocol [13] to perform the copy.
   Specifically, the client can determine the source junction's
   attributes using the FEDFS_LOOKUP_FSN procedure and create a
   duplicate junction using the FEDFS_CREATE_JUNCTION procedure.

   For the inter-server copy protocol, the operations are defined to
   be compatible with a server-to-server copy protocol in which the
   destination server reads the file data from the source server.
   This model, in which the file data is pulled from the source by the
   destination, has a number of advantages over a model in which the
   source pushes the file data to the destination.  The advantages of
   the pull model include:

   o  The pull model only requires a remote server (i.e., the
      destination server) to be granted read access.  A push model
      requires a remote server (i.e., the source server) to be granted
      write access, which is more privileged.

   o  The pull model allows the destination server to stop reading if
      it has run out of space.  In a push model, the destination
      server must flow control the source server in this situation.

   o  The pull model allows the destination server to easily flow
      control the data stream by adjusting the size of its read
      operations.  In a push model, the destination server does not
      have this ability.  The source server in a push model is capable
      of writing chunks larger than the destination server has
      requested in attributes and session parameters.  In theory, the
      destination server could perform a "short" write in this
      situation, but this approach is known to behave poorly in
      practice.

   The following operations are provided to support server-side copy:

   COPY_NOTIFY:  For inter-server copies, the client sends this
      operation to the source server to notify it of a future file
      copy from a given destination server for the given user.

   COPY_REVOKE:  Also for inter-server copies, the client sends this
      operation to the source server to revoke permission to copy a
      file for the given user.

   COPY:  Used by the client to request a file copy.

   COPY_ABORT:  Used by the client to abort an asynchronous file copy.

   COPY_STATUS:  Used by the client to poll the status of an
      asynchronous file copy.

   CB_COPY:  Used by the destination server to report the results of
      an asynchronous file copy to the client.

   These operations are described in detail in Section 4.3.  This
   section provides an overview of how these operations are used to
   perform server-side copies.

4.2.1.  Intra-Server Copy

   To copy a file on a single server, the client uses a COPY
   operation.  The server may respond to the copy operation with the
   final results of the copy or it may perform the copy asynchronously
   and deliver the results using a CB_COPY operation callback.  If the
   copy is performed asynchronously, the client may poll the status of
   the copy using COPY_STATUS or cancel the copy using COPY_ABORT.

   A synchronous intra-server copy is shown in Figure 2.  In this
   example, the NFS server chooses to perform the copy synchronously.
   The copy operation is completed, either successfully or
   unsuccessfully, before the server replies to the client's request.
   The server's reply contains the final result of the operation.

     Client                                  Server
        +                                      +
        |                                      |
        |--- COPY ---------------------------->| Client requests
        |<------------------------------------/| a file copy
        |                                      |
        |                                      |

             Figure 2: A synchronous intra-server copy.
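   As a non-normative illustration, the client's request in Figure 2
   could be a COMPOUND of the following form, using the SAVED_FH/
   CURRENT_FH convention described in Section 4.3.4 (the filehandles
   and the destination name here are hypothetical):

      PUTFH  source-fh     /* CURRENT_FH := source file */
      SAVEFH               /* SAVED_FH := source file */
      PUTFH  dst-dir-fh    /* CURRENT_FH := destination directory */
      COPY   ca_src_offset=0, ca_dst_offset=0, ca_count=0,
             ca_destination="dest.dat"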
   An asynchronous intra-server copy is shown in Figure 3.  In this
   example, the NFS server performs the copy asynchronously.  The
   server's reply to the copy request indicates that the copy
   operation was initiated and the final result will be delivered at a
   later time.  The server's reply also contains a copy stateid.  The
   client may use this copy stateid to poll for status information (as
   shown) or to cancel the copy using a COPY_ABORT.  When the server
   completes the copy, the server performs a callback to the client
   and reports the results.

     Client                                  Server
        +                                      +
        |                                      |
        |--- COPY ---------------------------->| Client requests
        |<------------------------------------/| a file copy
        |                                      |
        |                                      |
        |--- COPY_STATUS --------------------->| Client may poll
        |<------------------------------------/| for status
        |                                      |
        |        .                             | Multiple COPY_STATUS
        |        .                             | operations may be sent.
        |        .                             |
        |                                      |
        |<-- CB_COPY --------------------------| Server reports results
        |\------------------------------------>|
        |                                      |

             Figure 3: An asynchronous intra-server copy.

4.2.2.  Inter-Server Copy

   A copy may also be performed between two servers.  The copy
   protocol is designed to accommodate a variety of network
   topologies.  As shown in Figure 4, the client and servers may be
   connected by multiple networks.  In particular, the servers may be
   connected by a specialized, high speed network (network
   192.168.33.0/24 in the diagram) that does not include the client.
   The protocol allows the client to set up the copy between the
   servers (over network 10.11.78.0/24 in the diagram) and for the
   servers to communicate on the high speed network if they choose to
   do so.

                        192.168.33.0/24
        +-------------------------------------+
        |                                     |
        |                                     |
        | 192.168.33.18                       | 192.168.33.56
   +-------+------+                    +------+------+
   |    Source    |                    | Destination |
   +-------+------+                    +------+------+
        | 10.11.78.18                        | 10.11.78.56
        |                                     |
        |                                     |
        |            10.11.78.0/24            |
        +------------------+------------------+
                           |
                           |
                           | 10.11.78.243
                     +-----+-----+
                     |  Client   |
                     +-----------+

          Figure 4: An example inter-server network topology.

   For an inter-server copy, the client notifies the source server
   that a file will be copied by the destination server using a
   COPY_NOTIFY operation.  The client then initiates the copy by
   sending the COPY operation to the destination server.  The
   destination server may perform the copy synchronously or
   asynchronously.

   A synchronous inter-server copy is shown in Figure 5.  In this
   case, the destination server chooses to perform the copy before
   responding to the client's COPY request.

   An asynchronous copy is shown in Figure 6.  In this case, the
   destination server chooses to respond to the client's COPY request
   immediately and then perform the copy asynchronously.

     Client                Source            Destination
        +                    +                    +
        |                    |                    |
        |--- COPY_NOTIFY --->|                    |
        |<------------------/|                    |
        |                    |                    |
        |                    |                    |
        |--- COPY ------------------------------->|
        |                    |                    |
        |                    |                    |
        |                    |<------ read ------|
        |                    |\----------------->|
        |                    |                    |
        |                    |        .           | Multiple reads may
        |                    |        .           | be necessary
        |                    |        .           |
        |                    |                    |
        |                    |                    |
        |<----------------------------------------/| Destination replies
        |                    |                    | to COPY

             Figure 5: A synchronous inter-server copy.
     Client                Source            Destination
        +                    +                    +
        |                    |                    |
        |--- COPY_NOTIFY --->|                    |
        |<------------------/|                    |
        |                    |                    |
        |                    |                    |
        |--- COPY ------------------------------->|
        |<----------------------------------------/|
        |                    |                    |
        |                    |                    |
        |                    |<------ read ------|
        |                    |\----------------->|
        |                    |                    |
        |                    |        .           | Multiple reads may
        |                    |        .           | be necessary
        |                    |        .           |
        |                    |                    |
        |                    |                    |
        |--- COPY_STATUS ------------------------>| Client may poll
        |<----------------------------------------/| for status
        |                    |                    |
        |                    |        .           | Multiple COPY_STATUS
        |                    |        .           | operations may be sent
        |                    |        .           |
        |                    |                    |
        |                    |                    |
        |                    |                    |
        |<-- CB_COPY -----------------------------| Destination reports
        |\---------------------------------------->| results
        |                    |                    |

             Figure 6: An asynchronous inter-server copy.

4.2.3.  Server-to-Server Copy Protocol

   During an inter-server copy, the destination server reads the file
   data from the source server.  The source server and destination
   server are not required to use a specific protocol to transfer the
   file data.  The choice of what protocol to use is ultimately the
   destination server's decision.

4.2.3.1.  Using NFSv4.x as a Server-to-Server Copy Protocol

   The destination server MAY use standard NFSv4.x (where x >= 1) to
   read the data from the source server.  If NFSv4.x is used for the
   server-to-server copy protocol, the destination server can use the
   filehandle contained in the COPY request with standard NFSv4.x
   operations to read data from the source server.  Specifically, the
   destination server may use the NFSv4.x OPEN operation's CLAIM_FH
   facility to open the file being copied and obtain an open stateid.
   Using the stateid, the destination server may then use NFSv4.x READ
   operations to read the file.

4.2.3.2.  Using an alternative Server-to-Server Copy Protocol

   In a homogeneous environment, the source and destination servers
   might be able to perform the file copy extremely efficiently using
   specialized protocols.  For example, the source and destination
   servers might be two nodes sharing a common file system format for
   the source and destination file systems.  Thus the source and
   destination are in an ideal position to efficiently render the
   image of the source file to the destination file by replicating the
   file system formats at the block level.  Another possibility is
   that the source and destination might be two nodes sharing a common
   storage area network, and thus there is no need to copy any data at
   all; instead, ownership of the file and its contents might simply
   be reassigned to the destination.  To allow for these
   possibilities, the destination server is allowed to use a server-
   to-server copy protocol of its choice.

   In a heterogeneous environment, using a protocol other than NFSv4.x
   (e.g., HTTP [14] or FTP [15]) presents some challenges.  In
   particular, the destination server is presented with the challenge
   of accessing the source file given only an NFSv4.x filehandle.

   One option for protocols that identify source files with path names
   is to use an ASCII hexadecimal representation of the source
   filehandle as the file name.

   Another option for the source server is to use URLs to direct the
   destination server to a specialized service.
   For example, the response to COPY_NOTIFY could include the URL
   ftp://s1.example.com:9999/_FH/0x12345, where 0x12345 is the ASCII
   hexadecimal representation of the source filehandle.  When the
   destination server receives the source server's URL, it would use
   "_FH/0x12345" as the file name to pass to the FTP server listening
   on port 9999 of s1.example.com.  On port 9999 there would be a
   special instance of the FTP service that understands how to convert
   NFS filehandles to an open file descriptor (in many operating
   systems, this would require a new system call, one which is the
   inverse of the makefh() function that the pre-NFSv4 MOUNT service
   needs).

   Authenticating and identifying the destination server to the source
   server is also a challenge.  Recommendations for how to accomplish
   this are given in Section 4.4.1.2.4 and Section 4.4.1.4.

4.3.  Operations

   In the sections that follow, several operations are defined that
   together provide the server-side copy feature.  These operations
   are intended to be OPTIONAL operations as defined in section 17 of
   [2].  The COPY_NOTIFY, COPY_REVOKE, COPY, COPY_ABORT, and
   COPY_STATUS operations are designed to be sent within an NFSv4
   COMPOUND procedure.  The CB_COPY operation is designed to be sent
   within an NFSv4 CB_COMPOUND procedure.

   Each operation is performed in the context of the user identified
   by the ONC RPC credential of its containing COMPOUND or CB_COMPOUND
   request.  For example, a COPY_ABORT operation issued by a given
   user indicates that a specified COPY operation initiated by the
   same user be canceled.  Therefore, a COPY_ABORT MUST NOT interfere
   with a copy of the same file initiated by another user.

   An NFS server MAY allow an administrative user to monitor or cancel
   copy operations using an implementation specific interface.

4.3.1.  netloc4 - Network Locations

   The server-side copy operations specify network locations using the
   netloc4 data type shown below:

   enum netloc_type4 {
           NL4_NAME    = 0,
           NL4_URL     = 1,
           NL4_NETADDR = 2
   };

   union netloc4 switch (netloc_type4 nl_type) {
   case NL4_NAME:    utf8str_cis nl_name;
   case NL4_URL:     utf8str_cis nl_url;
   case NL4_NETADDR: netaddr4    nl_addr;
   };

   If the netloc4 is of type NL4_NAME, the nl_name field MUST be
   specified as a UTF-8 string.  The nl_name is expected to be
   resolved to a network address via DNS, LDAP, NIS, /etc/hosts, or
   some other means.  If the netloc4 is of type NL4_URL, a server URL
   [5] appropriate for the server-to-server copy operation is
   specified as a UTF-8 string.  If the netloc4 is of type
   NL4_NETADDR, the nl_addr field MUST contain a valid netaddr4 as
   defined in Section 3.3.9 of [2].

   When netloc4 values are used for an inter-server copy as shown in
   Figure 4, their values may be evaluated on the source server,
   destination server, and client.  The network environment in which
   these systems operate should be configured so that the netloc4
   values are interpreted as intended on each system.

4.3.2.  Operation 61: COPY_NOTIFY - Notify a source server of a
        future copy

4.3.2.1.  ARGUMENT

   struct COPY_NOTIFY4args {
           /* CURRENT_FH: source file */
           netloc4 cna_destination_server;
   };

4.3.2.2.  RESULT
   union COPY_NOTIFY4res switch (nfsstat4 cnr_status) {
   case NFS4_OK:
           nfstime4 cnr_lease_time;
           netloc4  cnr_source_server<>;
   default:
           void;
   };

4.3.2.3.  DESCRIPTION

   This operation is used for an inter-server copy.  A client sends
   this operation in a COMPOUND request to the source server to
   authorize a destination server identified by
   cna_destination_server to read the file specified by CURRENT_FH on
   behalf of the given user.

   The cna_destination_server MUST be specified using the netloc4
   network location format.  The server is not required to resolve
   the cna_destination_server address before completing this
   operation.

   If this operation succeeds, the source server will allow the
   cna_destination_server to copy the specified file on behalf of the
   given user.  If COPY_NOTIFY succeeds, the destination server is
   granted permission to read the file as long as both of the
   following conditions are met:

   o  The destination server begins reading the source file before
      the cnr_lease_time expires.  If the cnr_lease_time expires
      while the destination server is still reading the source file,
      the destination server is allowed to finish reading the file.

   o  The client has not issued a COPY_REVOKE for the same
      combination of user, filehandle, and destination server.

   The cnr_lease_time is chosen by the source server.  A
   cnr_lease_time of 0 (zero) indicates an infinite lease.  To renew
   the copy lease time, the client should resend the same copy
   notification request to the source server.

   To avoid the need for synchronized clocks, copy lease times are
   granted by the server as a time delta.  However, there is a
   requirement that the client and server clocks do not drift
   excessively over the duration of the lease.  There is also the
   issue of propagation delay across the network, which could easily
   be several hundred milliseconds, as well as the possibility that
   requests will be lost and need to be retransmitted.

   To take propagation delay into account, the client should subtract
   it from copy lease times (e.g., if the client estimates the one-way
   propagation delay as 200 milliseconds, then it can assume that the
   lease is already 200 milliseconds old when it gets it).  In
   addition, it will take another 200 milliseconds to get a response
   back to the server.  So the client must send a lease renewal or
   send the copy offload request to the cna_destination_server at
   least 400 milliseconds before the copy lease would expire.  If the
   propagation delay varies over the life of the lease (e.g., the
   client is on a mobile host), the client will need to continuously
   subtract the increase in propagation delay from the copy lease
   times.

   The server's copy lease period configuration should take into
   account the network distance of the clients that will be accessing
   the server's resources.  It is expected that the lease period will
   take into account the network propagation delays and other network
   delay factors for the client population.  Since the protocol does
   not allow for an automatic method to determine an appropriate copy
   lease period, the server's administrator may have to tune the copy
   lease period.
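   The renewal arithmetic above can be captured in a small helper; the
   following C sketch is non-normative and assumes millisecond-
   granularity timers and a client-estimated one-way propagation
   delay:

   #include <stdint.h>

   /* Milliseconds after the lease grant by which the client must act
    * (renew the lease or send the COPY): the lease was already one
    * one-way delay old on receipt, and the renewal needs another
    * one-way delay to reach the source server. */
   static uint64_t
   act_by_ms(uint64_t lease_ms, uint64_t one_way_delay_ms)
   {
           if (2 * one_way_delay_ms >= lease_ms)
                   return 0;   /* act immediately */
           /* e.g., a 10000 ms lease with a 200 ms one-way delay
            * must be renewed within 9600 ms of the grant */
           return lease_ms - 2 * one_way_delay_ms;
   }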
   A successful response will also contain a list of names, addresses,
   and URLs called cnr_source_server, on which the source is willing
   to accept connections from the destination.  These might not be
   reachable from the client and might be located on networks to which
   the client has no connection.

   If the client wishes to perform an inter-server copy, the client
   MUST send a COPY_NOTIFY to the source server.  Therefore, the
   source server MUST support COPY_NOTIFY.

   For a copy only involving one server (the source and destination
   are on the same server), this operation is unnecessary.

   The COPY_NOTIFY operation may fail for the following reasons (this
   is a partial list):

   NFS4ERR_MOVED:  The file system which contains the source file is
      not present on the source server.  The client can determine the
      correct location and reissue the operation with the correct
      location.

   NFS4ERR_NOTSUPP:  The copy offload operation is not supported by
      the NFS server receiving this request.

   NFS4ERR_WRONGSEC:  The security mechanism being used by the client
      does not match the server's security policy.

4.3.3.  Operation 62: COPY_REVOKE - Revoke a destination server's
        copy privileges

4.3.3.1.  ARGUMENT

   struct COPY_REVOKE4args {
           /* CURRENT_FH: source file */
           netloc4 cra_destination_server;
   };

4.3.3.2.  RESULT

   struct COPY_REVOKE4res {
           nfsstat4 crr_status;
   };

4.3.3.3.  DESCRIPTION

   This operation is used for an inter-server copy.  A client sends
   this operation in a COMPOUND request to the source server to revoke
   the authorization of a destination server identified by
   cra_destination_server from reading the file specified by
   CURRENT_FH on behalf of the given user.  If the
   cra_destination_server has already begun copying the file, a
   successful return from this operation indicates that further access
   will be prevented.

   The cra_destination_server MUST be specified using the netloc4
   network location format.  The server is not required to resolve
   the cra_destination_server address before completing this
   operation.

   The COPY_REVOKE operation is useful in situations in which the
   source server granted a very long or infinite lease on the
   destination server's ability to read the source file and all copy
   operations on the source file have been completed.

   For a copy only involving one server (the source and destination
   are on the same server), this operation is unnecessary.

   If the server supports COPY_NOTIFY, the server is REQUIRED to
   support the COPY_REVOKE operation.

   The COPY_REVOKE operation may fail for the following reasons (this
   is a partial list):

   NFS4ERR_MOVED:  The file system which contains the source file is
      not present on the source server.  The client can determine the
      correct location and reissue the operation with the correct
      location.

   NFS4ERR_NOTSUPP:  The copy offload operation is not supported by
      the NFS server receiving this request.

4.3.4.  Operation 59: COPY - Initiate a server-side copy

4.3.4.1.  ARGUMENT
ARGUMENT 1165 const COPY4_GUARDED = 0x00000001; 1166 const COPY4_METADATA = 0x00000002; 1168 struct COPY4args { 1169 /* SAVED_FH: source file */ 1170 /* CURRENT_FH: destination file or */ 1171 /* directory */ 1172 offset4 ca_src_offset; 1173 offset4 ca_dst_offset; 1174 length4 ca_count; 1175 uint32_t ca_flags; 1176 component4 ca_destination; 1177 netloc4 ca_source_server<>; 1178 }; 1180 4.3.4.2. RESULT 1182 union COPY4res switch (nfsstat4 cr_status) { 1183 /* CURRENT_FH: destination file */ 1185 case NFS4_OK: 1186 stateid4 cr_callback_id<1>; 1187 default: 1188 length4 cr_bytes_copied; 1189 }; 1191 4.3.4.3. DESCRIPTION 1193 The COPY operation is used for both intra- and inter-server copies. 1194 In both cases, the COPY is always sent from the client to the 1195 destination server of the file copy. The COPY operation requests 1196 that a file be copied from the location specified by the SAVED_FH 1197 value to the location specified by the combination of CURRENT_FH and 1198 ca_destination. 1200 The SAVED_FH must be a regular file. If SAVED_FH is not a regular 1201 file, the operation MUST fail and return NFS4ERR_WRONG_TYPE. 1203 In order to set SAVED_FH to the source file handle, the compound 1204 procedure requesting the COPY will include a sub-sequence of 1205 operations such as 1207 PUTFH source-fh 1208 SAVEFH 1210 If the request is for a server-to-server copy, the source-fh is a 1211 filehandle from the source server and the compound procedure is being 1212 executed on the destination server. In this case, the source-fh is a 1213 foreign filehandle on the server receiving the COPY request. If 1214 either PUTFH or SAVEFH checked the validity of the filehandle, the 1215 operation would likely fail and return NFS4ERR_STALE. 1217 In order to avoid this problem, the minor version incorporating the 1218 COPY operations will need to make a few small changes in the handling 1219 of existing operations. If a server supports the server-to-server 1220 COPY feature, a PUTFH followed by a SAVEFH MUST NOT return 1221 NFS4ERR_STALE for either operation. These restrictions do not pose 1222 substantial difficulties for servers. The CURRENT_FH and SAVED_FH 1223 may be validated in the context of the operation referencing them and 1224 an NFS4ERR_STALE error returned for an invalid file handle at that 1225 point. 1227 The CURRENT_FH and ca_destination together specify the destination of 1228 the copy operation. If ca_destination is of 0 (zero) length, then 1229 CURRENT_FH specifies the target file. In this case, CURRENT_FH MUST 1230 be a regular file and not a directory. If ca_destination is not of 0 1231 (zero) length, the ca_destination argument specifies the file name to 1232 which the data will be copied within the directory identified by 1233 CURRENT_FH. In this case, CURRENT_FH MUST be a directory and not a 1234 regular file. 1236 If the file named by ca_destination does not exist and the operation 1237 completes successfully, the file will be visible in the file system 1238 namespace. If the file does not exist and the operation fails, the 1239 file MAY be visible in the file system namespace depending on when 1240 the failure occurs and on the implementation of the NFS server 1241 receiving the COPY operation. If the ca_destination name cannot be 1242 created in the destination file system (due to file name 1243 restrictions, such as case or length), the operation MUST fail. 
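To illustrate (the filehandles and the destination name below are
placeholders, not normative), an intra-server copy of a whole file into
an existing directory could be requested with a sub-sequence such as:

      COMPOUND { PUTFH source-fh ; SAVEFH ;
                 PUTFH destination-dir-fh ;
                 COPY 0 0 0 0 "destination-name" {} }

where the two offsets and the count are 0 (zero) to request a whole
file copy, ca_flags is 0 (zero), and the empty ca_source_server list
indicates an intra-server copy.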
1245 The ca_src_offset is the offset within the source file from which the 1246 data will be read, the ca_dst_offset is the offset within the 1247 destination file to which the data will be written, and the ca_count 1248 is the number of bytes that will be copied. An offset of 0 (zero) 1249 specifies the start of the file. A count of 0 (zero) requests that 1250 all bytes from ca_src_offset through EOF be copied to the 1251 destination. If concurrent modifications to the source file overlap 1252 with the source file region being copied, the data copied may include 1253 all, some, or none of the modifications. The client can use standard 1254 NFS operations (e.g. OPEN with OPEN4_SHARE_DENY_WRITE or mandatory 1255 byte range locks) to protect against concurrent modifications if the 1256 client is concerned about this. If the source file's end of file is 1257 being modified in parallel with a copy that specifies a count of 0 1258 (zero) bytes, the amount of data copied is implementation dependent 1259 (clients may guard against this case by specifying a non-zero count 1260 value or preventing modification of the source file as mentioned 1261 above). 1263 If the source offset or the source offset plus count is greater than 1264 or equal to the size of the source file, the operation will fail with 1265 NFS4ERR_INVAL. The destination offset or destination offset plus 1266 count may be greater than the size of the destination file. This 1267 allows the client to issue parallel copies to implement 1268 operations such as "cat file1 file2 file3 file4 > dest". 1270 If the destination file is created as a result of this command, the 1271 destination file's size will be equal to the number of bytes 1272 successfully copied. If the destination file already existed, the 1273 destination file's size may increase as a result of this operation 1274 (e.g. if ca_dst_offset plus ca_count is greater than the 1275 destination's initial size). 1277 If the ca_source_server list is specified, then this is an 1278 inter-server copy operation and the source file is on a remote server. The 1279 client is expected to have previously issued a successful COPY_NOTIFY 1280 request to the remote source server. The ca_source_server list 1281 SHOULD be the same as the COPY_NOTIFY response's cnr_source_server 1282 list. If the client includes the entries from the COPY_NOTIFY 1283 response's cnr_source_server list in the ca_source_server list, the 1284 source server can indicate a specific copy protocol for the 1285 destination server to use by returning a URL, which specifies both a 1286 protocol service and server name. Server-to-server copy protocol 1287 considerations are described in Section 4.2.3 and Section 4.4.1. 1289 The ca_flags argument allows the copy operation to be customized in 1290 the following ways using the guarded flag (COPY4_GUARDED) and the 1291 metadata flag (COPY4_METADATA). 1293 [NOTE: Earlier versions of this document defined a 1294 COPY4_SPACE_RESERVED flag for controlling space reservations on the 1295 destination file. This flag has been removed with the expectation 1296 that the space_reserve attribute defined in XXX_TDH_XXX will be 1297 adopted.] 1299 If the guarded flag is set and the destination exists on the server, 1300 this operation will fail with NFS4ERR_EXIST. 1302 If the guarded flag is not set and the destination exists on the 1303 server, the behavior is implementation dependent. 1305 If the metadata flag is set and the client is requesting a whole file 1306 copy (i.e.
ca_count is 0 (zero)), a subset of the destination file's 1307 attributes MUST be the same as the source file's corresponding 1308 attributes and a subset of the destination file's attributes SHOULD 1309 be the same as the source file's corresponding attributes. The 1310 attributes in the MUST and SHOULD copy subsets will be defined for 1311 each NFS version. 1313 For NFSv4.1, Table 1 and Table 2 list the REQUIRED and RECOMMENDED 1314 attributes respectively. A "MUST" in the "Copy to destination file?" 1315 column indicates that the attribute is part of the MUST copy set. A 1316 "SHOULD" in the "Copy to destination file?" column indicates that the 1317 attribute is part of the SHOULD copy set. 1319 +--------------------+----+---------------------------+ 1320 | Name | Id | Copy to destination file? | 1321 +--------------------+----+---------------------------+ 1322 | supported_attrs | 0 | no | 1323 | type | 1 | MUST | 1324 | fh_expire_type | 2 | no | 1325 | change | 3 | SHOULD | 1326 | size | 4 | MUST | 1327 | link_support | 5 | no | 1328 | symlink_support | 6 | no | 1329 | named_attr | 7 | no | 1330 | fsid | 8 | no | 1331 | unique_handles | 9 | no | 1332 | lease_time | 10 | no | 1333 | rdattr_error | 11 | no | 1334 | filehandle | 19 | no | 1335 | suppattr_exclcreat | 75 | no | 1336 +--------------------+----+---------------------------+ 1338 Table 1 1340 +--------------------+----+---------------------------+ 1341 | Name | Id | Copy to destination file? | 1342 +--------------------+----+---------------------------+ 1343 | acl | 12 | MUST | 1344 | aclsupport | 13 | no | 1345 | archive | 14 | no | 1346 | cansettime | 15 | no | 1347 | case_insensitive | 16 | no | 1348 | case_preserving | 17 | no | 1349 | change_policy | 60 | no | 1350 | chown_restricted | 18 | MUST | 1351 | dacl | 58 | MUST | 1352 | dir_notif_delay | 56 | no | 1353 | dirent_notif_delay | 57 | no | 1354 | fileid | 20 | no | 1355 | files_avail | 21 | no | 1356 | files_free | 22 | no | 1357 | files_total | 23 | no | 1358 | fs_charset_cap | 76 | no | 1359 | fs_layout_type | 62 | no | 1360 | fs_locations | 24 | no | 1361 | fs_locations_info | 67 | no | 1362 | fs_status | 61 | no | 1363 | hidden | 25 | MUST | 1364 | homogeneous | 26 | no | 1365 | layout_alignment | 66 | no | 1366 | layout_blksize | 65 | no | 1367 | layout_hint | 63 | no | 1368 | layout_type | 64 | no | 1369 | maxfilesize | 27 | no | 1370 | maxlink | 28 | no | 1371 | maxname | 29 | no | 1372 | maxread | 30 | no | 1373 | maxwrite | 31 | no | 1374 | mdsthreshold | 68 | no | 1375 | mimetype | 32 | MUST | 1376 | mode | 33 | MUST | 1377 | mode_set_masked | 74 | no | 1378 | mounted_on_fileid | 55 | no | 1379 | no_trunc | 34 | no | 1380 | numlinks | 35 | no | 1381 | owner | 36 | MUST | 1382 | owner_group | 37 | MUST | 1383 | quota_avail_hard | 38 | no | 1384 | quota_avail_soft | 39 | no | 1385 | quota_used | 40 | no | 1386 | rawdev | 41 | no | 1387 | retentevt_get | 71 | MUST | 1388 | retentevt_set | 72 | no | 1389 | retention_get | 69 | MUST | 1390 | retention_hold | 73 | MUST | 1391 | retention_set | 70 | no | 1392 | sacl | 59 | MUST | 1393 | space_avail | 42 | no | 1394 | space_free | 43 | no | 1395 | space_total | 44 | no | 1396 | space_used | 45 | no | 1397 | system | 46 | MUST | 1398 | time_access | 47 | MUST | 1399 | time_access_set | 48 | no | 1400 | time_backup | 49 | no | 1401 | time_create | 50 | MUST | 1402 | time_delta | 51 | no | 1403 | time_metadata | 52 | SHOULD | 1404 | time_modify | 53 | MUST | 1405 | time_modify_set | 54 | no | 1406 
+--------------------+----+---------------------------+ 1408 Table 2 1410 [NOTE: The space_reserve attribute XXX_TDH_XXX will be in the MUST 1411 set.] 1413 [NOTE: The source file's attribute values will take precedence over 1414 any attribute values inherited by the destination file.] 1415 In the case of an inter-server copy or an intra-server copy between 1416 file systems, the attributes supported for the source file and 1417 destination file could be different. By definition, the REQUIRED 1418 attributes will be supported in all cases. If the metadata flag is 1419 set and the source file has a RECOMMENDED attribute that is not 1420 supported for the destination file, the copy MUST fail with 1421 NFS4ERR_ATTRNOTSUPP. 1423 Any attribute supported by the destination server that is not set on 1424 the source file SHOULD be left unset. 1426 Metadata attributes not exposed via the NFS protocol SHOULD be copied 1427 to the destination file where appropriate. 1429 The destination file's named attributes are not duplicated from the 1430 source file. After the copy process completes, the client MAY 1431 attempt to duplicate named attributes using standard NFSv4 1432 operations. However, the destination file's named attribute 1433 capabilities MAY be different from the source file's named attribute 1434 capabilities. 1436 If the metadata flag is not set and the client is requesting a whole 1437 file copy (i.e. ca_count is 0 (zero)), the destination file's 1438 metadata is implementation dependent. 1440 If the client is requesting a partial file copy (i.e. ca_count is not 1441 0 (zero)), the client SHOULD NOT set the metadata flag and the server 1442 MUST ignore the metadata flag. 1444 If the operation does not result in an immediate failure, the server 1445 will return NFS4_OK, and the CURRENT_FH will remain the destination's 1446 filehandle. 1448 If an immediate failure does occur, cr_bytes_copied will be set to 1449 the number of bytes copied to the destination file before the error 1450 occurred. The cr_bytes_copied value indicates the number of bytes 1451 copied but not which specific bytes have been copied. 1453 A return of NFS4_OK indicates that either the operation is complete 1454 or the operation was initiated and a callback will be used to deliver 1455 the final status of the operation. 1457 If the cr_callback_id is returned, this indicates that the operation 1458 was initiated and a CB_COPY callback will deliver the final results 1459 of the operation. The cr_callback_id stateid is termed a copy 1460 stateid in this context. The server is given the option of returning 1461 the results in a callback because the data may require a relatively 1462 long period of time to copy. 1464 If no cr_callback_id is returned, the operation completed 1465 synchronously and no callback will be issued by the server. The 1466 completion status of the operation is indicated by cr_status. 1468 If the copy completes successfully, either synchronously or 1469 asynchronously, the data copied from the source file to the 1470 destination file MUST appear identical to the NFS client. However, 1471 the NFS server's on-disk representation of the data in the source 1472 file and destination file MAY differ. For example, the NFS server 1473 might encrypt, compress, deduplicate, or otherwise represent the 1474 on-disk data in the source and destination file differently. 1476 In the event of a failure, the state of the destination file is 1477 implementation dependent.
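The following fragment sketches how a client might act on the result;
the decoded structure layout and the wait_for_cb_copy() helper are
illustrative assumptions, not part of the protocol:

      /*
       * Illustrative sketch: acting on a decoded COPY4res.  An
       * asynchronous copy is indicated by the presence of the
       * optional cr_callback_id stateid.
       */
      if (res.cr_status == NFS4_OK) {
              if (res.cr_callback_id.len == 1) {
                      /* Asynchronous: the final status arrives in
                       * a CB_COPY identified by this copy stateid. */
                      final = wait_for_cb_copy(&res.cr_callback_id.val[0]);
              } else {
                      /* Synchronous: the copy is already complete. */
                      final = NFS4_OK;
              }
      }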
The COPY operation may fail for the 1478 following reasons (this is a partial list). 1480 NFS4ERR_MOVED: The file system containing the source file, or 1481 the destination file or directory, is not present. The client can 1482 determine the correct location and reissue the operation with the 1483 correct location. 1485 NFS4ERR_NOTSUPP: The copy offload operation is not supported by the 1486 NFS server receiving this request. 1488 NFS4ERR_PARTNER_NOTSUPP: The remote server does not support the 1489 server-to-server copy offload protocol. 1491 NFS4ERR_PARTNER_NO_AUTH: The remote server does not authorize a 1492 server-to-server copy offload operation. This may be due to the 1493 client's failure to send the COPY_NOTIFY operation to the remote 1494 server, the remote server receiving a server-to-server copy 1495 offload request after the copy lease time expired, or for some 1496 other permission problem. 1498 NFS4ERR_FBIG: The copy operation would have caused the file to grow 1499 beyond the server's limit. 1501 NFS4ERR_NOTDIR: The CURRENT_FH is a file and ca_destination has 1502 non-zero length. 1504 NFS4ERR_WRONG_TYPE: The SAVED_FH is not a regular file. 1506 NFS4ERR_ISDIR: The CURRENT_FH is a directory and ca_destination has 1507 zero length. 1509 NFS4ERR_INVAL: The source offset or offset plus count is greater 1510 than or equal to the size of the source file. 1512 NFS4ERR_DELAY: The server does not have the resources to perform the 1513 copy operation at the current time. The client should retry the 1514 operation sometime in the future. 1516 NFS4ERR_METADATA_NOTSUPP: The destination file cannot support the 1517 same metadata as the source file. 1519 NFS4ERR_WRONGSEC: The security mechanism being used by the client 1520 does not match the server's security policy. 1522 4.3.5. Operation 60: COPY_ABORT - Cancel a server-side copy 1524 4.3.5.1. ARGUMENT 1526 struct COPY_ABORT4args { 1527 /* CURRENT_FH: destination file */ 1528 stateid4 caa_stateid; 1529 }; 1531 4.3.5.2. RESULT 1533 struct COPY_ABORT4res { 1534 nfsstat4 car_status; 1535 }; 1537 4.3.5.3. DESCRIPTION 1539 COPY_ABORT is used for both intra- and inter-server asynchronous 1540 copies. The COPY_ABORT operation allows the client to cancel a 1541 server-side copy operation that it initiated. This operation is sent 1542 in a COMPOUND request from the client to the destination server. 1543 This operation may be used to cancel a copy when the application that 1544 requested the copy exits before the operation is completed or for 1545 some other reason. 1547 The request contains the filehandle and copy stateid cookies that act 1548 as the context for the previously initiated copy operation. 1550 The result's car_status field indicates whether the cancel was 1551 successful or not. A value of NFS4_OK indicates that the copy 1552 operation was canceled and no callback will be issued by the server. 1553 A copy operation that is successfully canceled may result in none, 1554 some, or all of the data having been copied. 1556 If the server supports asynchronous copies, the server is REQUIRED to 1557 support the COPY_ABORT operation. 1559 The COPY_ABORT operation may fail for the following reasons (this is 1560 a partial list): 1562 NFS4ERR_NOTSUPP: The abort operation is not supported by the NFS 1563 server receiving this request. 1565 NFS4ERR_RETRY: The abort failed, but a retry at some time in the 1566 future MAY succeed. 1568 NFS4ERR_COMPLETE_ALREADY: The abort failed, and a callback will 1569 deliver the results of the copy operation.
1571 NFS4ERR_SERVERFAULT: An error occurred on the server that does not 1572 map to a specific error code. 1574 4.3.6. Operation 63: COPY_STATUS - Poll for status of a server-side 1575 copy 1577 4.3.6.1. ARGUMENT 1579 struct COPY_STATUS4args { 1580 /* CURRENT_FH: destination file */ 1581 stateid4 csa_stateid; 1582 }; 1584 4.3.6.2. RESULT 1586 union COPY_STATUS4res switch (nfsstat4 csr_status) { 1587 case NFS4_OK: 1588 length4 csr_bytes_copied; 1589 nfsstat4 csr_complete<1>; 1590 default: 1591 void; 1592 }; 1594 4.3.6.3. DESCRIPTION 1596 COPY_STATUS is used for both intra- and inter-server asynchronous 1597 copies. The COPY_STATUS operation allows the client to poll the 1598 server to determine the status of an asynchronous copy operation. 1599 This operation is sent by the client to the destination server. 1601 If this operation is successful, the number of bytes copied is 1602 returned to the client in the csr_bytes_copied field. The 1603 csr_bytes_copied value indicates the number of bytes copied but not 1604 which specific bytes have been copied. 1606 If the optional csr_complete field is present, the copy has 1607 completed. In this case, the status value indicates the result of the 1608 asynchronous copy operation. In all cases, the server will also 1609 deliver the final results of the asynchronous copy in a CB_COPY 1610 operation. 1612 The failure of this operation does not indicate the result of the 1613 asynchronous copy in any way. 1615 If the server supports asynchronous copies, the server is REQUIRED to 1616 support the COPY_STATUS operation. 1618 The COPY_STATUS operation may fail for the following reasons (this is 1619 a partial list): 1621 NFS4ERR_NOTSUPP: The copy status operation is not supported by the 1622 NFS server receiving this request. 1624 NFS4ERR_BAD_STATEID: The stateid is not valid (see Section 4.3.8 1625 below). 1627 NFS4ERR_EXPIRED: The stateid has expired (see Section 4.3.8 1628 below). 1630 4.3.7. Operation 15: CB_COPY - Report results of a server-side copy 1632 4.3.7.1. ARGUMENT 1634 union copy_info4 switch (nfsstat4 cca_status) { 1635 case NFS4_OK: 1636 void; 1637 default: 1638 length4 cca_bytes_copied; 1639 }; 1641 struct CB_COPY4args { 1642 nfs_fh4 cca_fh; 1643 stateid4 cca_stateid; 1644 copy_info4 cca_copy_info; 1645 }; 1647 4.3.7.2. RESULT 1649 struct CB_COPY4res { 1650 nfsstat4 ccr_status; 1651 }; 1653 4.3.7.3. DESCRIPTION 1655 CB_COPY is used for both intra- and inter-server asynchronous copies. 1656 The CB_COPY callback informs the client of the result of an 1657 asynchronous server-side copy. This operation is sent by the 1658 destination server to the client in a CB_COMPOUND request. The copy 1659 is identified by the filehandle and stateid arguments. The result is 1660 indicated by the status field. If the copy failed, cca_bytes_copied 1661 contains the number of bytes copied before the failure occurred. The 1662 cca_bytes_copied value indicates the number of bytes copied but not 1663 which specific bytes have been copied. 1665 In the absence of an established backchannel, the server cannot 1666 signal the completion of the COPY via a CB_COPY callback. The loss 1667 of a callback channel would be indicated by the server setting the 1668 SEQ4_STATUS_CB_PATH_DOWN flag in the sr_status_flags field of the 1669 SEQUENCE operation. The client must re-establish the callback 1670 channel to receive the status of the COPY operation.
Prolonged loss 1671 of the callback channel could result in the server dropping the COPY 1672 operation state and invalidating the copy stateid. 1674 If the client supports the COPY operation, the client is REQUIRED to 1675 support the CB_COPY operation. 1677 The CB_COPY operation may fail for the following reasons (this is a 1678 partial list): 1680 NFS4ERR_NOTSUPP: The copy offload operation is not supported by the 1681 NFS client receiving this request. 1683 4.3.8. Copy Offload Stateids 1685 A server may perform a copy offload operation asynchronously. An 1686 asynchronous copy is tracked using a copy offload stateid. Copy 1687 offload stateids are included in the COPY, COPY_ABORT, COPY_STATUS, 1688 and CB_COPY operations. 1690 Section 8.2.4 of [2] specifies that stateids are valid until either 1691 (A) the client or server restart or (B) the client returns the 1692 resource. 1694 A copy offload stateid will be valid until either (A) the client or 1695 server restart or (B) the client returns the resource by issuing a 1696 COPY_ABORT operation or the client replies to a CB_COPY operation. 1698 A copy offload stateid's seqid MUST NOT be 0 (zero). In the context 1699 of a copy offload operation, it is ambiguous to indicate the most 1700 recent copy offload operation using a stateid with seqid of 0 (zero). 1701 Therefore, a copy offload stateid with seqid of 0 (zero) MUST be 1702 considered invalid. 1704 4.4. Security Considerations 1706 The security considerations pertaining to NFSv4 [10] apply to this 1707 document. 1709 The standard security mechanisms provided by NFSv4 [10] may be used to 1710 secure the protocol described in this document. 1712 NFSv4 clients and servers supporting the inter-server copy 1713 operations described in this document are REQUIRED to implement [6], 1714 including the RPCSEC_GSSv3 privileges copy_from_auth and 1715 copy_to_auth. If the server-to-server copy protocol is ONC RPC 1716 based, the servers are also REQUIRED to implement the RPCSEC_GSSv3 1717 privilege copy_confirm_auth. These requirements to implement are not 1718 requirements to use. NFSv4 clients and servers are RECOMMENDED to 1719 use [6] to secure server-side copy operations. 1721 4.4.1. Inter-Server Copy Security 1723 4.4.1.1. Requirements for Secure Inter-Server Copy 1725 Inter-server copy is driven by several requirements: 1727 o The specification MUST NOT mandate an inter-server copy protocol. 1728 There are many ways to copy data. Some will be more efficient than 1729 others depending on the identities of the source server and 1730 destination server. For example, the source and destination 1731 servers might be two nodes sharing a common file system format for 1732 the source and destination file systems. Thus the source and 1733 destination are in an ideal position to efficiently render the 1734 image of the source file to the destination file by replicating 1735 the file system formats at the block level. In other cases, the 1736 source and destination might be two nodes sharing a common storage 1737 area network, and thus there is no need to copy any data at all; 1738 instead, ownership of the file and its contents simply gets 1739 reassigned to the destination. 1741 o The specification MUST provide guidance for using NFSv4.x as a 1742 copy protocol. For those source and destination servers willing 1743 to use NFSv4.x, there are specific security considerations that 1744 this specification can and does address.
1746 o The specification MUST NOT mandate pre-configuration between the 1747 source and destination server. Requiring that the source and 1748 destination first have a "copying relationship" increases the 1749 administrative burden. However, the specification MUST NOT 1750 preclude implementations that require pre-configuration. 1752 o The specification MUST NOT mandate a trust relationship between 1753 the source and destination server. The NFSv4 security model 1754 requires mutual authentication between a principal on an NFS 1755 client and a principal on an NFS server. This model MUST continue 1756 with the introduction of COPY. 1758 4.4.1.2. Inter-Server Copy with RPCSEC_GSSv3 1760 When the client sends a COPY_NOTIFY to the source server to advise it 1761 that the destination server will attempt to copy data from the source 1762 server, it is expected that this copy is being done on behalf of the principal 1763 (called the "user principal") that sent the RPC request that encloses 1764 the COMPOUND procedure containing the COPY_NOTIFY operation. The 1765 user principal is identified by the RPC credentials. A mechanism 1766 is necessary that allows the user principal to authorize the destination server to 1767 perform the copy, that lets the source server properly 1768 authenticate the destination's copy requests, and that does not allow the 1769 destination server to exceed its authorization. 1771 An approach that sends delegated credentials of the client's user 1772 principal to the destination server is not used for the following 1773 reasons. If the client's user principal delegated its credentials, the 1774 destination would authenticate as the user principal. If the 1775 destination were using the NFSv4 protocol to perform the copy, then 1776 the source server would authenticate the destination server as the 1777 user principal, and the file copy would securely proceed. However, 1778 this approach would allow the destination server to copy other files. 1779 The user principal would have to trust the destination server to not 1780 do so. This is counter to the requirements, and therefore is not 1781 considered. Instead, an approach using RPCSEC_GSSv3 [6] privileges is 1782 proposed. 1784 One of the stated applications of the proposed RPCSEC_GSSv3 protocol 1785 is compound client host and user authentication [+ privilege 1786 assertion]. For inter-server file copy, we require compound NFS 1787 server host and user authentication [+ privilege assertion]. The 1788 distinction between the two is one without meaning. 1790 RPCSEC_GSSv3 introduces the notion of privileges. We define three 1791 privileges: 1793 copy_from_auth: A user principal is authorizing a source principal 1794 ("nfs@<source>") to allow a destination principal 1795 ("nfs@<destination>") to copy a file from the source to the destination. 1796 This privilege is established on the source server before the user 1797 principal sends a COPY_NOTIFY operation to the source server. 1799 struct copy_from_auth_priv { 1800 secret4 cfap_shared_secret; 1801 netloc4 cfap_destination; 1802 /* the NFSv4 user name that the user principal maps to */ 1803 utf8str_mixed cfap_username; 1804 /* equal to seq_num of rpc_gss_cred_vers_3_t */ 1805 unsigned int cfap_seq_num; 1806 }; 1808 cfap_shared_secret is a secret value the user principal generates. 1810 copy_to_auth: A user principal is authorizing a destination 1811 principal ("nfs@<destination>") to allow it to copy a file from 1812 the source to the destination.
This privilege is established on 1813 the destination server before the user principal sends a COPY 1814 operation to the destination server. 1816 struct copy_to_auth_priv { 1817 /* equal to cfap_shared_secret */ 1818 secret4 ctap_shared_secret; 1819 netloc4 ctap_source; 1820 /* the NFSv4 user name that the user principal maps to */ 1821 utf8str_mixed ctap_username; 1822 /* equal to seq_num of rpc_gss_cred_vers_3_t */ 1823 unsigned int ctap_seq_num; 1824 }; 1826 ctap_shared_secret is the secret value that the user principal generated 1827 and used to establish the copy_from_auth privilege with the 1828 source principal. 1830 copy_confirm_auth: A destination principal is confirming with the 1831 source principal that it is authorized to copy data from the 1832 source on behalf of the user principal. When the inter-server 1833 copy protocol is NFSv4, or for that matter, any protocol capable 1834 of being secured via RPCSEC_GSSv3 (i.e. any ONC RPC protocol), 1835 this privilege is established before the file is copied from the 1836 source to the destination. 1838 struct copy_confirm_auth_priv { 1839 /* equal to GSS_GetMIC() of cfap_shared_secret */ 1840 opaque ccap_shared_secret_mic<>; 1841 /* the NFSv4 user name that the user principal maps to */ 1842 utf8str_mixed ccap_username; 1843 /* equal to seq_num of rpc_gss_cred_vers_3_t */ 1844 unsigned int ccap_seq_num; 1845 }; 1847 4.4.1.2.1. Establishing a Security Context 1849 When the user principal wants to COPY a file between two servers, if 1850 it has not established copy_from_auth and copy_to_auth privileges on 1851 the servers, it establishes them: 1853 o The user principal generates a secret it will share with the two 1854 servers. This shared secret will be placed in the 1855 cfap_shared_secret and ctap_shared_secret fields of the 1856 appropriate privilege data types, copy_from_auth_priv and 1857 copy_to_auth_priv. 1859 o An instance of copy_from_auth_priv is filled in with the shared 1860 secret, the destination server, and the NFSv4 user id of the user 1861 principal. It will be sent with an RPCSEC_GSS3_CREATE procedure, 1862 and so cfap_seq_num is set to the seq_num of the credential of the 1863 RPCSEC_GSS3_CREATE procedure. Because cfap_shared_secret is a 1864 secret, after XDR encoding copy_from_auth_priv, GSS_Wrap() (with 1865 privacy) is invoked on copy_from_auth_priv. The 1866 RPCSEC_GSS3_CREATE procedure's arguments are: 1868 struct { 1869 rpc_gss3_gss_binding *compound_binding; 1870 rpc_gss3_chan_binding *chan_binding_mic; 1871 rpc_gss3_assertion assertions<>; 1872 rpc_gss3_extension extensions<>; 1873 } rpc_gss3_create_args; 1875 The string "copy_from_auth" is placed in assertions[0].privs. The 1876 output of GSS_Wrap() is placed in extensions[0].data. The field 1877 extensions[0].critical is set to TRUE. The source server calls 1878 GSS_Unwrap() on the privilege, and verifies that the seq_num 1879 matches the credential. It then verifies that the NFSv4 user id 1880 being asserted matches the source server's mapping of the user 1881 principal. If it does, the privilege is established on the source 1882 server as: <"copy_from_auth", user id, destination>.
The 1883 successful reply to RPCSEC_GSS3_CREATE has: 1885 struct { 1886 opaque handle<>; 1887 rpc_gss3_chan_binding *chan_binding_mic; 1888 rpc_gss3_assertion granted_assertions<>; 1889 rpc_gss3_assertion server_assertions<>; 1890 rpc_gss3_extension extensions<>; 1891 } rpc_gss3_create_res; 1893 The field "handle" is the RPCSEC_GSSv3 handle that the client will 1894 use on COPY_NOTIFY requests involving the source and destination 1895 server. granted_assertions[0].privs will be equal to 1896 "copy_from_auth". The server will return a GSS_Wrap() of 1897 copy_to_auth_priv. 1899 o An instance of copy_to_auth_priv is filled in with the shared 1900 secret, the source server, and the NFSv4 user id. It will be sent 1901 with an RPCSEC_GSS3_CREATE procedure, and so ctap_seq_num is set 1902 to the seq_num of the credential of the RPCSEC_GSS3_CREATE 1903 procedure. Because ctap_shared_secret is a secret, after XDR 1904 encoding copy_to_auth_priv, GSS_Wrap() is invoked on 1905 copy_to_auth_priv. The RPCSEC_GSS3_CREATE procedure's arguments 1906 are: 1908 struct { 1909 rpc_gss3_gss_binding *compound_binding; 1910 rpc_gss3_chan_binding *chan_binding_mic; 1911 rpc_gss3_assertion assertions<>; 1912 rpc_gss3_extension extensions<>; 1913 } rpc_gss3_create_args; 1915 The string "copy_to_auth" is placed in assertions[0].privs. The 1916 output of GSS_Wrap() is placed in extensions[0].data. The field 1917 extensions[0].critical is set to TRUE. After unwrapping, 1918 verifying the seq_num, and the user principal to NFSv4 user ID 1919 mapping, the destination establishes a privilege of 1920 <"copy_to_auth", user id, source>. The successful reply to 1921 RPCSEC_GSS3_CREATE has: 1923 struct { 1924 opaque handle<>; 1925 rpc_gss3_chan_binding *chan_binding_mic; 1926 rpc_gss3_assertion granted_assertions<>; 1927 rpc_gss3_assertion server_assertions<>; 1928 rpc_gss3_extension extensions<>; 1929 } rpc_gss3_create_res; 1931 The field "handle" is the RPCSEC_GSSv3 handle that the client will 1932 use on COPY requests involving the source and destination server. 1933 The field granted_assertions[0].privs will be equal to 1934 "copy_to_auth". The server will return a GSS_Wrap() of 1935 copy_to_auth_priv. 1937 4.4.1.2.2. Starting a Secure Inter-Server Copy 1939 When the client sends a COPY_NOTIFY request to the source server, it 1940 uses the privileged "copy_from_auth" RPCSEC_GSSv3 handle. 1941 cna_destination_server in COPY_NOTIFY MUST be the same as the name of 1942 the destination server specified in copy_from_auth_priv. Otherwise, 1943 COPY_NOTIFY will fail with NFS4ERR_ACCESS. The source server 1944 verifies that the privilege <"copy_from_auth", user id, destination> 1945 exists, and annotates it with the source filehandle, if the user 1946 principal has read access to the source file, and if administrative 1947 policies give the user principal and the NFS client read access to 1948 the source file (i.e. if the ACCESS operation would grant read 1949 access). Otherwise, COPY_NOTIFY will fail with NFS4ERR_ACCESS. 1951 When the client sends a COPY request to the destination server, it 1952 uses the privileged "copy_to_auth" RPCSEC_GSSv3 handle. 1953 ca_source_server in COPY MUST be the same as the name of the source 1954 server specified in copy_to_auth_priv. Otherwise, COPY will fail 1955 with NFS4ERR_ACCESS. The destination server verifies that the 1956 privilege <"copy_to_auth", user id, source> exists, and annotates it 1957 with the source and destination filehandles. 
If the client has 1958 failed to establish the "copy_to_auth" privilege, the destination server will reject the 1959 request with NFS4ERR_PARTNER_NO_AUTH. 1961 If the client sends a COPY_REVOKE to the source server to rescind the 1962 destination server's copy privilege, it uses the privileged 1963 "copy_from_auth" RPCSEC_GSSv3 handle and the cra_destination_server 1964 in COPY_REVOKE MUST be the same as the name of the destination server 1965 specified in copy_from_auth_priv. The source server will then delete 1966 the <"copy_from_auth", user id, destination> privilege and fail any 1967 subsequent copy requests sent under the auspices of this privilege 1968 from the destination server. 1970 4.4.1.2.3. Securing ONC RPC Server-to-Server Copy Protocols 1972 After a destination server has a "copy_to_auth" privilege established 1973 on it, and it receives a COPY request, if it knows it will use an ONC 1974 RPC protocol to copy data, it will establish a "copy_confirm_auth" 1975 privilege on the source server, using nfs@<destination> as the 1976 initiator principal and nfs@<source> as the target principal. 1978 The value of the field ccap_shared_secret_mic is a GSS_GetMIC() of 1979 the shared secret passed in the copy_to_auth privilege. The field 1980 ccap_username is the mapping of the user principal to an NFSv4 user 1981 name ("user"@"domain" form), and MUST be the same as ctap_username 1982 and cfap_username. The field ccap_seq_num is the seq_num of the 1983 RPCSEC_GSSv3 credential used for the RPCSEC_GSS3_CREATE procedure the 1984 destination will send to the source server to establish the 1985 privilege. 1987 The source server verifies the privilege, and establishes a 1988 <"copy_confirm_auth", user id, destination> privilege. If the source 1989 server fails to verify the privilege, the COPY operation will be 1990 rejected with NFS4ERR_PARTNER_NO_AUTH. All subsequent ONC RPC 1991 requests sent from the destination to copy data from the source to 1992 the destination will use the RPCSEC_GSSv3 handle returned by the 1993 source's RPCSEC_GSS3_CREATE response. 1995 Note that the use of the "copy_confirm_auth" privilege accomplishes 1996 the following: 1998 o if a protocol like NFS is being used with export policies, the export 1999 policies can be overridden in case the destination server, acting as an 2000 NFS client, is not authorized 2002 o manual configuration to allow a copy relationship between the 2003 source and destination is not needed. 2005 If the attempt to establish a "copy_confirm_auth" privilege fails, 2006 then when the user principal sends a COPY request to the destination, the 2007 destination server will reject it with NFS4ERR_PARTNER_NO_AUTH. 2009 4.4.1.2.4. Securing Non ONC RPC Server-to-Server Copy Protocols 2011 If the destination will not be using ONC RPC to copy the data, then the 2012 source and destination are using an unspecified copy protocol. The 2013 destination could use the shared secret and the NFSv4 user id to 2014 prove to the source server that the user principal has authorized the 2015 copy. 2017 For protocols that authenticate user names with passwords (e.g. HTTP 2019 [14] and FTP [15]), the NFSv4 user id could be used as the user name, 2020 and an ASCII hexadecimal representation of the RPCSEC_GSSv3 shared 2021 secret could be used as the user password or as input into 2022 non-password authentication methods like CHAP [16].
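As a minimal sketch of the password derivation mentioned above (the
helper below is illustrative only, not specified by this document):

      /*
       * Illustrative only: render a shared secret as the ASCII
       * hexadecimal string to be used as a password for protocols
       * such as HTTP or FTP.  The out buffer must hold at least
       * 2 * len + 1 bytes.
       */
      void
      secret_to_hex(const unsigned char *secret, size_t len, char *out)
      {
              static const char hex[] = "0123456789abcdef";
              size_t i;

              for (i = 0; i < len; i++) {
                      out[2 * i]     = hex[secret[i] >> 4];
                      out[2 * i + 1] = hex[secret[i] & 0x0f];
              }
              out[2 * len] = '\0';
      }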
2024 4.4.1.3. Inter-Server Copy via ONC RPC but without RPCSEC_GSSv3 2026 ONC RPC security flavors other than RPCSEC_GSSv3 MAY be used with the 2027 server-side copy offload operations described in this document. In 2028 particular, host-based ONC RPC security flavors such as AUTH_NONE and 2029 AUTH_SYS MAY be used. If a host-based security flavor is used, a 2030 minimal level of protection for the server-to-server copy protocol is 2031 possible. 2033 In the absence of strong security mechanisms such as RPCSEC_GSSv3, 2034 the challenge is how the source server and destination server 2035 identify themselves to each other, especially in the presence of 2036 multi-homed source and destination servers. In a multi-homed 2037 environment, the destination server might not contact the source 2038 server from the same network address specified by the client in the 2039 COPY_NOTIFY. This can be overcome using the procedure described 2040 below. 2042 When the client sends the source server the COPY_NOTIFY operation, 2043 the source server may reply to the client with a list of target 2044 addresses, names, and/or URLs and assign them to the unique triple: 2045 <source filehandle, user ID, destination address>. If the destination uses 2046 one of these target netlocs to contact the source server, the source 2047 server will be able to uniquely identify the destination server, even 2048 if the destination server does not connect from the address specified 2049 by the client in COPY_NOTIFY. 2051 For example, suppose the network topology is as shown in Figure 4. 2052 If the source filehandle is 0x12345, the source server may respond to 2053 a COPY_NOTIFY for destination 10.11.78.56 with the URLs: 2055 nfs://10.11.78.18//_COPY/10.11.78.56/_FH/0x12345 2057 nfs://192.168.33.18//_COPY/10.11.78.56/_FH/0x12345 2059 The client will then send these URLs to the destination server in the 2060 COPY operation. Suppose that the 192.168.33.0/24 network is a high 2061 speed network and the destination server decides to transfer the file 2062 over this network. If the destination contacts the source server 2063 from 192.168.33.56 over this network using NFSv4.1, it does the 2064 following: 2066 COMPOUND { PUTROOTFH ; LOOKUP "_COPY" ; LOOKUP "10.11.78.56" ; LOOKUP 2067 "_FH" ; OPEN "0x12345" ; GETFH } 2069 The source server will therefore know that these NFSv4.1 operations 2070 are being issued by the destination server identified in the 2071 COPY_NOTIFY. 2073 4.4.1.4. Inter-Server Copy without ONC RPC and RPCSEC_GSSv3 2075 The same techniques as in Section 4.4.1.3, using unique URLs for each 2076 destination server, can be used for other protocols (e.g. HTTP [14] 2077 and FTP [15]) as well. 2079 4.5. IANA Considerations 2081 This section has no actions for IANA. 2083 5. Space Reservation 2085 5.1. Introduction 2087 This section describes a set of operations that allow applications 2088 such as hypervisors to reserve space for a file, report the amount of 2089 actual disk space a file occupies, and free up the backing space of a 2090 file when it is not required. 2092 In virtualized environments, virtual disk files are often stored on 2093 NFS mounted volumes. Since virtual disk files represent the hard 2094 disks of virtual machines, hypervisors often have to guarantee 2095 certain properties for the file. 2097 One such example is space reservation. When a hypervisor creates a 2098 virtual disk file, it often tries to preallocate the space for the 2099 file so that there are no future allocation-related errors during the 2100 operation of the virtual machine.
Such errors prevent a virtual 2101 machine from continuing execution and result in downtime. 2103 Another useful feature would be the ability to report the number of 2104 blocks that would be freed when a file is deleted. Currently, NFS 2105 reports two size attributes: 2107 size The logical file size of the file. 2109 space_used The size in bytes that the file occupies on disk. 2111 While these attributes are sufficient for space accounting in 2112 traditional filesystems, they prove to be inadequate in modern 2113 filesystems that support block sharing. Having a way to tell the 2114 number of blocks that would be freed if the file was deleted would be 2115 useful to applications that wish to migrate files when a volume is 2116 low on space. 2118 Since virtual disks represent a hard drive in a virtual machine, a 2119 virtual disk can be viewed as a filesystem within a file. Since not 2120 all blocks within a filesystem are in use, there is an opportunity to 2121 reclaim blocks that are no longer in use. A call to deallocate 2122 blocks could result in better space efficiency. Less space MAY be 2123 consumed by backups after block deallocation. 2125 We propose the following operations and attributes for the 2126 aforementioned use cases: 2128 space_reserve This attribute specifies whether the blocks backing 2129 the file have been preallocated. 2131 space_freed This attribute specifies the space freed when a file is 2132 deleted, taking block sharing into consideration. 2134 max_hole_punch This attribute specifies the maximum sized hole that 2135 can be punched on the filesystem. 2137 HOLE_PUNCH This operation zeroes and/or deallocates the blocks 2138 backing a region of the file. 2140 5.2. Use Cases 2142 5.2.1. Space Reservation 2144 Some applications require that once a file of a certain size is 2145 created, writes to that file never fail with an out-of-space 2146 condition. One such example is that of a hypervisor writing to a 2147 virtual disk. An out-of-space condition while writing to virtual 2148 disks would mean that the virtual machine would need to be frozen. 2150 Currently, in order to achieve such a guarantee, applications zero 2151 the entire file. The initial zeroing allocates the backing blocks 2152 and all subsequent writes are overwrites of already allocated blocks. 2153 This approach is not only inefficient in terms of the amount of I/O 2154 done, it is also not guaranteed to work on filesystems that are 2155 log-structured or deduplicated. An efficient way of guaranteeing space 2156 reservation would be beneficial to such applications. 2158 If the space_reserve attribute is set on a file, it is guaranteed 2159 that writes that do not grow the file will not fail with 2160 NFS4ERR_NOSPC. 2162 5.2.2. Space freed on deletes 2164 Currently, files in NFS have two size attributes: 2166 size The logical file size of the file. 2168 space_used The size in bytes that the file occupies on disk. 2170 While these attributes are sufficient for space accounting in 2171 traditional filesystems, they prove to be inadequate in modern 2172 filesystems that support block sharing. In such filesystems, 2173 multiple inodes can point to a single block with a block reference 2174 count to guard against premature freeing. 2176 If space_used of a file is interpreted to mean the size in bytes of 2177 all disk blocks pointed to by the inode of the file, then shared 2178 blocks get double-counted, over-reporting the space utilization.
2179 This also has the adverse effect that the deletion of a file with 2180 shared blocks frees up less than space_used bytes. 2182 On the other hand, if space_used is interpreted to mean the size in 2183 bytes of those disk blocks unique to the inode of the file, then 2184 shared blocks are not counted in any file, resulting in 2185 under-reporting of the space utilization. 2187 For example, two files A and B have 10 blocks each. Let 6 of these 2188 blocks be shared between them. Thus, the combined space utilized by 2189 the two files is 14 * BLOCK_SIZE bytes. In the former case, the 2190 combined space utilization of the two files would be reported as 20 * 2191 BLOCK_SIZE. However, deleting either would only result in 4 * 2192 BLOCK_SIZE being freed. Conversely, the latter interpretation would 2193 report that the space utilization is only 8 * BLOCK_SIZE. 2195 Adding another size attribute, space_freed, is helpful in solving 2196 this problem. space_freed is the number of blocks that are allocated 2197 to the given file that would be freed on its deletion. In the 2198 example, both A and B would report space_freed as 4 * BLOCK_SIZE and 2199 space_used as 10 * BLOCK_SIZE. If A is deleted, B will report 2200 space_freed as 10 * BLOCK_SIZE as the deletion of B would result in 2201 the deallocation of all 10 blocks. 2203 The addition of this attribute does not solve the problem of space being 2204 over-reported. However, over-reporting is better than 2205 under-reporting. 2207 5.2.3. Operations and attributes 2209 In the sections that follow, one operation and three attributes are 2210 defined that together provide the space management facilities 2211 outlined earlier in the document. The operation is intended to be 2212 OPTIONAL and the attributes RECOMMENDED as defined in section 17 of 2213 [2]. 2215 5.2.4. Attribute 77: space_reserve 2217 The space_reserve attribute is a read/write attribute of type 2218 boolean. It is a per file attribute. When the space_reserve 2219 attribute is set via SETATTR, the server must ensure that there is 2220 disk space to accommodate every byte in the file before it can return 2221 success. If the server cannot guarantee this, it must return 2222 NFS4ERR_NOSPC. 2224 If the client tries to grow a file which has the space_reserve 2225 attribute set, the server must guarantee that there is disk space to 2226 accommodate every byte in the file with the new size before it can 2227 return success. If the server cannot guarantee this, it must return 2228 NFS4ERR_NOSPC. 2230 It is not required that the server allocate the space to the file 2231 before returning success. The allocation can be deferred; however, 2232 it must be guaranteed that it will not fail for lack of space. 2234 The value of space_reserve can be obtained at any time through 2235 GETATTR. 2237 In order to avoid ambiguity, the space_reserve bit cannot be set 2238 along with the size bit in SETATTR. Increasing the size of a file 2239 with space_reserve set will fail if space reservation cannot be 2240 guaranteed for the new size. If the file size is decreased, space 2241 reservation is only guaranteed for the new size and the extra blocks 2242 backing the file can be released. 2244 5.2.5. Attribute 78: space_freed 2246 space_freed gives the number of bytes freed if the file is deleted. 2247 This attribute is read only and is of type length4. It is a per file 2248 attribute.
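As a minimal sketch of the migration use case described in Section
5.2.2 (the get_length4_attr() helper and the attribute constant are
hypothetical stand-ins for a GETATTR of space_freed, not part of this
specification):

      /*
       * Illustrative only: pick the migration candidate whose
       * deletion would actually free the most space on the volume,
       * using space_freed rather than space_used.
       */
      length4 freed_a = get_length4_attr(fh_a, FATTR4_SPACE_FREED);
      length4 freed_b = get_length4_attr(fh_b, FATTR4_SPACE_FREED);

      migrate(freed_a >= freed_b ? fh_a : fh_b);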
2250 5.2.6. Attribute 79: max_hole_punch 2252 max_hole_punch specifies the maximum size of a hole that the 2253 HOLE_PUNCH operation can handle. This attribute is read only and of 2254 type length4. It is a per filesystem attribute. This attribute MUST 2255 be implemented if HOLE_PUNCH is implemented. 2257 5.2.7. Operation 64: HOLE_PUNCH - Zero and deallocate blocks backing 2258 the file in the specified range. 2260 5.2.7.1. ARGUMENT 2262 struct HOLE_PUNCH4args { 2263 /* CURRENT_FH: file */ 2264 offset4 hpa_offset; 2265 length4 hpa_count; 2266 }; 2268 5.2.7.2. RESULT 2270 struct HOLE_PUNCH4res { 2271 nfsstat4 hpr_status; 2272 }; 2274 5.2.7.3. DESCRIPTION 2276 Whenever a client wishes to deallocate the blocks backing a 2277 particular region in the file, it calls the HOLE_PUNCH operation with 2278 the current filehandle set to the filehandle of the file in question, 2279 and the start offset and length in bytes of the region set in hpa_offset 2280 and hpa_count, respectively. All further reads to this region MUST return 2281 zeroes until overwritten. The filehandle specified must be that of a 2282 regular file. 2284 Situations may arise where hpa_offset and/or hpa_offset + hpa_count 2285 will not be aligned to a boundary on which the server performs 2286 allocations and deallocations. For most filesystems, this is the block size of 2287 the file system. In such a case, the server can deallocate as many 2288 bytes as it can in the region. The blocks that cannot be deallocated 2289 MUST be zeroed. Except for the block deallocation and maximum hole 2290 punching capability, a HOLE_PUNCH operation is to be treated similarly 2291 to a write of zeroes. 2293 The server is not required to complete deallocating the blocks 2294 specified in the operation before returning. It is acceptable to 2295 have the deallocation be deferred. In fact, HOLE_PUNCH is merely a 2296 hint; it is valid for a server to return success without ever doing 2297 anything towards deallocating the blocks backing the region 2298 specified. However, any future reads to the region MUST return 2299 zeroes. 2301 HOLE_PUNCH will result in the space_used attribute being decreased by 2302 the number of bytes that were deallocated. The space_freed attribute 2303 may or may not decrease, depending on whether the attribute is supported 2304 and on whether the blocks backing the specified range were shared. The size 2305 attribute will remain unchanged. 2307 The HOLE_PUNCH operation MUST NOT change the space reservation 2308 guarantee of the file. While the server can deallocate the blocks 2309 specified by hpa_offset and hpa_count, future writes to this region 2310 MUST NOT fail with NFS4ERR_NOSPC. 2312 The HOLE_PUNCH operation may fail for the following reasons (this is 2313 a partial list): 2315 NFS4ERR_NOTSUPP The HOLE_PUNCH operation is not supported by the 2316 NFS server receiving this request. 2318 NFS4ERR_ISDIR The current filehandle is of type NF4DIR. 2320 NFS4ERR_SYMLINK The current filehandle is of type NF4LNK. 2322 NFS4ERR_WRONG_TYPE The current filehandle does not designate an 2323 ordinary file.
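To illustrate (the filehandle and values are placeholders), a client
that wishes to deallocate one megabyte of backing blocks starting at
offset 4096 might send:

      COMPOUND { PUTFH file-fh ; HOLE_PUNCH 4096 1048576 }

Whether or not the server actually deallocates the backing blocks,
subsequent READs of that byte range MUST return zeroes.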
2325 5.3. Security Considerations 2327 There are no security considerations for this section. 2329 5.4. IANA Considerations 2331 This section has no actions for IANA. 2333 6. Simple and Efficient Read Support for Sparse Files 2335 6.1. Introduction 2337 NFS is now used in many data centers as the sole or primary method of 2338 data access. Consequently, more types of applications are using NFS 2339 than ever before, each with their own requirements and generated 2340 workloads. As part of this, sparse files are increasing in number 2341 while NFS continues to lack any specific knowledge of a sparse file's 2342 layout. This document puts forth a proposal for the NFSv4.2 protocol 2343 to support efficient reading of sparse files. 2345 A sparse file is a common way of representing a large file without 2346 having to reserve disk space for it. Consequently, a sparse file 2347 uses less physical space than its size indicates. This means the 2348 file contains 'holes', byte ranges within the file that contain no 2349 data. Most modern file systems support sparse files, including most 2350 UNIX file systems and NTFS, but notably not Apple's HFS+. Common 2351 examples of sparse files include VM OS/disk images, database files, 2352 log files, and even checkpoint recovery files most commonly used by 2353 the HPC community. 2355 If an application reads a hole in a sparse file, the file system must 2356 return all zeroes to the application. For local data access there is 2357 little penalty, but with NFS these zeroes must be transferred back to 2358 the client. If an application uses the NFS client to read data into 2359 memory, this wastes time and bandwidth as the application waits for 2360 the zeroes to be transferred. Once the zeroes arrive, they then 2361 steal memory or cache space from real data. To make matters worse, 2362 if an application then proceeds to write data to another file system, 2363 the zeroes are written into the file, expanding the sparse file into a 2364 full-sized regular file. Beyond wasting disk space, this can 2365 actually prevent large sparse files from ever being copied to another 2366 storage location due to space limitations. 2368 This document adds a new READPLUS operation to efficiently read from 2369 sparse files by avoiding the transfer of all-zero regions from the 2370 server to the client. READPLUS supports all the features of READ but 2371 includes a minimal extension to support sparse files. In addition, 2372 the return value of READPLUS is now compatible with NFSv4.1 minor 2373 versioning rules and could support other future extensions without 2374 requiring yet another operation. READPLUS is guaranteed to perform 2375 no worse than READ, and can dramatically improve performance with 2376 sparse files. READPLUS does not depend on pNFS protocol features, 2377 but can be used by pNFS to support sparse files. 2379 6.2. Terminology 2381 Regular file: An object of file type NF4REG or 2382 NF4NAMEDATTR. 2384 Sparse file: A Regular file that contains one or more 2385 Holes. 2387 Hole: A byte range within a Sparse file that contains regions 2388 of all zeroes. For block-based file systems, this could also be 2389 an unallocated region of the file. 2391 6.3. Applications and Sparse Files 2393 Applications may cause an NFS client to read holes in a file for 2394 several reasons. This section describes three different application 2395 workloads that cause the NFS client to transfer data unnecessarily. 2396 These workloads are simply examples, and there are probably many more 2397 workloads that are negatively impacted by sparse files. 2399 The first workload that can cause holes to be read is sequential 2400 reads within a sparse file. When this happens, the NFS client may 2401 perform read requests ("readahead") into sections of the file not 2402 explicitly requested by the application.
Since the NFS client cannot 2403 differentiate between holes and non-holes, the NFS client may 2404 prefetch empty sections of the file. 2406 This workload is exemplified by Virtual Machines and their associated 2407 file system images, e.g., VMware .vmdk files, which are large sparse 2408 files encapsulating an entire operating system. If a VM reads files 2409 within the file system image, this will translate to sequential NFS 2410 read requests into the much larger file system image file. Since NFS 2411 does not understand the internals of the file system image, it ends 2412 up performing readahead of file holes. 2414 The second workload is generated by copying a file from a directory 2415 in NFS either to the same NFS server, to another file system, e.g., 2416 another NFS or Samba server, to a local ext3 file system, or even to a 2417 network socket. In this case, bandwidth and server resources are 2418 wasted as the entire file is transferred from the NFS server to the 2419 NFS client. Once a byte range of the file has been transferred to 2420 the client, it is up to the client application, e.g., rsync, cp, or scp, 2421 how it writes the data to the target location. For example, cp 2422 supports sparse files and will not write all-zero regions, whereas 2423 scp does not support sparse files and will transfer every byte of the 2424 file. 2426 The third workload is generated by applications that do not utilize 2427 the NFS client cache, but instead use direct I/O and manage cached 2428 data independently, e.g., databases. These applications may perform 2429 whole file caching with sparse files, which would mean that even the 2430 holes will be transferred to the clients and cached. 2432 6.4. Overview of Sparse Files and NFSv4 2434 This proposal seeks to provide sparse file support to the largest 2435 number of NFS client and server implementations, and as such proposes 2436 to add a new READPLUS operation, based on the mandatory NFSv4.1 READ operation, 2437 instead of proposing additions or extensions of new or existing 2438 optional features (such as pNFS). 2440 As well, this document seeks to ensure that the proposed extensions 2441 are simple and do not transfer data between the client and server 2442 unnecessarily. For example, one possible way to implement sparse 2443 file read support would be to have the client, on the first hole 2444 encountered or at OPEN time, request a Data Region Map from the 2445 server. A Data Region Map would specify all zero and non-zero 2446 regions in a file. While this option seems simple, it is less useful 2447 and can become inefficient and cumbersome for several reasons: 2449 o Data Region Maps can be large, and transferring them can reduce 2450 overall read performance. For example, VMware's .vmdk files can 2451 have a file size of over 100 GB and have a map well over several 2452 MB. 2454 o Data Region Maps can change frequently, and become invalidated on 2455 every write to the file. This can result in the map being 2456 transferred multiple times with each update to the file. For 2457 example, a VM that updates a config file in its file system image 2458 would invalidate the Data Region Map not only for itself, but for 2459 all other clients accessing the same file system image. 2461 o Data Region Maps do not handle all zero-filled sections of the 2462 file, reducing the effectiveness of the solution.
6.4.  Overview of Sparse Files and NFSv4

This proposal seeks to provide sparse file support to the largest number of NFS client and server implementations, and as such proposes a new READPLUS operation with a new return code instead of proposing additions or extensions of new or existing optional features (such as pNFS).

As well, this document seeks to ensure that the proposed extensions are simple and do not transfer data between the client and server unnecessarily.  For example, one possible way to implement sparse file read support would be to have the client, on the first hole encountered or at OPEN time, request a Data Region Map from the server.  A Data Region Map would specify all zero and non-zero regions in a file.  While this option seems simple, it is less useful and can become inefficient and cumbersome for several reasons:

o  Data Region Maps can be large, and transferring them can reduce overall read performance.  For example, VMware's .vmdk files can have a file size of over 100 GB and a map well over several MB.

o  Data Region Maps can change frequently and become invalidated on every write to the file.  This can result in the map being transferred multiple times as the file is updated.  For example, a VM that updates a config file in its file system image would invalidate the Data Region Map not only for itself, but for all other clients accessing the same file system image.

o  Data Region Maps do not handle all zero-filled sections of the file, reducing the effectiveness of the solution.  While it may be possible to modify the maps to handle zero-filled sections (at possibly great effort to the server), it is almost impossible with pNFS.  With pNFS, the owner of the Data Region Map is the metadata server, which is not in the data path and has no knowledge of the contents of a data region.

Another way to handle holes is compression, but this is not ideal since it requires all implementations to agree on a single compression algorithm and requires a fair amount of computational overhead.

Note that supporting writing to a sparse file does not require changes to the protocol.  Applications and/or NFS implementations can choose to ignore WRITE requests of all zeros to the NFS server without consequence.

6.5.  Operation 65: READPLUS

This section introduces a new read operation, named READPLUS, which allows NFS clients to avoid reading holes in a sparse file.  READPLUS is guaranteed to perform no worse than READ, and can dramatically improve performance with sparse files.

READPLUS supports all the features of the existing NFSv4.1 READ operation [2] and adds a simple yet significant extension to the format of its response.  The change allows the server to avoid returning all zeros from a file hole, which wastes computational and network resources and reduces performance.  READPLUS uses a new result structure that tells the client both that the result is all zeros and the byte-range of the hole in which the request was made.  Returning the hole's byte-range, and only upon request, avoids transferring large Data Region Maps that may soon be invalidated and that contain information about a file that may not even be read in its entirety.

A new read operation is required due to NFSv4.1 minor versioning rules that do not allow modification of an existing operation's arguments or results.  READPLUS is designed in such a way as to allow future extensions to the result structure.  The same approach could be taken to extend the argument structure, but a good use case is first required to make such a change.

6.5.1.  ARGUMENT

The argument structure below is a sketch reconstructed from the DESCRIPTION; it mirrors the NFSv4.1 READ arguments, and the normative XDR is given in [8].

   struct READPLUS4args {
           /* CURRENT_FH: file */
           stateid4        rpa_stateid;
           offset4         rpa_offset;
           count4          rpa_count;
   };

6.5.2.  RESULT

The result structure below is likewise a sketch reconstructed from the DESCRIPTION; the nfs_readplusresok4 union and its nfs_readplusrestype4 discriminant (READ_OK or READ_HOLE) are given normatively in [8].

   union nfs_readplusreshole switch (holeres4 rph_info) {
   case HOLE_INFO:
           struct {
                   offset4 hole_offset;
                   length4 hole_length;
           } rph_hole;
   case HOLE_NOINFO:
           void;
   };

   union READPLUS4res switch (nfsstat4 rp_status) {
   case NFS4_OK:
           nfs_readplusresok4 rp_resok4;
   default:
           void;
   };
6.5.3.  DESCRIPTION

The READPLUS operation is based upon the NFSv4.1 READ operation [2], and similarly reads data from the regular file identified by the current filehandle.

The client provides an offset of where the READPLUS is to start and a count of how many bytes are to be read.  An offset of zero means to read data starting at the beginning of the file.  If offset is greater than or equal to the size of the file, the status NFS4_OK is returned with nfs_readplusrestype4 set to READ_OK, data length set to zero, and eof set to TRUE.  The READPLUS is subject to access permissions checking.

If the client specifies a count value of zero, the READPLUS succeeds and returns zero bytes of data, again subject to access permissions checking.  In all situations, the server may choose to return fewer bytes than specified by the client.  The client needs to check for this condition and handle it appropriately.

If the client specifies an offset and count value that is entirely contained within a hole of the file, the status NFS4_OK is returned with nfs_readplusrestype4 set to READ_HOLE and, if information is available regarding the hole, an nfs_readplusreshole structure containing the offset and range of the entire hole.  The nfs_readplusreshole structure is considered valid until the file is changed (detected via the change attribute).  The server MUST provide the same semantics for nfs_readplusreshole as if the client read the region and received zeros; the implied hole's contents' lifetime MUST be exactly the same as any other read data.

If the client specifies an offset and count value that begins in a non-hole of the file but extends into a hole, the server should return a short read with status NFS4_OK, nfs_readplusrestype4 set to READ_OK, and data length set to the number of bytes returned.  The client will then issue another READPLUS for the remaining bytes, to which the server will respond with information about the hole in the file.

If the server knows that the requested byte range is into a hole of the file, but has no further information regarding the hole, it returns an nfs_readplusreshole structure with holeres4 set to HOLE_NOINFO.

If hole information is available on the server and can be returned to the client, the server returns an nfs_readplusreshole structure with the value of holeres4 set to HOLE_INFO.  The values of hole_offset and hole_length define the byte-range for the current hole in the file.  These values represent the information known to the server and may describe a byte-range smaller than the true size of the hole.

Except when special stateids are used, the stateid value for a READPLUS request represents a value returned from a previous byte-range lock or share reservation request or the stateid associated with a delegation.  The stateid identifies the associated owners, if any, and is used by the server to verify that the associated locks are still valid (e.g., have not been revoked).

If the read ended at the end-of-file (formally, in a correctly formed READPLUS operation, if offset + count is equal to the size of the file), or the READPLUS operation extends beyond the size of the file (if offset + count is greater than the size of the file), eof is returned as TRUE; otherwise, it is FALSE.  A successful READPLUS of an empty file will always return eof as TRUE.

If the current filehandle is not an ordinary file, an error will be returned to the client.  In the case that the current filehandle represents an object of type NF4DIR, NFS4ERR_ISDIR is returned.  If the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is returned.  In all other cases, NFS4ERR_WRONG_TYPE is returned.

For a READPLUS with a stateid value of all bits equal to zero, the server MAY allow the READPLUS to be serviced subject to mandatory byte-range locks or the current share deny modes for the file.  For a READPLUS with a stateid value of all bits equal to one, the server MAY allow READPLUS operations to bypass locking checks at the server.

On success, the current filehandle retains its value.
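To illustrate how a client might consume these results, the sketch below reads a file of known size sequentially and materializes holes as zeros in a caller-supplied buffer.  The readplus() wrapper and the rp_result type are hypothetical stand-ins for a client's internal RPC machinery; they are not part of the protocol and merely mirror the nfs_readplusrestype4 and nfs_readplusreshole semantics described above.

   #include <stdint.h>
   #include <string.h>

   /* Hypothetical client-side mirror of the READPLUS result. */
   enum rp_type { RP_READ_OK, RP_READ_HOLE };
   enum rp_hole { RP_HOLE_INFO, RP_HOLE_NOINFO };

   struct rp_result {
       enum rp_type type;
       int          eof;          /* READ_OK: end-of-file flag    */
       size_t       len;          /* READ_OK: bytes actually read */
       enum rp_hole holeres;      /* READ_HOLE: info availability */
       uint64_t     hole_offset;  /* HOLE_INFO: start of the hole */
       uint64_t     hole_length;  /* HOLE_INFO: length of hole    */
   };

   /* Assumed to send READPLUS(stateid, offset, count), place any
    * READ_OK data at buf, and fill *res; returns 0 on NFS4_OK. */
   int readplus(uint64_t offset, uint32_t count, char *buf,
                struct rp_result *res);

   /* Sequentially read 'filesize' bytes into 'dst', zero-filling
    * holes locally instead of pulling zeros across the wire. */
   void read_sparse_file(char *dst, uint64_t filesize, uint32_t maxread)
   {
       uint64_t off = 0;
       struct rp_result res;

       while (off < filesize) {
           uint64_t left  = filesize - off;
           uint32_t count = left < maxread ? (uint32_t)left : maxread;

           if (readplus(off, count, dst + off, &res) != 0)
               break;                  /* error handling elided */

           if (res.type == RP_READ_OK) {
               off += res.len;         /* short reads simply advance */
               if (res.eof)
                   break;
           } else {
               /* READ_HOLE: with HOLE_INFO, skip to the end of the
                * hole; with HOLE_NOINFO, only the requested range is
                * known to be zeros. */
               uint64_t skip = (res.holeres == RP_HOLE_INFO)
                   ? res.hole_offset + res.hole_length - off
                   : count;
               if (skip > left)
                   skip = left;
               memset(dst + off, 0, (size_t)skip);
               off += skip;
           }
       }
   }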
6.5.4.  IMPLEMENTATION

If the server returns a "short read" (i.e., less data than requested and eof set to FALSE), the client should send another READPLUS to get the remaining data.  A server may return less data than requested under several circumstances.  The file may have been truncated by another client or perhaps on the server itself, changing the file size from what the requesting client believes to be the case.  This would reduce the actual amount of data available to the client.  The server may also reduce the transfer size and so return a short read result.  Server resource exhaustion may also result in a short read.

If mandatory byte-range locking is in effect for the file, and if the byte-range corresponding to the data to be read from the file is WRITE_LT locked by an owner not associated with the stateid, the server will return the NFS4ERR_LOCKED error.  The client should try to get the appropriate READ_LT via the LOCK operation before re-attempting the READPLUS.  When the READPLUS completes, the client should release the byte-range lock via LOCKU.

If another client has an OPEN_DELEGATE_WRITE delegation for the file being read, the delegation must be recalled, and the operation cannot proceed until that delegation is returned or revoked.  Except where this happens very quickly, one or more NFS4ERR_DELAY errors will be returned to requests made while the delegation remains outstanding.  Normally, delegations will not be recalled as a result of a READPLUS operation since the recall will occur as a result of an earlier OPEN.  However, since it is possible for a READPLUS to be done with a special stateid, the server needs to check for this case even though the client should have done an OPEN previously.

6.5.4.1.  Additional pNFS Implementation Information

With pNFS, the semantics of using READPLUS remain the same.  Any data server MAY return a READ_HOLE result for a READPLUS request that it receives.

When a data server chooses to return a READ_HOLE result, it has a certain level of flexibility in how it fills out the nfs_readplusreshole structure.

1.  For a data server that cannot determine any hole information, the data server SHOULD return HOLE_NOINFO.

2.  For a data server that can only obtain hole information for the parts of the file stored on that data server, the data server SHOULD return HOLE_INFO and the byte range of the hole stored on that data server.

3.  For a data server that can obtain hole information for the entire file without severe performance impact, it MAY return HOLE_INFO and the byte range of the entire file hole.

In general, a data server should do its best to return as much information about a hole as is feasible; a sketch of this decision appears at the end of this section.  At the same time, pNFS server implementers should try to ensure that data servers do not overload the metadata server with requests for information.  Therefore, if supplying global sparse information for a file to data servers can overwhelm a metadata server, then data servers should use option 1 or 2 above.

When a pNFS client receives a READ_HOLE result and a non-empty nfs_readplusreshole structure, it MAY use this information in conjunction with a valid layout for the file to determine the next data server for the next region of data that is not in a hole.
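The three options above might translate into data-server logic along the following lines.  This is a hypothetical sketch: the hole_scope capability enum and the hole descriptors passed in are illustrative, and a real data server would derive them from its local file system and, for option 3, from its layout knowledge.

   #include <stdint.h>

   /* Hypothetical description of a data server's hole knowledge. */
   enum hole_scope {
       SCOPE_NONE,    /* option 1: no hole information at all          */
       SCOPE_LOCAL,   /* option 2: holes within locally stored stripes */
       SCOPE_GLOBAL   /* option 3: cheap access to whole-file info     */
   };

   struct hole_reply {
       int      have_info;    /* 1 => HOLE_INFO, 0 => HOLE_NOINFO */
       uint64_t hole_offset;
       uint64_t hole_length;
   };

   /* Fill the hole portion of a READ_HOLE reply for a request known
    * to fall inside a hole, given local and global views of it. */
   static void fill_hole_reply(enum hole_scope scope,
                               uint64_t local_off, uint64_t local_len,
                               uint64_t global_off, uint64_t global_len,
                               struct hole_reply *r)
   {
       switch (scope) {
       case SCOPE_GLOBAL:              /* option 3: entire hole */
           r->have_info   = 1;
           r->hole_offset = global_off;
           r->hole_length = global_len;
           break;
       case SCOPE_LOCAL:               /* option 2: local extent only */
           r->have_info   = 1;
           r->hole_offset = local_off;
           r->hole_length = local_len;
           break;
       default:                        /* option 1: HOLE_NOINFO */
           r->have_info = 0;
           break;
       }
   }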
6.5.5.  READPLUS with Sparse Files Example

To see how the READ_HOLE return value works, the following table describes a sparse file.  For each byte range, the file contains either non-zero data or a hole.

              +-------------+----------+
              | Byte-Range  | Contents |
              +-------------+----------+
              | 0-31999     | Non-Zero |
              | 32K-255999  | Hole     |
              | 256K-287999 | Non-Zero |
              | 288K-353999 | Hole     |
              | 354K-417999 | Non-Zero |
              +-------------+----------+

                         Table 3

Under the given circumstances, if a client were to read the file from beginning to end with a maximum read size of 64K, the following would be the result.  This assumes the client has already opened the file, acquired a valid stateid, and just needs to issue READPLUS requests.

1.  READPLUS(s, 0, 64K) --> NFS_OK, readplusrestype4 = READ_OK, eof = false, data<>[32K].  Return a short read, as the last half of the request was all zeros.

2.  READPLUS(s, 32K, 64K) --> NFS_OK, readplusrestype4 = READ_HOLE, nfs_readplusreshole(HOLE_INFO)(32K, 224K).  The requested range was all zeros, and the current hole begins at offset 32K and is 224K in length.

3.  READPLUS(s, 256K, 64K) --> NFS_OK, readplusrestype4 = READ_OK, eof = false, data<>[32K].  Return a short read, as the last half of the request was all zeros.

4.  READPLUS(s, 288K, 64K) --> NFS_OK, readplusrestype4 = READ_HOLE, nfs_readplusreshole(HOLE_INFO)(288K, 66K).

5.  READPLUS(s, 354K, 64K) --> NFS_OK, readplusrestype4 = READ_OK, eof = true, data<>[64K].

6.6.  Related Work

Solaris and ZFS support an extension to lseek(2) that allows applications to discover holes in a file.  The values, SEEK_HOLE and SEEK_DATA, allow clients to seek to the next hole or beginning of data, respectively.

XFS supports the XFS_IOC_GETBMAP ioctl, which returns the Data Region Map for a file.  Clients can then use this information to avoid reading holes in a file.

NTFS and CIFS support the FSCTL_SET_SPARSE attribute, which allows applications to control whether empty regions of the file are preallocated and filled in with zeros or simply left unallocated.
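As an illustration of the lseek(2) extension mentioned above, the sketch below enumerates a file's data regions (and, by implication, its holes) without reading any of its contents.  It assumes a platform that defines SEEK_HOLE and SEEK_DATA (Solaris, or Linux with _GNU_SOURCE on a supporting file system).

   #define _GNU_SOURCE          /* for SEEK_DATA/SEEK_HOLE on Linux */
   #include <fcntl.h>
   #include <stdio.h>
   #include <unistd.h>

   /* Print the data regions of a file using SEEK_DATA/SEEK_HOLE. */
   int print_data_regions(const char *path)
   {
       int fd = open(path, O_RDONLY);
       if (fd < 0) { perror("open"); return -1; }

       off_t end  = lseek(fd, 0, SEEK_END);
       off_t data = lseek(fd, 0, SEEK_DATA);   /* first data region */

       while (data >= 0 && data < end) {
           /* End of this data region is the start of the next hole. */
           off_t hole = lseek(fd, data, SEEK_HOLE);
           printf("data: %lld-%lld\n",
                  (long long)data, (long long)hole - 1);
           data = lseek(fd, hole, SEEK_DATA);  /* next data region */
       }

       close(fd);
       return 0;
   }

Because every file logically ends in a hole, seeking with SEEK_HOLE always succeeds before end-of-file, while SEEK_DATA fails with ENXIO once no data remains, terminating the loop.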
Noveck, "Network File System 2764 (NFS) Version 4 Minor Version 1 External Data Representation 2765 Standard (XDR) Description", RFC 5662, January 2010. 2767 [8] Haynes, T., "Network File System (NFS) Version 4 Minor Version 2768 2 External Data Representation Standard (XDR) Description", 2769 March 2011. 2771 [9] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol 2772 Specification", RFC 2203, September 1997. 2774 9.2. Informative References 2776 [10] Haynes, T. and D. Noveck, "Network File System (NFS) version 4 2777 Protocol", draft-ietf-nfsv4-rfc3530bis-09 (Work In Progress), 2778 March 2011. 2780 [11] Eisler, M., "XDR: External Data Representation Standard", 2781 RFC 4506, May 2006. 2783 [12] Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik, 2784 "NSDB Protocol for Federated Filesystems", 2785 draft-ietf-nfsv4-federated-fs-protocol (Work In Progress), 2786 2010. 2788 [13] Lentini, J., Everhart, C., Ellard, D., Tewari, R., and M. Naik, 2789 "Administration Protocol for Federated Filesystems", 2790 draft-ietf-nfsv4-federated-fs-admin (Work In Progress), 2010. 2792 [14] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., 2793 Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- 2794 HTTP/1.1", RFC 2616, June 1999. 2796 [15] Postel, J. and J. Reynolds, "File Transfer Protocol", STD 9, 2797 RFC 959, October 1985. 2799 [16] Simpson, W., "PPP Challenge Handshake Authentication Protocol 2800 (CHAP)", RFC 1994, August 1996. 2802 [17] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA 2803 Considerations Section in RFCs", BCP 26, RFC 5226, May 2008. 2805 [18] Nowicki, B., "NFS: Network File System Protocol specification", 2806 RFC 1094, March 1989. 2808 [19] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 2809 Protocol Specification", RFC 1813, June 1995. 2811 [20] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", 2812 RFC 1833, August 1995. 2814 [21] Eisler, M., "NFS Version 2 and Version 3 Security Issues and 2815 the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5", 2816 RFC 2623, June 1999. 2818 [22] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997. 2820 [23] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, 2821 June 1999. 2823 [24] Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by an On- 2824 line Database", RFC 3232, January 2002. 2826 [25] Linn, J., "The Kerberos Version 5 GSS-API Mechanism", RFC 1964, 2827 June 1996. 2829 [26] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, 2830 C., Eisler, M., and D. Noveck, "Network File System (NFS) 2831 version 4 Protocol", RFC 3530, April 2003. 2833 Appendix A. Acknowledgments 2835 For the pNFS Access Permissions Check, the original draft was by 2836 Sorin Faibish, David Black, Mike Eisler, and Jason Glasgow. The work 2837 was influenced by discussions with Benny Halevy and Bruce Fields. A 2838 review was done by Tom Haynes. 2840 For the Sharing change attribute implementation details with NFSv4 2841 clients, the original draft was by Trond Myklebust. 2843 For the NFS Server-side Copy, the original draft was by James 2844 Lentini, Mike Eisler, Deepak Kenchammana, Anshul Madan, and Rahul 2845 Iyer. Talpey co-authored an unpublished version of that document. 2846 It was also was reviewed by a number of individuals: Pranoop Erasani, 2847 Tom Haynes, Arthur Lent, Trond Myklebust, Dave Noveck, Theresa 2848 Lingutla-Raj, Manjunath Shankararao, Satyam Vaghani, and Nico 2849 Williams. 
For the NFS space reservation operations, the original draft was by Mike Eisler, James Lentini, Manjunath Shankararao, and Rahul Iyer.

For the sparse file support, the original draft was by Dean Hildebrand and Marc Eshel.  Valuable input and advice were received from Sorin Faibish, Bruce Fields, Benny Halevy, Trond Myklebust, and Richard Scheffenegger.

Appendix B.  RFC Editor Notes

[RFC Editor: please remove this section prior to publishing this document as an RFC]

[RFC Editor: prior to publishing this document as an RFC, please replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the RFC number of this document]

Author's Address

   Thomas Haynes
   NetApp
   9110 E 66th St
   Tulsa, OK 74133
   USA

   Phone: +1 918 307 1415
   Email: thomas@netapp.com
   URI:   http://www.tulsalabs.com