NFSv4                                                          T. Haynes
Internet-Draft                                              Primary Data
Intended status: Standards Track                        January 28, 2016
Expires: July 31, 2016

                     NFS Version 4 Minor Version 2
                 draft-ietf-nfsv4-minorversion2-41.txt

Abstract

   This Internet-Draft describes NFS version 4 minor version two,
   describing the protocol extensions made from NFS version 4 minor
   version 1. Major extensions introduced in NFS version 4 minor
   version two include: Server Side Copy, Application Input/Output (I/O)
   Advise, Space Reservations, Sparse Files, Application Data Blocks,
   and Labeled NFS.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 31, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Scope of This Document
     1.2.  NFSv4.2 Goals
     1.3.  Overview of NFSv4.2 Features
       1.3.1.  Server Side Clone and Copy
       1.3.2.  Application Input/Output (I/O) Advise
       1.3.3.  Sparse Files
       1.3.4.  Space Reservation
       1.3.5.  Application Data Block (ADB) Support
       1.3.6.  Labeled NFS
       1.3.7.  Layout Enhancements
     1.4.  Enhancements to Minor Versioning Model
   2.  Minor Versioning
   3.  pNFS considerations for New Operations
     3.1.  Atomicity for ALLOCATE and DEALLOCATE
     3.2.  Sharing of stateids with NFSv4.1
     3.3.  NFSv4.2 as a Storage Protocol in pNFS: the File Layout
           Type
       3.3.1.  Operations Sent to NFSv4.2 Data Servers
   4.  Server Side Copy
     4.1.  Protocol Overview
       4.1.1.  Copy Operations
       4.1.2.  Requirements for Operations
     4.2.  Requirements for Inter-Server Copy
     4.3.  Implementation Considerations
       4.3.1.  Locking the Files
       4.3.2.  Client Caches
     4.4.  Intra-Server Copy
     4.5.  Inter-Server Copy
     4.6.  Server-to-Server Copy Protocol
       4.6.1.  Considerations on Selecting a Copy Protocol
       4.6.2.  Using NFSv4.x as the Copy Protocol
       4.6.3.  Using an Alternative Copy Protocol
     4.7.  netloc4 - Network Locations
     4.8.  Copy Offload Stateids
     4.9.  Security Considerations
       4.9.1.  Inter-Server Copy Security
   5.  Support for Application I/O Hints
   6.  Sparse Files
     6.1.  Terminology
     6.2.  New Operations
       6.2.1.  READ_PLUS
       6.2.2.  DEALLOCATE
   7.  Space Reservation
   8.  Application Data Block Support
     8.1.  Generic Framework
       8.1.1.  Data Block Representation
     8.2.  An Example of Detecting Corruption
     8.3.  Example of READ_PLUS
     8.4.  An Example of Zeroing Space
   9.  Labeled NFS
     9.1.  Definitions
     9.2.  MAC Security Attribute
       9.2.1.  Delegations
       9.2.2.  Permission Checking
       9.2.3.  Object Creation
       9.2.4.  Existing Objects
       9.2.5.  Label Changes
     9.3.  pNFS Considerations
     9.4.  Discovery of Server Labeled NFS Support
     9.5.  MAC Security NFS Modes of Operation
       9.5.1.  Full Mode
       9.5.2.  Guest Mode
     9.6.  Security Considerations for Labeled NFS
   10. Sharing change attribute implementation characteristics with
       NFSv4 clients
   11. Error Values
     11.1.  Error Definitions
       11.1.1.  General Errors
       11.1.2.  Server to Server Copy Errors
       11.1.3.  Labeled NFS Errors
     11.2.  New Operations and Their Valid Errors
     11.3.  New Callback Operations and Their Valid Errors
   12. New File Attributes
     12.1.  New RECOMMENDED Attributes - List and Definition
            References
     12.2.  Attribute Definitions
   13. Operations: REQUIRED, RECOMMENDED, or OPTIONAL
   14. Modifications to NFSv4.1 Operations
     14.1.  Operation 42: EXCHANGE_ID - Instantiate Client ID
     14.2.  Operation 48: GETDEVICELIST - Get All Device Mappings
            for a File System
   15. NFSv4.2 Operations
     15.1.  Operation 59: ALLOCATE - Reserve Space in A Region of a
            File
     15.2.  Operation 60: COPY - Initiate a server-side copy
     15.3.  Operation 61: COPY_NOTIFY - Notify a source server of a
            future copy
     15.4.  Operation 62: DEALLOCATE - Unreserve Space in a Region
            of a File
     15.5.  Operation 63: IO_ADVISE - Application I/O access pattern
            hints
     15.6.  Operation 64: LAYOUTERROR - Provide Errors for the
            Layout
     15.7.  Operation 65: LAYOUTSTATS - Provide Statistics for the
            Layout
     15.8.  Operation 66: OFFLOAD_CANCEL - Stop an Offloaded
            Operation
     15.9.  Operation 67: OFFLOAD_STATUS - Poll for Status of
            Asynchronous Operation
     15.10. Operation 68: READ_PLUS - READ Data or Holes from a File
     15.11. Operation 69: SEEK - Find the Next Data or Hole
     15.12. Operation 70: WRITE_SAME - WRITE an ADB Multiple Times
            to a File
     15.13. Operation 71: CLONE - Clone a range of file into another
            file
   16. NFSv4.2 Callback Operations
     16.1.  Operation 15: CB_OFFLOAD - Report results of an
            asynchronous operation
   17. Security Considerations
   18. IANA Considerations
   19. References
     19.1.  Normative References
     19.2.  Informative References
   Appendix A.  Acknowledgments
   Appendix B.  RFC Editor Notes
   Author's Address

1.  Introduction

   The NFS version 4 minor version 2 (NFSv4.2) protocol is the third
   minor version of the NFS version 4 (NFSv4) protocol. The first minor
   version, NFSv4.0, is described in [RFC7530] and the second minor
   version, NFSv4.1, is described in [RFC5661].

   As a minor version, NFSv4.2 is consistent with the overall goals for
   NFSv4, but extends the protocol so as to better meet those goals,
   based on experiences with NFSv4.1. In addition, NFSv4.2 has adopted
   some additional goals, which motivate some of the major extensions in
   NFSv4.2.

1.1.  Scope of This Document

   This document describes the NFSv4.2 protocol as a set of extensions
   to the specification for NFSv4.1. That specification remains current
   and forms the basis for the additions defined herein. In addition,
   the specification for NFSv4.0 remains current as well.

   It is necessary to implement all the REQUIRED features of NFSv4.1
   before adding NFSv4.2 features to the implementation. With respect
   to NFSv4.0 and NFSv4.1, this document does not:

   o  describe the NFSv4.0 or NFSv4.1 protocols, except where needed to
      contrast with NFSv4.2

   o  modify the specification of the NFSv4.0 or NFSv4.1 protocols

   o  clarify the NFSv4.0 or NFSv4.1 protocols; that is, any
      clarifications made here apply only to NFSv4.2 and not to either
      of the prior protocols

   NFSv4.2 is a superset of NFSv4.1, with all of the new features being
   optional. As such, NFSv4.2 maintains the same compatibility that
   NFSv4.1 had with NFSv4.0. Any interactions of a new feature with
   NFSv4.1 semantics are described in the relevant text.

   The full External Data Representation (XDR) [RFC4506] for NFSv4.2 is
   presented in [I-D.ietf-nfsv4-minorversion2-dot-x].

1.2.  NFSv4.2 Goals

   A major goal of the enhancements provided in NFSv4.2 is to take
   common local file system features that have not been available
   through earlier versions of NFS, and to offer them remotely. These
   features might

   o  already be available on the servers, e.g., sparse files

   o  be under development as a new standard, e.g., SEEK pulls in both
      SEEK_HOLE and SEEK_DATA

   o  be used by clients with the servers via some proprietary means,
      e.g., Labeled NFS

   NFSv4.2 provides means for clients to leverage these features on the
   server in cases in which that had previously not been possible within
   the confines of the NFS protocol.

1.3.  Overview of NFSv4.2 Features

1.3.1.  Server Side Clone and Copy

   A traditional file copy of a remotely accessed file, whether from one
   server to another or between locations in the same server, results in
   the data being put on the network twice - source to client and then
   client to destination. New operations are introduced to allow
   unnecessary traffic to be eliminated:

   o  The intra-server clone feature allows the client to request a
      synchronous cloning, perhaps by copy-on-write semantics.

   o  The intra-server copy feature allows the client to request the
      server to perform the copy internally, avoiding unnecessary
      network traffic.

   o  The inter-server copy feature allows the client to authorize the
      source and destination servers to interact directly.

   As such copies can be lengthy, asynchronous support is also provided.

1.3.2.  Application Input/Output (I/O) Advise

   Applications and clients want to advise the server as to expected I/O
   behavior. Using IO_ADVISE (see Section 15.5) to communicate future
   I/O behavior such as whether a file will be accessed sequentially or
   randomly, and whether a file will or will not be accessed in the near
   future, allows servers to optimize future I/O requests for a file by,
   for example, prefetching or evicting data. This operation can be
   used to support the posix_fadvise [posix_fadvise] function. In
   addition, it may be helpful to applications such as databases and
   video editors.

1.3.3.  Sparse Files

   Sparse files are ones which have unallocated or uninitialized data
   blocks as holes in the file. Such holes are typically transferred as
   0s when read from the file. READ_PLUS (see Section 15.10) allows a
   server to send back to the client metadata describing the hole and
   DEALLOCATE (see Section 15.4) allows the client to punch holes into a
   file. In addition, SEEK (see Section 15.11) is provided to scan for
   the next hole or data from a given location.

1.3.4.  Space Reservation

   When a file is sparse, one concern applications have is ensuring that
   there will always be enough data blocks available for the file during
   future writes. ALLOCATE (see Section 15.1) allows a client to
   request a guarantee that space will be available. Also DEALLOCATE
   (see Section 15.4) allows the client to punch a hole into a file,
   thus releasing a space reservation.

1.3.5.  Application Data Block (ADB) Support

   Some applications treat a file as if it were a disk and as such want
   to initialize (or format) the file image. The WRITE_SAME operation
   (see Section 15.12) is introduced to send this metadata to the server
   to allow it to write the block contents.

1.3.6.  Labeled NFS

   While both clients and servers can employ Mandatory Access Control
   (MAC) security models to enforce data access, there has been no
   protocol support for interoperability. A new file object attribute,
   sec_label (see Section 12.2.4), allows the server to store MAC
   labels on files, which the client retrieves and uses to enforce data
   access (see Section 9.5.2). The format of the sec_label accommodates
   any MAC security system.

1.3.7.  Layout Enhancements

   In the parallel NFS implementations of NFSv4.1 (see Section 12 of
   [RFC5661]), the client cannot communicate back to the metadata server
   any errors or performance characteristics with the storage devices.
   NFSv4.2 provides two new operations to do so respectively:
   LAYOUTERROR (see Section 15.6) and LAYOUTSTATS (see Section 15.7).

1.4.  Enhancements to Minor Versioning Model

   In NFSv4.1, the only way to introduce new variants of an operation
   was to introduce a new operation. For instance, READ would have to
   be replaced or supplemented by, say, either READ2 or READ_PLUS.
   With the use of discriminated unions as parameters to such functions
   in NFSv4.2, it is possible to add a new arm (i.e., a new entry in the
   union and a corresponding new field in the structure) in a subsequent
   minor version. And it is also possible to move such an operation
   from OPTIONAL/RECOMMENDED to REQUIRED. Forcing an implementation to
   adopt each arm of a discriminated union at such a time does not meet
   the spirit of the minor versioning rules. As such, new arms of a
   discriminated union MUST follow the same guidelines for minor
   versioning as operations in NFSv4.1 - i.e., they may not be made
   REQUIRED. To support this, a new error code, NFS4ERR_UNION_NOTSUPP,
   allows the server to communicate to the client that the operation is
   supported, but the specific arm of the discriminated union is not.

2.  Minor Versioning

   NFSv4.2 is a minor version of NFSv4 and is built upon NFSv4.1 as
   documented in [RFC5661] and [RFC5662].

   NFSv4.2 does not modify the rules applicable to the NFSv4 versioning
   process and follows the rules set out in [RFC5661] or in
   standards-track documents updating that document (e.g., in an RFC
   based on [I-D.ietf-nfsv4-versioning]).

   NFSv4.2 only defines extensions to NFSv4.1, each of which may be
   supported (or not) independently. It does not

   o  introduce infrastructural features

   o  make existing features MANDATORY to NOT implement

   o  change the status of existing features (i.e., by changing their
      status among OPTIONAL, RECOMMENDED, REQUIRED).

   The following versioning-related considerations should be noted.

   o  When a new case is added to an existing switch, servers need to
      report non-support of that new case by returning
      NFS4ERR_UNION_NOTSUPP.

   o  As regards the potential cross-minor-version transfer of stateids,
      Parallel NFS (pNFS) (see Section 12 of [RFC5661]) implementations
      of the file mapping type may support the use of an NFSv4.2
      metadata server (see Sections 1.7.2.2 and 12.2.2 of [RFC5661])
      with NFSv4.1 data servers. In this context, a stateid returned by
      an NFSv4.2 COMPOUND will be used in an NFSv4.1 COMPOUND directed
      to the data server (see Sections 3.2 and 3.3).

3.  pNFS considerations for New Operations

   The interactions of the new operations with non-pNFS functionality
   are straightforward and covered in the relevant sections. However,
   the interactions of the new operations with pNFS are more
   complicated, and this section provides an overview.

3.1.  Atomicity for ALLOCATE and DEALLOCATE

   Both ALLOCATE (see Section 15.1) and DEALLOCATE (see Section 15.4)
   are sent to the metadata server, which is responsible for
   coordinating the changes onto the storage devices. In particular,
   both operations must either fully succeed or fail; it cannot be the
   case that one storage device succeeds whilst another fails.

3.2.  Sharing of stateids with NFSv4.1

   An NFSv4.2 metadata server can hand out a layout to an NFSv4.1
   storage device. Section 13.9.1 of [RFC5661] discusses how the client
   gets a stateid from the metadata server to present to a storage
   device.

3.3.  NFSv4.2 as a Storage Protocol in pNFS: the File Layout Type

   A file layout provided by an NFSv4.2 server may refer either to a
   storage device that only implements NFSv4.1 as specified in
   [RFC5661], or to a storage device that implements additions from
   NFSv4.2, in which case the rules in Section 3.3.1 apply.
   As the File Layout Type does not provide a means for informing the
   client as to which minor version a particular storage device is
   providing, the client will have to negotiate this with the storage
   device via the normal Remote Procedure Call (RPC) semantics of major
   and minor version discovery. For example, as per Section 16.2.3 of
   [RFC5661], the client could try a COMPOUND with a minorversion of 2
   and if it gets NFS4ERR_MINOR_VERS_MISMATCH, drop back to 1.

3.3.1.  Operations Sent to NFSv4.2 Data Servers

   In addition to the commands listed in [RFC5661], NFSv4.2 data servers
   MAY accept a COMPOUND containing the following additional operations:
   IO_ADVISE (see Section 15.5), READ_PLUS (see Section 15.10),
   WRITE_SAME (see Section 15.12), and SEEK (see Section 15.11), which
   will be treated like the subset specified as "Operations Sent to
   NFSv4.1 Data Servers" in Section 13.6 of [RFC5661].

   Additional details on the implementation of these operations in a
   pNFS context are documented in the operation specific sections.

4.  Server Side Copy

   The server-side copy features provide mechanisms which allow an NFS
   client to copy file data on a server or between two servers without
   the data being transmitted back and forth over the network through
   the NFS client. Without these features, an NFS client would copy
   data from one location to another by reading the data from the source
   server over the network, and then writing the data back over the
   network to the destination server.

   If the source object and destination object are on different file
   servers, the file servers will communicate with one another to
   perform the copy operation. The server-to-server protocol by which
   this is accomplished is not defined in this document.

   The copy feature allows the server to perform the copying either
   synchronously or asynchronously.
   The client can request synchronous copying but the server may not be
   able to honor this request. If the server intends to perform
   asynchronous copying, it supplies the client with a request
   identifier that the client can use to monitor the progress of the
   copying and, if appropriate, cancel a request in progress. The
   request identifier is a stateid representing the internal state held
   by the server while the copying is performed. Multiple asynchronous
   copies of all or part of a file may be in progress in parallel on a
   server; the stateid request identifier allows monitoring and
   canceling to be applied to the correct request.

4.1.  Protocol Overview

   The server-side copy offload operations support both intra-server and
   inter-server file copies. An intra-server copy is a copy in which
   the source file and destination file reside on the same server. In
   an inter-server copy, the source file and destination file are on
   different servers. In both cases, the copy may be performed
   synchronously or asynchronously.

   In addition, the CLONE operation provides copy-like functionality in
   the intra-server case which is both synchronous and atomic, in that
   other operations may not see the target file in any state between
   that before the clone operation and after it.

   Throughout the rest of this document, the NFS server containing the
   source file is referred to as the "source server" and the NFS server
   to which the file is transferred as the "destination server". In the
   case of an intra-server copy, the source server and destination
   server are the same server. Therefore in the context of an intra-
   server copy, the terms source server and destination server refer to
   the single server performing the copy.

   The new operations are designed to copy files or regions within them.
   Other file system objects can be copied by building on these
   operations or using other techniques.
   For example, if a user wishes to copy a directory, the client can
   synthesize a directory copy operation by first creating the
   destination directory and the individual (empty) files within it, and
   then copying the contents of the source directory's files to files in
   the new destination directory.

   For the inter-server copy, the operations are defined to be
   compatible with the traditional copy authorization approach. The
   client and user are authorized at the source for reading. Then they
   are authorized at the destination for writing.

4.1.1.  Copy Operations

   CLONE:  Used by the client to request a synchronous atomic copy-like
      operation. (Section 15.13)

   COPY_NOTIFY:  Used by the client to request the source server to
      authorize a future file copy that will be made by a given
      destination server on behalf of the given user. (Section 15.3)

   COPY:  Used by the client to request a file copy. (Section 15.2)

   OFFLOAD_CANCEL:  Used by the client to terminate an asynchronous file
      copy. (Section 15.8)

   OFFLOAD_STATUS:  Used by the client to poll the status of an
      asynchronous file copy. (Section 15.9)

   CB_OFFLOAD:  Used by the destination server to report the results of
      an asynchronous file copy to the client. (Section 16.1)

4.1.2.  Requirements for Operations

   Three OPTIONAL features are provided relative to server-side copy. A
   server may choose independently to implement any of them. A server
   implementing any of these features may be REQUIRED to implement
   certain operations. Other operations are OPTIONAL in the context of
   a particular feature (see Table 5 in Section 13), but may become
   REQUIRED depending on server behavior. Clients need to use these
   operations to successfully copy a file.

   For a client to do an intra-server file copy, it needs to use either
   the COPY or the CLONE operation. If COPY is used, the client MUST
   support the CB_OFFLOAD operation.
If COPY is used and it returns a stateid, then the client MAY use the
OFFLOAD_CANCEL and OFFLOAD_STATUS operations.

For a client to do an inter-server file copy, it needs to use the COPY
and COPY_NOTIFY operations and MUST support the CB_OFFLOAD operation.
If COPY returns a stateid, then the client MAY use the OFFLOAD_CANCEL
and OFFLOAD_STATUS operations.

If a server supports the intra-server copy feature, then the server
MUST support the COPY operation. If a server's COPY operation returns
a stateid, then the server MUST also support these operations:
CB_OFFLOAD, OFFLOAD_CANCEL, and OFFLOAD_STATUS.

If a server supports the clone feature, then it MUST support the CLONE
operation and the clone_blksize attribute on any filesystem on which
CLONE is supported (as either source or destination file).

If a source server supports the inter-server copy feature, then it
MUST support the operations COPY_NOTIFY and OFFLOAD_CANCEL. If a
destination server supports the inter-server copy feature, then it
MUST support the COPY operation. If a destination server's COPY
operation returns a stateid, then the destination server MUST also
support these operations: CB_OFFLOAD, OFFLOAD_CANCEL, COPY_NOTIFY, and
OFFLOAD_STATUS.

Each operation is performed in the context of the user identified by
the Open Network Computing (ONC) RPC credential of its containing
COMPOUND or CB_COMPOUND request. For example, an OFFLOAD_CANCEL
operation issued by a given user indicates that a specified COPY
operation initiated by the same user is to be canceled. Therefore, an
OFFLOAD_CANCEL MUST NOT interfere with a copy of the same file
initiated by another user.

An NFS server MAY allow an administrative user to monitor or cancel
copy operations using an implementation-specific interface.

4.2. Requirements for Inter-Server Copy

The specification of inter-server copy is driven by several
requirements:

o  The specification MUST NOT mandate the server-to-server protocol.

o  The specification MUST provide guidance for using NFSv4.x as a copy
   protocol. For those source and destination servers willing to use
   NFSv4.x, there are specific security considerations that this
   specification MUST address.

o  The specification MUST NOT mandate preconfiguration between the
   source and destination server. Requiring that the source and
   destination first have a "copying relationship" increases the
   administrative burden. However, the specification MUST NOT preclude
   implementations that require preconfiguration.

o  The specification MUST NOT mandate a trust relationship between the
   source and destination server. The NFSv4 security model requires
   mutual authentication between a principal on an NFS client and a
   principal on an NFS server. This model MUST continue with the
   introduction of COPY.

4.3. Implementation Considerations

4.3.1. Locking the Files

Both the source and destination files may need to be locked to protect
the content during the copy operations. A client can achieve this by a
combination of OPEN and LOCK operations. That is, either share or
byte-range locks might be desired.

Note that when the client establishes a lock stateid on the source,
the context of that stateid is for the client and not the destination.
As such, there might already be an outstanding stateid, issued to the
destination as client of the source, with the same value as that
provided for the lock stateid. The source MUST interpret the lock
stateid as that of the client, i.e., when the destination presents it
in the context of an inter-server copy, it is on behalf of the client.

4.3.2. Client Caches

In a traditional copy, if the client is in the process of writing to
the file before the copy (and perhaps with a write delegation), it
will be straightforward to update the destination server. With an
inter-server copy, the source has no insight into the changes cached
on the client. The client SHOULD write back the data to the source. If
it does not do so, it is possible that the destination will receive a
corrupt copy of the file.

4.4. Intra-Server Copy

To copy a file on a single server, the client uses a COPY operation.
The server may respond to the copy operation with the final results of
the copy or it may perform the copy asynchronously and deliver the
results using a CB_OFFLOAD operation callback. If the copy is
performed asynchronously, the client may poll the status of the copy
using OFFLOAD_STATUS or cancel the copy using OFFLOAD_CANCEL.

A synchronous intra-server copy is shown in Figure 1. In this example,
the NFS server chooses to perform the copy synchronously. The copy
operation is completed, either successfully or unsuccessfully, before
the server replies to the client's request. The server's reply
contains the final result of the operation.

 Client                                 Server
    +                                      +
    |                                      |
    |--- OPEN ---------------------------->| Client opens
    |<------------------------------------/| the source file
    |                                      |
    |--- OPEN ---------------------------->| Client opens
    |<------------------------------------/| the destination file
    |                                      |
    |--- COPY ---------------------------->| Client requests
    |<------------------------------------/| a file copy
    |                                      |
    |--- CLOSE --------------------------->| Client closes
    |<------------------------------------/| the destination file
    |                                      |
    |--- CLOSE --------------------------->| Client closes
    |<------------------------------------/| the source file
    |                                      |
    |                                      |

Figure 1: A synchronous intra-server copy.
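
The two reply styles described in this section can be sketched from the client's point of view. The Python below is illustrative only: FakeServer, CopyReply, and their fields are hypothetical stand-ins rather than the protocol's XDR, and the sketch assumes only what the text states, namely that an asynchronous COPY reply carries a copy stateid that the client may poll with OFFLOAD_STATUS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyReply:
    """Hypothetical, simplified view of a COPY reply."""
    bytes_copied: int
    copy_stateid: Optional[int] = None  # set only for asynchronous copies


class FakeServer:
    """Toy in-memory server; an asynchronous copy advances on each poll."""

    def __init__(self, asynchronous: bool, chunk: int = 4):
        self.asynchronous = asynchronous
        self.chunk = chunk
        self.pending = {}  # copy stateid -> [bytes done, bytes total]

    def copy(self, length: int) -> CopyReply:
        if not self.asynchronous:
            # Synchronous case: the reply carries the final result.
            return CopyReply(bytes_copied=length)
        stateid = len(self.pending) + 1
        self.pending[stateid] = [0, length]
        return CopyReply(bytes_copied=0, copy_stateid=stateid)

    def offload_status(self, stateid: int):
        """Advance the toy copy and return (bytes done, complete?)."""
        progress = self.pending[stateid]
        progress[0] = min(progress[0] + self.chunk, progress[1])
        return progress[0], progress[0] == progress[1]


def client_copy(server: FakeServer, length: int) -> int:
    """Issue COPY; if a copy stateid comes back, poll until the copy is done."""
    reply = server.copy(length)
    if reply.copy_stateid is None:
        return reply.bytes_copied
    done, complete = server.offload_status(reply.copy_stateid)
    while not complete:
        done, complete = server.offload_status(reply.copy_stateid)
    return done
```

In a real exchange the final result of an asynchronous copy arrives in a CB_OFFLOAD callback and polling is optional; the loop here only stands in for that completion signal.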

An asynchronous intra-server copy is shown in Figure 2. In this
example, the NFS server performs the copy asynchronously. The server's
reply to the copy request indicates that the copy operation was
initiated and the final result will be delivered at a later time. The
server's reply also contains a copy stateid. The client may use this
copy stateid to poll for status information (as shown) or to cancel
the copy using an OFFLOAD_CANCEL. When the server completes the copy,
the server performs a callback to the client and reports the results.

 Client                                 Server
    +                                      +
    |                                      |
    |--- OPEN ---------------------------->| Client opens
    |<------------------------------------/| the source file
    |                                      |
    |--- OPEN ---------------------------->| Client opens
    |<------------------------------------/| the destination file
    |                                      |
    |--- COPY ---------------------------->| Client requests
    |<------------------------------------/| a file copy
    |                                      |
    |                                      |
    |--- OFFLOAD_STATUS ------------------>| Client may poll
    |<------------------------------------/| for status
    |                                      |
    |                  .                   | Multiple OFFLOAD_STATUS
    |                  .                   | operations may be sent.
    |                  .                   |
    |                                      |
    |<-- CB_OFFLOAD -----------------------| Server reports results
    |\------------------------------------>|
    |                                      |
    |--- CLOSE --------------------------->| Client closes
    |<------------------------------------/| the destination file
    |                                      |
    |--- CLOSE --------------------------->| Client closes
    |<------------------------------------/| the source file
    |                                      |
    |                                      |

Figure 2: An asynchronous intra-server copy.

4.5. Inter-Server Copy

A copy may also be performed between two servers. The copy protocol is
designed to accommodate a variety of network topologies. As shown in
Figure 3, the client and servers may be connected by multiple
networks.
In particular, the servers may be connected by a specialized,
high-speed network (network 192.0.2.0/24 in the diagram) that does not
include the client. The protocol allows the client to set up the copy
between the servers (over network 203.0.113.0/24 in the diagram) and
for the servers to communicate on the high-speed network if they
choose to do so.

                        192.0.2.0/24
           +-------------------------------------+
           |                                     |
           |                                     |
           | 192.0.2.18                          | 192.0.2.56
   +-------+------+                       +------+------+
   |    Source    |                       | Destination |
   +-------+------+                       +------+------+
           | 203.0.113.18                        | 203.0.113.56
           |                                     |
           |                                     |
           |           203.0.113.0/24            |
           +------------------+------------------+
                              |
                              |
                              | 203.0.113.243
                        +-----+-----+
                        |  Client   |
                        +-----------+

Figure 3: An example inter-server network topology.

For an inter-server copy, the client notifies the source server that a
file will be copied by the destination server using a COPY_NOTIFY
operation. The client then initiates the copy by sending the COPY
operation to the destination server. The destination server may
perform the copy synchronously or asynchronously.

A synchronous inter-server copy is shown in Figure 4. In this case,
the destination server chooses to perform the copy before responding
to the client's COPY request.

An asynchronous copy is shown in Figure 5. In this case, the
destination server chooses to respond to the client's COPY request
immediately and then perform the copy asynchronously.
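
The message ordering described above can be sketched as follows. This is a toy model, not an implementation: the Source and Destination classes and their method signatures are hypothetical, and the "read" is a direct method call rather than a server-to-server protocol. It assumes, per Section 4.6.2, that the stateid returned by COPY_NOTIFY (cnr_stateid) is what the client passes as ca_src_stateid in COPY.

```python
import secrets


class Source:
    """Toy source server holding file contents keyed by filehandle."""

    def __init__(self, files):
        self.files = files
        self.authorized = {}  # cnr_stateid -> (filehandle, destination name)

    def copy_notify(self, filehandle, destination):
        """Record that `destination` may read `filehandle`; return cnr_stateid."""
        cnr_stateid = secrets.token_hex(8)
        self.authorized[cnr_stateid] = (filehandle, destination)
        return cnr_stateid

    def read(self, filehandle, stateid, caller):
        """Serve a read only if it matches an earlier COPY_NOTIFY."""
        if self.authorized.get(stateid) != (filehandle, caller):
            raise PermissionError("no matching COPY_NOTIFY")
        return self.files[filehandle]


class Destination:
    """Toy destination server that pulls data from the source."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def copy(self, source, src_filehandle, ca_src_stateid, dst_filehandle):
        """Handle COPY by reading from the source on the client's behalf."""
        data = source.read(src_filehandle, ca_src_stateid, self.name)
        self.files[dst_filehandle] = data
        return len(data)


def inter_server_copy(source, destination, src_fh, dst_fh):
    """Client-side ordering: COPY_NOTIFY to the source, then COPY to the destination."""
    cnr_stateid = source.copy_notify(src_fh, destination.name)
    return destination.copy(source, src_fh, cnr_stateid, dst_fh)
```

The point of the sketch is the ordering: a COPY that arrives with a stateid the source never issued is refused, which is why COPY_NOTIFY has to come first.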

 Client               Source          Destination
    +                    +                 +
    |                    |                 |
    |--- OPEN        --->|                 | Returns
    |<------------------/|                 | open state os1
    |                    |                 |
    |--- COPY_NOTIFY --->|                 |
    |<------------------/|                 |
    |                    |                 |
    |--- OPEN ---------------------------->| Returns
    |<------------------------------------/| open state os2
    |                    |                 |
    |--- COPY ---------------------------->|
    |                    |                 |
    |                    |                 |
    |                    |<----- read -----|
    |                    |\--------------->|
    |                    |                 |
    |                    |        .        | Multiple reads may
    |                    |        .        | be necessary
    |                    |        .        |
    |                    |                 |
    |                    |                 |
    |<------------------------------------/| Destination replies
    |                    |                 | to COPY
    |                    |                 |
    |--- CLOSE --------------------------->| Release os2
    |<------------------------------------/|
    |                    |                 |
    |--- CLOSE       --->|                 | Release os1
    |<------------------/|                 |

Figure 4: A synchronous inter-server copy.

 Client               Source          Destination
    +                    +                 +
    |                    |                 |
    |--- OPEN        --->|                 | Returns
    |<------------------/|                 | open state os1
    |                    |                 |
    |--- LOCK        --->|                 | Optional, could be done
    |<------------------/|                 | with a share lock
    |                    |                 |
    |--- COPY_NOTIFY --->|                 | Need to pass in
    |<------------------/|                 | os1 or lock state
    |                    |                 |
    |                    |                 |
    |                    |                 |
    |--- OPEN ---------------------------->| Returns
    |<------------------------------------/| open state os2
    |                    |                 |
    |--- LOCK ---------------------------->| Optional ...
    |<------------------------------------/|
    |                    |                 |
    |--- COPY ---------------------------->| Need to pass in
    |<------------------------------------/| os2 or lock state
    |                    |                 |
    |                    |                 |
    |                    |<----- read -----|
    |                    |\--------------->|
    |                    |                 |
    |                    |        .        | Multiple reads may
    |                    |        .        | be necessary
    |                    |        .        |
    |                    |                 |
    |                    |                 |
    |--- OFFLOAD_STATUS ------------------>| Client may poll
    |<------------------------------------/| for status
    |                    |                 |
    |                    |        .        | Multiple OFFLOAD_STATUS
    |                    |        .        | operations may be sent
    |                    |        .        |
    |                    |                 |
    |                    |                 |
    |                    |                 |
    |<-- CB_OFFLOAD -----------------------| Destination reports
    |\------------------------------------>| results
    |                    |                 |
    |--- LOCKU --------------------------->| Only if LOCK was done
    |<------------------------------------/|
    |                    |                 |
    |--- CLOSE --------------------------->| Release os2
    |<------------------------------------/|
    |                    |                 |
    |--- LOCKU       --->|                 | Only if LOCK was done
    |<------------------/|                 |
    |                    |                 |
    |--- CLOSE       --->|                 | Release os1
    |<------------------/|                 |
    |                    |                 |

Figure 5: An asynchronous inter-server copy.

4.6. Server-to-Server Copy Protocol

The choice of what protocol to use in an inter-server copy is
ultimately the destination server's decision. However, the destination
server has to be cognizant that it is working on behalf of the client.

4.6.1. Considerations on Selecting a Copy Protocol

The client can have requirements over both the size of transactions
and error recovery semantics. It may want to split the copy up such
that each chunk is synchronously transferred. It may want the copy
protocol to copy the bytes in consecutive order such that upon an
error, the client can restart the copy at the last known good offset.
If the destination server cannot meet these requirements, the client
may prefer the traditional copy mechanism such that it can meet those
requirements.

4.6.2. Using NFSv4.x as the Copy Protocol

The destination server MAY use standard NFSv4.x (where x >= 1)
operations to read the data from the source server. If NFSv4.x is used
for the server-to-server copy protocol, the destination server can use
the source filehandle and ca_src_stateid provided in the COPY request
with standard NFSv4.x operations to read data from the source server.
Note that the ca_src_stateid MUST be the cnr_stateid returned from the
source via the COPY_NOTIFY (Section 15.3).

4.6.3. Using an Alternative Copy Protocol

In a homogeneous environment, the source and destination servers might
be able to perform the file copy extremely efficiently using
specialized protocols. For example, the source and destination servers
might be two nodes sharing a common file system format for the source
and destination file systems. Thus, the source and destination are in
an ideal position to efficiently render the image of the source file
to the destination file by replicating the file system formats at the
block level. Another possibility is that the source and destination
might be two nodes sharing a common storage area network, and thus
there is no need to copy any data at all; instead, ownership of the
file and its contents might simply be reassigned to the destination.
To allow for these possibilities, the destination server is allowed to
use a server-to-server copy protocol of its choice.

In a heterogeneous environment, using a protocol other than NFSv4.x
(e.g., HTTP [RFC7230] or FTP [RFC959]) presents some challenges. In
particular, the destination server is presented with the challenge of
accessing the source file given only an NFSv4.x filehandle.

One option for protocols that identify source files with path names is
to use an ASCII hexadecimal representation of the source filehandle as
the file name.

Another option for the source server is to use URLs to direct the
destination server to a specialized service. For example, the response
to COPY_NOTIFY could include the URL
ftp://s1.example.com:9999/_FH/0x12345, where 0x12345 is the ASCII
hexadecimal representation of the source filehandle. When the
destination server receives the source server's URL, it would use
"_FH/0x12345" as the file name to pass to the FTP server listening on
port 9999 of s1.example.com.
On port 9999 there would be a special instance of the FTP service that
understands how to convert NFS filehandles to an open file descriptor
(in many operating systems, this would require a new system call, one
which is the inverse of the makefh() function that the pre-NFSv4 MOUNT
service needs).

Authenticating and identifying the destination server to the source
server is also a challenge. Recommendations for how to accomplish this
are given in Section 4.9.1.3.

4.7. netloc4 - Network Locations

The server-side copy operations specify network locations using the
netloc4 data type shown below:

   enum netloc_type4 {
           NL4_NAME        = 1,
           NL4_URL         = 2,
           NL4_NETADDR     = 3
   };
   union netloc4 switch (netloc_type4 nl_type) {
           case NL4_NAME:          utf8str_cis nl_name;
           case NL4_URL:           utf8str_cis nl_url;
           case NL4_NETADDR:       netaddr4    nl_addr;
   };

If the netloc4 is of type NL4_NAME, the nl_name field MUST be
specified as a UTF-8 string. The nl_name is expected to be resolved to
a network address via DNS, Lightweight Directory Access Protocol
(LDAP), Network Information Service (NIS), /etc/hosts, or some other
means. If the netloc4 is of type NL4_URL, a server URL [RFC3986]
appropriate for the server-to-server copy operation is specified as a
UTF-8 string. If the netloc4 is of type NL4_NETADDR, the nl_addr field
MUST contain a valid netaddr4 as defined in Section 3.3.9 of
[RFC5661].

When netloc4 values are used for an inter-server copy as shown in
Figure 3, their values may be evaluated on the source server,
destination server, and client. The network environment in which these
systems operate should be configured so that the netloc4 values are
interpreted as intended on each system.

4.8. Copy Offload Stateids

A server may perform a copy offload operation asynchronously. An
asynchronous copy is tracked using a copy offload stateid.
Copy offload stateids are included in the COPY, OFFLOAD_CANCEL,
OFFLOAD_STATUS, and CB_OFFLOAD operations.

A copy offload stateid will be valid until either (A) the client or
server restarts or (B) the client returns the resource by issuing an
OFFLOAD_CANCEL operation or the client replies to a CB_OFFLOAD
operation.

A copy offload stateid's seqid MUST NOT be zero. In the context of a
copy offload operation, it is inappropriate to indicate "the most
recent copy offload operation" using a stateid with seqid of zero (see
Section 8.2.2 of [RFC5661]). It is inappropriate because the stateid
refers to internal state in the server and there may be several
asynchronous copy operations being performed in parallel on the same
file by the server. Therefore, a copy offload stateid with seqid of
zero MUST be considered invalid.

4.9. Security Considerations

The security considerations pertaining to NFSv4.1 [RFC5661] apply to
this section. As such, the standard security mechanisms used by the
protocol can be used to secure the server-to-server operations.

NFSv4 clients and servers supporting the inter-server copy operations
described in this chapter are REQUIRED to implement the mechanism
described in Section 4.9.1.1, and to support rejecting COPY_NOTIFY
requests that do not use RPCSEC_GSS with privacy. If the
server-to-server copy protocol is ONC RPC based, the servers are also
REQUIRED to implement [I-D.ietf-nfsv4-rpcsec-gssv3] including the
RPCSEC_GSSv3 copy_to_auth, copy_from_auth, and copy_confirm_auth
structured privileges. This requirement to implement is not a
requirement to use; for example, a server may, depending on
configuration, also allow COPY_NOTIFY requests that use only AUTH_SYS.
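
The distinction between "required to implement" and "required to use" can be sketched as a configurable policy check. The Flavor labels and the accept_copy_notify function below are hypothetical names chosen for illustration; the sketch assumes only that a server must be able to refuse COPY_NOTIFY requests lacking RPCSEC_GSS privacy, while a site may configure it to be more permissive.

```python
from enum import Enum


class Flavor(Enum):
    """Simplified labels for how a request was protected (illustrative only)."""
    AUTH_SYS = "auth_sys"
    GSS_NONE = "rpcsec_gss, authentication only"
    GSS_INTEGRITY = "rpcsec_gss with integrity"
    GSS_PRIVACY = "rpcsec_gss with privacy"


def accept_copy_notify(flavor: Flavor, require_privacy: bool = True) -> bool:
    """Return True if a COPY_NOTIFY protected with `flavor` is acceptable.

    With require_privacy=True the server enforces RPCSEC_GSS with privacy;
    with require_privacy=False the administrator has relaxed the policy.
    """
    if require_privacy:
        return flavor is Flavor.GSS_PRIVACY
    return True
```

Defaulting require_privacy to True is a design choice of this sketch, so that the weaker behavior has to be selected explicitly.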

If a server requires the use of an RPCSEC_GSSv3 copy_to_auth,
copy_from_auth, or copy_confirm_auth privilege and it is not used, the
server will reject the request with NFS4ERR_PARTNER_NO_AUTH.

4.9.1. Inter-Server Copy Security

4.9.1.1. Inter-Server Copy via ONC RPC with RPCSEC_GSSv3

When the client sends a COPY_NOTIFY to the source server to expect the
destination to attempt to copy data from the source server, it is
expected that this copy is being done on behalf of the principal
(called the "user principal") that sent the RPC request that encloses
the COMPOUND procedure that contains the COPY_NOTIFY operation. The
user principal is identified by the RPC credentials. A mechanism is
necessary that allows the user principal to authorize the destination
server to perform the copy, that lets the source server properly
authenticate the destination's copy, and that does not allow the
destination server to exceed this authorization.

An approach that sends delegated credentials of the client's user
principal to the destination server is not used for the following
reason. If the client's user delegated its credentials, the
destination would authenticate as the user principal. If the
destination were using the NFSv4 protocol to perform the copy, then
the source server would authenticate the destination server as the
user principal, and the file copy would securely proceed. However,
this approach would allow the destination server to copy other files.
The user principal would have to trust the destination server to not
do so. This is counter to the requirements, and therefore is not
considered.

Instead, a feature of the RPCSEC_GSSv3 [I-D.ietf-nfsv4-rpcsec-gssv3]
protocol can be used: RPC-application-defined structured privilege
assertion.
This feature allows the destination server to authenticate to the
source server as acting on behalf of the user principal, and to
authorize the destination server to perform READs of the file to be
copied from the source on behalf of the user principal. Once the copy
is complete, the client can destroy the RPCSEC_GSSv3 handles to end
the authorization of both the source and destination servers to copy.

For each structured privilege assertion defined by an RPC application,
RPCSEC_GSSv3 requires the application to define a name string and a
data structure that will be encoded and passed between client and
server as opaque data. For NFSv4, the data structures specified below
MUST be serialized using XDR.

Three RPCSEC_GSSv3 structured privilege assertions that work together
to authorize the copy are defined here. For each of the assertions,
the description starts with the name string passed in the rp_name
field of the rgss3_privs structure defined in Section 2.7.1.4 of
[I-D.ietf-nfsv4-rpcsec-gssv3] and specifies the XDR encoding of the
associated structured data passed via the rp_privilege field of the
structure.

copy_from_auth: A user principal is authorizing a source principal
   ("nfs@<source>") to allow a destination principal
   ("nfs@<destination>") to set up the copy_confirm_auth privilege
   required to copy a file from the source to the destination on
   behalf of the user principal. This privilege is established on the
   source server before the user principal sends a COPY_NOTIFY
   operation to the source server, and the resultant RPCSEC_GSSv3
   context is used to secure the COPY_NOTIFY operation.

   struct copy_from_auth_priv {
           secret4             cfap_shared_secret;
           netloc4             cfap_destination;
           /* the NFSv4 user name that the user principal maps to */
           utf8str_mixed       cfap_username;
   };

   cfap_shared_secret is an automatically generated random number
   secret value.

copy_to_auth: A user principal is authorizing a destination principal
   ("nfs@<destination>") to set up a copy_confirm_auth privilege with
   a source principal ("nfs@<source>") to allow it to copy a file from
   the source to the destination on behalf of the user principal. This
   privilege is established on the destination server before the user
   principal sends a COPY operation to the destination server, and the
   resultant RPCSEC_GSSv3 context is used to secure the COPY
   operation.

   struct copy_to_auth_priv {
           /* equal to cfap_shared_secret */
           secret4              ctap_shared_secret;
           netloc4              ctap_source<>;
           /* the NFSv4 user name that the user principal maps to */
           utf8str_mixed        ctap_username;
   };

   ctap_shared_secret is the automatically generated secret value used
   to establish the copy_from_auth privilege with the source
   principal. See Section 4.9.1.1.1.

copy_confirm_auth: A destination principal ("nfs@<destination>") is
   confirming with the source principal ("nfs@<source>") that it is
   authorized to copy data from the source. This privilege is
   established on the destination server before the file is copied
   from the source to the destination. The resultant RPCSEC_GSSv3
   context is used to secure the READ operations from the source to
   the destination server.

   struct copy_confirm_auth_priv {
           /* equal to GSS_GetMIC() of cfap_shared_secret */
           opaque              ccap_shared_secret_mic<>;
           /* the NFSv4 user name that the user principal maps to */
           utf8str_mixed       ccap_username;
   };

4.9.1.1.1. Establishing a Security Context

When the user principal wants to COPY a file between two servers, if
it has not established copy_from_auth and copy_to_auth privileges on
the servers, it establishes them:

o  As noted in [I-D.ietf-nfsv4-rpcsec-gssv3], the client uses an
   existing RPCSEC_GSSv3 context termed the "parent" handle to
   establish and protect RPCSEC_GSSv3 structured privilege assertion
   exchanges. The copy_from_auth privilege will use the context
   established between the user principal and the source server used
   to OPEN the source file as the RPCSEC_GSSv3 parent handle. The
   copy_to_auth privilege will use the context established between the
   user principal and the destination server used to OPEN the
   destination file as the RPCSEC_GSSv3 parent handle.

o  A random number is generated to use as a secret to be shared
   between the two servers. Note that the random number SHOULD NOT be
   reused between establishing different security contexts. The
   resulting shared secret will be placed in the cfap_shared_secret
   and ctap_shared_secret fields of the appropriate privilege data
   types, copy_from_auth_priv and copy_to_auth_priv. Because of this
   shared secret, the RPCSEC_GSS3_CREATE control messages for
   copy_from_auth and copy_to_auth MUST use a Quality of Protection
   (QOP) of rpc_gss_svc_privacy.

o  An instance of copy_from_auth_priv is filled in with the shared
   secret, the destination server, and the NFSv4 user id of the user
   principal and is placed in rpc_gss3_create_args
   assertions[0].privs.privilege. The string "copy_from_auth" is
   placed in assertions[0].privs.name. The source server unwraps the
   rpc_gss_svc_privacy RPCSEC_GSS3_CREATE payload and verifies that
   the NFSv4 user id being asserted matches the source server's
   mapping of the user principal.
   If it does, the privilege is established on the source server as:
   <"copy_from_auth", user id, destination>. The field "handle" in a
   successful reply is the RPCSEC_GSSv3 copy_from_auth "child" handle
   that the client will use on COPY_NOTIFY requests to the source
   server.

o  An instance of copy_to_auth_priv is filled in with the shared
   secret, the cnr_source_server list returned by COPY_NOTIFY, and the
   NFSv4 user id of the user principal. The copy_to_auth_priv instance
   is placed in rpc_gss3_create_args assertions[0].privs.privilege.
   The string "copy_to_auth" is placed in assertions[0].privs.name.
   The destination server unwraps the rpc_gss_svc_privacy
   RPCSEC_GSS3_CREATE payload and verifies that the NFSv4 user id
   being asserted matches the destination server's mapping of the user
   principal. If it does, the privilege is established on the
   destination server as: <"copy_to_auth", user id, source list>. The
   field "handle" in a successful reply is the RPCSEC_GSSv3
   copy_to_auth "child" handle that the client will use on COPY
   requests to the destination server involving the source server.

As noted in [I-D.ietf-nfsv4-rpcsec-gssv3] Section 2.3.1 "Create
Request", both the client and the source server should associate the
RPCSEC_GSSv3 "child" handle with the parent RPCSEC_GSSv3 handle used
to create the RPCSEC_GSSv3 child handle.

4.9.1.1.2. Starting a Secure Inter-Server Copy

When the client sends a COPY_NOTIFY request to the source server, it
uses the privileged "copy_from_auth" RPCSEC_GSSv3 handle.
cna_destination_server in COPY_NOTIFY MUST be the same as
cfap_destination specified in copy_from_auth_priv. Otherwise,
COPY_NOTIFY will fail with NFS4ERR_ACCESS.
The source server verifies that the privilege <"copy_from_auth", user
id, destination> exists, and annotates it with the source filehandle,
if the user principal has read access to the source file, and if
administrative policies give the user principal and the NFS client
read access to the source file (i.e., if the ACCESS operation would
grant read access). Otherwise, COPY_NOTIFY will fail with
NFS4ERR_ACCESS.

When the client sends a COPY request to the destination server, it
uses the privileged "copy_to_auth" RPCSEC_GSSv3 handle. The
ca_source_server list in COPY MUST be the same as the ctap_source list
specified in copy_to_auth_priv. Otherwise, COPY will fail with
NFS4ERR_ACCESS. The destination server verifies that the privilege
<"copy_to_auth", user id, source list> exists, and annotates it with
the source and destination filehandles. If the COPY returns a
wr_callback_id, then this is an asynchronous copy and the
wr_callback_id must also be annotated to the copy_to_auth privilege.
If the client has failed to establish the "copy_to_auth" privilege,
the destination server will reject the request with
NFS4ERR_PARTNER_NO_AUTH.

If either the COPY_NOTIFY or the COPY operation fails, the associated
"copy_from_auth" and "copy_to_auth" RPCSEC_GSSv3 handles MUST be
destroyed.

4.9.1.1.3. Securing ONC RPC Server-to-Server Copy Protocols

After a destination server has a "copy_to_auth" privilege established
on it, and it receives a COPY request, if it knows it will use an ONC
RPC protocol to copy data, it will establish a "copy_confirm_auth"
privilege on the source server prior to responding to the COPY
operation as follows:

o  Before establishing an RPCSEC_GSSv3 context, a parent context needs
   to exist between nfs@<destination> as the initiator principal, and
   nfs@<source> as the target principal.
   If NFS is to be used as the copy protocol, this means that the
   destination server must mount the source server using RPCSEC_GSSv3.

o  An instance of copy_confirm_auth_priv is filled in with information
   from the established "copy_to_auth" privilege. The value of the
   field ccap_shared_secret_mic is a GSS_GetMIC() of the
   ctap_shared_secret in the copy_to_auth privilege using the parent
   handle context. The field ccap_username is the mapping of the user
   principal to an NFSv4 user name ("user"@"domain" form), and MUST be
   the same as the ctap_username in the copy_to_auth privilege. The
   copy_confirm_auth_priv instance is placed in rpc_gss3_create_args
   assertions[0].privs.privilege. The string "copy_confirm_auth" is
   placed in assertions[0].privs.name.

o  The RPCSEC_GSS3_CREATE copy_from_auth message is sent to the source
   server with a QOP of rpc_gss_svc_privacy. The source server unwraps
   the rpc_gss_svc_privacy RPCSEC_GSS3_CREATE payload and verifies the
   ccap_shared_secret_mic by calling GSS_VerifyMIC() using the parent
   context on the cfap_shared_secret from the established
   "copy_from_auth" privilege, and verifies that the ccap_username
   equals the cfap_username.

o  If all verification succeeds, the "copy_confirm_auth" privilege is
   established on the source server as <"copy_confirm_auth",
   shared_secret_mic, user id>. Because the shared secret has been
   verified, the resultant copy_confirm_auth RPCSEC_GSSv3 child handle
   is noted to be acting on behalf of the user principal.

o  If the source server fails to verify the copy_from_auth privilege,
   the COPY_NOTIFY operation will be rejected with
   NFS4ERR_PARTNER_NO_AUTH.

o  If the destination server fails to verify the copy_to_auth or
   copy_confirm_auth privilege, the COPY will be rejected with
   NFS4ERR_PARTNER_NO_AUTH, causing the client to destroy the
   associated copy_from_auth and copy_to_auth RPCSEC_GSSv3 structured
   privilege assertion handles.

o  All subsequent ONC RPC READ requests sent from the destination to
   copy data from the source to the destination will use the
   RPCSEC_GSSv3 copy_confirm_auth child handle.

Note that the use of the "copy_confirm_auth" privilege accomplishes
the following:

o  If a protocol like NFS is being used, with export policies, export
   policies can be overridden in case the destination server
   as-an-NFS-client is not authorized.

o  Manual configuration to allow a copy relationship between the
   source and destination is not needed.

4.9.1.1.4. Maintaining a Secure Inter-Server Copy

If the client determines that either the copy_from_auth or the
copy_to_auth handle becomes invalid during a copy, then the copy MUST
be aborted by the client sending an OFFLOAD_CANCEL to both the source
and destination servers and destroying the respective copy-related
context handles as described in Section 4.9.1.1.5.

4.9.1.1.5. Finishing or Stopping a Secure Inter-Server Copy

Under normal operation, the client MUST destroy the copy_from_auth and
the copy_to_auth RPCSEC_GSSv3 handles once the COPY operation returns
for a synchronous inter-server copy or a CB_OFFLOAD reports the result
of an asynchronous copy.

The copy_confirm_auth privilege is constructed from information held
by the copy_to_auth privilege, and MUST be destroyed by the
destination server (via an RPCSEC_GSS3_DESTROY call) when the
copy_to_auth RPCSEC_GSSv3 handle is destroyed.
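
The dependency between the two handles on the destination server can be sketched as simple bookkeeping. HandleTable and its methods are hypothetical names; the sketch tracks only which handles exist and does not model the RPCSEC_GSS3_DESTROY call that a real destination server would send to the source.

```python
class HandleTable:
    """Toy table of RPCSEC_GSSv3 child handles kept by a destination server."""

    def __init__(self):
        self.handles = set()
        self.dependents = {}  # copy_to_auth handle -> copy_confirm_auth handles

    def add_copy_to_auth(self, handle):
        """Register a copy_to_auth handle established by the client."""
        self.handles.add(handle)
        self.dependents[handle] = set()

    def add_copy_confirm_auth(self, handle, copy_to_auth):
        """Register a copy_confirm_auth handle built from `copy_to_auth`."""
        self.handles.add(handle)
        self.dependents[copy_to_auth].add(handle)

    def destroy_copy_to_auth(self, handle):
        """Destroy a copy_to_auth handle together with everything built from it."""
        destroyed = self.dependents.pop(handle, set()) | {handle}
        self.handles -= destroyed
        return destroyed
```

Keeping the dependents keyed by the copy_to_auth handle means the cascade cannot be forgotten: there is no way to remove the parent entry without also collecting its children.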

The copy_confirm_auth RPCSEC_GSSv3 handle is associated with a
copy_from_auth RPCSEC_GSSv3 handle on the source server via the shared
secret and MUST be locally destroyed (there is no RPCSEC_GSS3_DESTROY
as the source server is not the initiator) when the copy_from_auth
RPCSEC_GSSv3 handle is destroyed.

If the client sends an OFFLOAD_CANCEL to the source server to rescind
the destination server's synchronous copy privilege, it uses the
privileged "copy_from_auth" RPCSEC_GSSv3 handle and the
cra_destination_server in OFFLOAD_CANCEL MUST be the same as the name
of the destination server specified in copy_from_auth_priv. The source
server will then delete the <"copy_from_auth", user id, destination>
privilege and fail any subsequent copy requests sent under the
auspices of this privilege from the destination server. The client
MUST destroy both the "copy_from_auth" and the "copy_to_auth"
RPCSEC_GSSv3 handles.

If the client sends an OFFLOAD_STATUS to the destination server to
check on the status of an asynchronous copy, it uses the privileged
"copy_to_auth" RPCSEC_GSSv3 handle and the osa_stateid in
OFFLOAD_STATUS MUST be the same as the wr_callback_id specified in the
"copy_to_auth" privilege stored on the destination server.

If the client sends an OFFLOAD_CANCEL to the destination server to
cancel an asynchronous copy, it uses the privileged "copy_to_auth"
RPCSEC_GSSv3 handle and the oaa_stateid in OFFLOAD_CANCEL MUST be the
same as the wr_callback_id specified in the "copy_to_auth" privilege
stored on the destination server. The destination server will then
delete the <"copy_to_auth", user id, source list, nonce, nonce MIC,
context handle, handle version> privilege and the associated
"copy_confirm_auth" RPCSEC_GSSv3 handle. The client MUST destroy both
the copy_to_auth and copy_from_auth RPCSEC_GSSv3 handles.

4.9.1.2. Inter-Server Copy via ONC RPC without RPCSEC_GSS

ONC RPC security flavors other than RPCSEC_GSS MAY be used with the
server-side copy offload operations described in this chapter. In
particular, host-based ONC RPC security flavors such as AUTH_NONE and
AUTH_SYS MAY be used. If a host-based security flavor is used, a
minimal level of protection for the server-to-server copy protocol is
possible.

In the absence of a strong security mechanism designed for the
purpose, the challenge is how the source server and destination server
identify themselves to each other, especially in the presence of
multi-homed source and destination servers. In a multi-homed
environment, the destination server might not contact the source
server from the same network address specified by the client in the
COPY_NOTIFY. The cnr_stateid returned from the COPY_NOTIFY can be used
to uniquely identify the destination server to the source server. The
use of cnr_stateid provides initial authentication of the destination
server, but cannot defend against man-in-the-middle attacks after
authentication or against an eavesdropper that observes the opaque
stateid on the wire. Other secure communication techniques (e.g.,
IPsec) are necessary to block these attacks.

Servers SHOULD reject COPY_NOTIFY requests that do not use RPCSEC_GSS
with privacy, thus ensuring the cnr_stateid in the COPY_NOTIFY reply
is encrypted. For the same reason, clients SHOULD send COPY requests
to the destination using RPCSEC_GSS with privacy.

4.9.1.3. Inter-Server Copy without ONC RPC

The same techniques as in Section 4.9.1.2, using unique URLs for each
destination server, can be used for other protocols (e.g., HTTP
[RFC7230] and FTP [RFC959]) as well.

5. Support for Application I/O Hints

Applications can issue client I/O hints via posix_fadvise()
[posix_fadvise] to the NFS client.
While this can help the NFS 1329 client optimize I/O and caching for a file, it does not allow the NFS 1330 server and its exported file system to do likewise. The IO_ADVISE 1331 operation (Section 15.5) is used to communicate the client file 1332 access patterns to the NFS server. The NFS server upon receiving an 1333 IO_ADVISE operation MAY choose to alter its I/O and caching behavior, 1334 but is under no obligation to do so. 1336 Application-specific NFS clients such as those used by hypervisors 1337 and databases can also leverage application hints to communicate 1338 their specialized requirements. 1340 6. Sparse Files 1342 A sparse file is a common way of representing a large file without 1343 having to utilize all of the disk space for it. Consequently, a 1344 sparse file uses less physical space than its size indicates. This 1345 means the file contains 'holes', byte ranges within the file that 1346 contain no data. Most modern file systems support sparse files, 1347 including most UNIX file systems and NTFS, but notably not Apple's 1348 HFS+. Common examples of sparse files include Virtual Machine (VM) 1349 OS/disk images, database files, log files, and even checkpoint 1350 recovery files most commonly used by the HPC community. 1352 In addition, many modern file systems support the concept of 1353 'unwritten' or 'uninitialized' blocks, which have uninitialized space 1354 allocated to them on disk, but will return zeros until data is 1355 written to them. Such functionality is already present in the data 1356 model of the pNFS Block/Volume Layout (see [RFC5663]). Uninitialized 1357 blocks can be thought of as holes inside a space reservation window. 1359 If an application reads a hole in a sparse file, the file system must 1360 return all zeros to the application. For local data access there is 1361 little penalty, but with NFS these zeroes must be transferred back to 1362 the client.
If an application uses the NFS client to read data into 1363 memory, this wastes time and bandwidth as the application waits for 1364 the zeroes to be transferred. 1366 A sparse file is typically created by initializing the file to be all 1367 zeros: nothing is written to the data in the file; instead, the hole 1368 is recorded in the metadata for the file. So an 8G disk image might 1369 be represented initially by a few hundred bits in the metadata (on 1370 UNIX file systems, the inode) and nothing on the disk. If the VM 1371 then writes 100M to a file in the middle of the image, there would 1372 now be two holes represented in the metadata and 100M in the data. 1374 No new operation is needed to allow the creation of a sparsely 1375 populated file: when a file is created and a write occurs past the 1376 current size of the file, the non-allocated region will either be a 1377 hole or filled with zeros. The choice of behavior is dictated by the 1378 underlying file system and is transparent to the application. What 1379 is needed are the abilities to read sparse files and to punch holes 1380 to reinitialize the contents of a file. 1382 Two new operations, DEALLOCATE (Section 15.4) and READ_PLUS 1383 (Section 15.10), are introduced. DEALLOCATE allows for hole 1384 punching, where an application might want to reset the allocation and 1385 reservation status of a range of the file. READ_PLUS supports all 1386 the features of READ but includes an extension to support sparse 1387 files. READ_PLUS is guaranteed to perform no worse than READ, and 1388 can dramatically improve performance with sparse files. READ_PLUS 1389 does not depend on pNFS protocol features, but can be used by pNFS to 1390 support sparse files. 1392 6.1. Terminology 1394 Regular file: An object of file type NF4REG or NF4NAMEDATTR. 1396 Sparse file: A Regular file that contains one or more holes. 1398 Hole: A byte range within a Sparse file that contains all zeroes.
A 1399 hole might or might not have space allocated or reserved to it. 1401 6.2. New Operations 1403 6.2.1. READ_PLUS 1405 READ_PLUS is a new variant of the NFSv4.1 READ operation [RFC5661]. 1406 Besides being able to support all of the data semantics of the READ 1407 operation, it can also be used by the client and server to 1408 efficiently transfer holes. Because the client does not know in 1409 advance whether a hole is present or not, if the client supports 1410 READ_PLUS and so does the server, then it should always use the 1411 READ_PLUS operation in preference to the READ operation. 1413 READ_PLUS extends the response with a new arm representing holes to 1414 avoid returning data for portions of the file which are initialized 1415 to zero and may or may not contain a backing store. Returning actual 1416 data blocks corresponding to holes wastes computational and network 1417 resources, thus reducing performance. 1419 When a client sends a READ operation, it is not prepared to accept a 1420 READ_PLUS-style response providing a compact encoding of the scope of 1421 holes. If a READ occurs on a sparse file, then the server must 1422 expand such data to be raw bytes. If a READ occurs in the middle of 1423 a hole, the server can only send back bytes starting from that 1424 offset. By contrast, if a READ_PLUS occurs in the middle of a hole, 1425 the server can send back a range which starts before the offset and 1426 extends past the requested length. 1428 6.2.2. DEALLOCATE 1430 The client can use the DEALLOCATE operation on a range of a file as a 1431 hole punch, which allows the client to avoid the transfer of a 1432 repetitive pattern of zeros across the network. This hole punch is a 1433 result of the unreserved space returning all zeros until overwritten. 1435 7. 
Space Reservation 1437 Applications want to be able to reserve space for a file, report the 1438 amount of actual disk space a file occupies, and free-up the backing 1439 space of a file when it is not required. 1441 One example is the posix_fallocate operation ([posix_fallocate]) 1442 which allows applications to ask for space reservations from the 1443 operating system, usually to provide a better file layout and reduce 1444 overhead for random or slow growing file appending workloads. 1446 Another example is space reservation for virtual disks in a 1447 hypervisor. In virtualized environments, virtual disk files are 1448 often stored on NFS mounted volumes. When a hypervisor creates a 1449 virtual disk file, it often tries to preallocate the space for the 1450 file so that there are no future allocation related errors during the 1451 operation of the virtual machine. Such errors prevent a virtual 1452 machine from continuing execution and result in downtime. 1454 Currently, in order to achieve such a guarantee, applications zero 1455 the entire file. The initial zeroing allocates the backing blocks 1456 and all subsequent writes are overwrites of already allocated blocks. 1457 This approach is not only inefficient in terms of the amount of I/O 1458 done, it is also not guaranteed to work on file systems that are log 1459 structured or deduplicated. An efficient way of guaranteeing space 1460 reservation would be beneficial to such applications. 1462 The new ALLOCATE operation (see Section 15.1) allows a client to 1463 request a guarantee that space will be available. The ALLOCATE 1464 operation guarantees that any future writes to the region it was 1465 successfully called for will not fail with NFS4ERR_NOSPC. 1467 Another useful feature is the ability to report the number of blocks 1468 that would be freed when a file is deleted. Currently, NFS reports 1469 two size attributes: 1471 size The logical file size of the file. 
1473 space_used The size in bytes that the file occupies on disk. 1475 While these attributes are sufficient for space accounting in 1476 traditional file systems, they prove to be inadequate in modern file 1477 systems that support block sharing. In such file systems, multiple 1478 inodes (the metadata portion of the file system object) can point to 1479 a single block with a block reference count to guard against 1480 premature freeing. Having a way to tell the number of blocks that 1481 would be freed if the file were deleted would be useful to 1482 applications that wish to migrate files when a volume is low on 1483 space. 1485 Since virtual disks represent a hard drive in a virtual machine, a 1486 virtual disk can be viewed as a file system within a file. Since not 1487 all blocks within a file system are in use, there is an opportunity 1488 to reclaim blocks that are no longer in use. A call to deallocate 1489 blocks could result in better space efficiency. Less space might 1490 be consumed for backups after block deallocation. 1492 The following operations and attributes can be used to resolve these 1493 issues: 1495 space_freed This attribute reports the space that would be freed 1496 when a file is deleted, taking block sharing into consideration. 1498 DEALLOCATE This operation deallocates the blocks backing a region of 1499 the file. 1501 If space_used of a file is interpreted to mean the size in bytes of 1502 all disk blocks pointed to by the inode of the file, then shared 1503 blocks get double counted, over-reporting the space utilization. 1504 This also has the adverse effect that the deletion of a file with 1505 shared blocks frees up less than space_used bytes. 1507 On the other hand, if space_used is interpreted to mean the size in 1508 bytes of those disk blocks unique to the inode of the file, then 1509 shared blocks are not counted in any file, resulting in under- 1510 reporting of the space utilization.
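The difference between the two interpretations of space_used can be made concrete with a small calculation. The following Python sketch is illustrative only and is not part of the protocol: it models each file as a set of block identifiers, and the block size, the file names, the block numbers, and all function names are assumed values chosen for the illustration.

```python
# Illustrative model only: a file is a set of block identifiers, and a
# block shared between files appears in more than one set.
BLOCK_SIZE = 4096  # assumed block size in bytes


def used_counting_all_blocks(files):
    """First interpretation: every block an inode points to is counted."""
    return sum(len(blocks) for blocks in files.values()) * BLOCK_SIZE


def used_counting_unique_blocks(files):
    """Second interpretation: only blocks unique to one inode are counted."""
    total = 0
    for name, blocks in files.items():
        others = set()
        for other_name, other_blocks in files.items():
            if other_name != name:
                others |= other_blocks
        total += len(blocks - others)
    return total * BLOCK_SIZE


def actually_consumed(files):
    """Space really occupied: each distinct block is counted once."""
    distinct = set()
    for blocks in files.values():
        distinct |= blocks
    return len(distinct) * BLOCK_SIZE


# Hypothetical layout: two files of five blocks each, sharing blocks 4 and 5.
files = {"A": {1, 2, 3, 4, 5}, "B": {4, 5, 6, 7, 8}}

print(used_counting_all_blocks(files) // BLOCK_SIZE)     # blocks, over-reported
print(used_counting_unique_blocks(files) // BLOCK_SIZE)  # blocks, under-reported
print(actually_consumed(files) // BLOCK_SIZE)            # blocks really in use
```

With this assumed layout, the first interpretation reports more than the space actually consumed and the second reports less, which is the over- and under-reporting described above.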
1512 For example, two files A and B have 10 blocks each. Let 6 of these 1513 blocks be shared between them. Thus, the combined space utilized by 1514 the two files is 14 * BLOCK_SIZE bytes. In the former case, the 1515 combined space utilization of the two files would be reported as 20 * 1516 BLOCK_SIZE. However, deleting either would only result in 4 * 1517 BLOCK_SIZE being freed. Conversely, the latter interpretation would 1518 report that the space utilization is only 8 * BLOCK_SIZE. 1520 Adding another size attribute, space_freed (see Section 12.2.2), is 1521 helpful in solving this problem. space_freed is the number of blocks 1522 that are allocated to the given file that would be freed on its 1523 deletion. In the example, both A and B would report space_freed as 4 1524 * BLOCK_SIZE and space_used as 10 * BLOCK_SIZE. If A is deleted, B 1525 will report space_freed as 10 * BLOCK_SIZE as the deletion of B would 1526 result in the deallocation of all 10 blocks. 1528 The addition of these attributes does not solve the problem of space 1529 being over-reported. However, over-reporting is better than under- 1530 reporting. 1532 8. Application Data Block Support 1534 At the OS level, files are contained on disk blocks. Applications 1535 are also free to impose structure on the data contained in a file and 1536 thus can define an Application Data Block (ADB) to be such a 1537 structure. From the application's viewpoint, it only wants to handle 1538 ADBs and not raw bytes (see [Strohm11]). An ADB is typically 1539 comprised of two sections: header and data. The header describes the 1540 characteristics of the block and can provide a means to detect 1541 corruption in the data payload. The data section is typically 1542 initialized to all zeros. 1544 The format of the header is application specific, but there are two 1545 main components typically encountered: 1547 1. 
An Application Data Block Number (ADBN), which allows the 1548 application to determine which data block is being referenced. 1549 This is useful when the client is not storing the blocks in 1550 contiguous memory, i.e., a logical block number. 1552 2. Fields to describe the state of the ADB and a means to detect 1553 block corruption. For both pieces of data, a useful property 1554 would be that the allowed values are specially selected so that 1555 if passed across the network, corruption due to translation 1556 between big- and little-endian architectures is detectable. For 1557 example, 0xF0DEDEF0 has the same (32-bit wide) pattern in both 1558 architectures, making it inappropriate. 1560 Applications already impose structures on files [Strohm11] and detect 1561 corruption in data blocks [Ashdown08]. What they are not able to do 1562 is efficiently transfer and store ADBs. To initialize a file with 1563 ADBs, the client must send each full ADB to the server and each must 1564 be stored on the server. 1566 This section defines a framework for transferring the ADB from client 1567 to server and presents one approach to detecting corruption in a given 1568 ADB implementation. 1570 8.1. Generic Framework 1572 The representation of the ADB needs to be flexible enough to support 1573 many different applications. The most basic approach is no 1574 imposition of a block at all, which entails working with the raw 1575 bytes. Such an approach would be useful for storing holes, punching 1576 holes, etc. In more complex deployments, a server might be 1577 supporting multiple applications, each with their own definition of 1578 the ADB. One might store the ADBN at the start of the block and then 1579 have a guard pattern to detect corruption [McDougall07]. The next 1580 might store the ADBN at an offset of 100 bytes within the block and 1581 have no guard pattern at all, i.e., existing applications might 1582 already have well-defined formats for their data blocks.
1584 The guard pattern can be used to represent the state of the block, to 1585 protect against corruption, or both. Again, it must be possible to 1586 place it anywhere within the ADB. 1588 Both the starting offset of the block and the size of the block need 1589 to be represented. Note that nothing prevents the application from 1590 defining differently sized blocks in a file. 1592 8.1.1. Data Block Representation 1594
1596 struct app_data_block4 {
1597         offset4         adb_offset;
1598         length4         adb_block_size;
1599         length4         adb_block_count;
1600         length4         adb_reloff_blocknum;
1601         count4          adb_block_num;
1602         length4         adb_reloff_pattern;
1603         opaque          adb_pattern<>;
1604 };
1606 1608 The app_data_block4 structure captures the abstraction presented for 1609 the ADB. The additional fields present are to allow the transmission 1610 of adb_block_count ADBs at one time. The adb_block_num is used to 1611 convey the ADBN of the first block in the sequence. Each ADB will 1612 contain the same adb_pattern string. 1614 As both adb_block_num and adb_pattern are optional, if either 1615 adb_reloff_pattern or adb_reloff_blocknum is set to NFS4_UINT64_MAX, 1616 then the corresponding field is not set in any of the ADBs. 1618 8.2. An Example of Detecting Corruption 1620 In this section, an example ADB format is defined in which corruption 1621 can be detected. Note that this is just one possible format and 1622 means to detect corruption. 1624 Consider a very basic implementation of an operating system's disk 1625 blocks. A block is either data or it is an indirect block which 1626 allows for files to be larger than one block. It is desired to be 1627 able to initialize a block. Lastly, to quickly unlink a file, a 1628 block can be marked invalid. The contents remain intact, which 1629 would enable this OS application to undelete a file.
1631 The application defines 4k sized data blocks, with an 8 byte block 1632 counter occurring at offset 0 in the block, and with the guard 1633 pattern occurring at offset 8 inside the block. Furthermore, the 1634 guard pattern can take one of four states: 1636 0xfeedface - This is the FREE state and indicates that the ADB 1637 format has been applied. 1639 0xcafedead - This is the DATA state and indicates that real data 1640 has been written to this block. 1642 0xe4e5c001 - This is the INDIRECT state and indicates that the 1643 block contains block counter numbers that are chained off of this 1644 block. 1646 0xba1ed4a3 - This is the INVALID state and indicates that the block 1647 contains data whose contents are garbage. 1649 Finally, it also defines an 8 byte checksum [Baira08] starting at 1650 byte 16 which applies to the remaining contents of the block. If the 1651 state is FREE, then that checksum is trivially zero. As such, the 1652 application has no need to transfer the checksum implicitly inside 1653 the ADB - it need not make the transfer layer aware of the fact that 1654 there is a checksum (see [Ashdown08] for an example of checksums used 1655 to detect corruption in application data blocks). 1657 Corruption in each ADB can thus be detected: 1659 o If the guard pattern is anything other than one of the allowed 1660 values, including all zeros. 1662 o If the guard pattern is FREE and any other byte in the remainder 1663 of the ADB is anything other than zero. 1665 o If the guard pattern is anything other than FREE, then if the 1666 stored checksum does not match the computed checksum. 1668 o If the guard pattern is INDIRECT and one of the stored indirect 1669 block numbers has a value greater than the number of ADBs in the 1670 file. 1672 o If the guard pattern is INDIRECT and one of the stored indirect 1673 block numbers is a duplicate of another stored indirect block 1674 number. 
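The checks listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the text does not name the checksum algorithm, so zlib.crc32 is used purely as a stand-in; big-endian field encoding, zero padding in bytes 12 through 15, and reading "the remainder of the ADB" as every byte after the guard pattern are all assumptions; and make_adb, find_corruption, and find_indirect_corruption are hypothetical helper names. The encoding of indirect block numbers inside the block is not specified, so that check takes an already-decoded list.

```python
import struct
import zlib

ADB_SIZE = 4096        # 4k application data block
GUARD_OFFSET = 8       # guard pattern follows the 8-byte block counter
CHECKSUM_OFFSET = 16   # 8-byte checksum
PAYLOAD_OFFSET = 24    # bytes covered by the checksum

FREE, DATA, INDIRECT, INVALID = 0xFEEDFACE, 0xCAFEDEAD, 0xE4E5C001, 0xBA1ED4A3
ALLOWED_GUARDS = {FREE, DATA, INDIRECT, INVALID}


def checksum(payload):
    # Stand-in only: the checksum algorithm is an assumption.
    return zlib.crc32(payload)


def make_adb(block_num, guard, payload=b""):
    """Build one ADB in the example layout (hypothetical helper)."""
    body = payload.ljust(ADB_SIZE - PAYLOAD_OFFSET, b"\x00")
    csum = 0 if guard == FREE else checksum(body)
    # counter (8 bytes), guard (4 bytes), 4 pad bytes, checksum (8 bytes)
    return struct.pack(">QI4xQ", block_num, guard, csum) + body


def find_corruption(adb):
    """Return a reason string if the ADB looks corrupt, else None."""
    (guard,) = struct.unpack_from(">I", adb, GUARD_OFFSET)
    if guard not in ALLOWED_GUARDS:
        return "guard pattern is not one of the allowed values"
    if guard == FREE:
        if any(adb[GUARD_OFFSET + 4:]):
            return "FREE block has a non-zero byte after the guard pattern"
        return None
    (stored,) = struct.unpack_from(">Q", adb, CHECKSUM_OFFSET)
    if stored != checksum(adb[PAYLOAD_OFFSET:]):
        return "stored checksum does not match the computed checksum"
    return None


def find_indirect_corruption(block_numbers, adb_count):
    """Apply the two INDIRECT checks to already-decoded block numbers."""
    if any(n > adb_count for n in block_numbers):
        return "indirect block number exceeds the number of ADBs in the file"
    if len(set(block_numbers)) != len(block_numbers):
        return "duplicate indirect block number"
    return None


# Usage: a freshly initialized block and a written block both pass,
# while a block of all zeros fails the guard pattern check.
print(find_corruption(make_adb(0, FREE)))
print(find_corruption(make_adb(4, DATA, b"application data")))
print(find_corruption(bytes(ADB_SIZE)))
```

Note that every check here runs in the application. Nothing in the sketch relies on the NFS client or server understanding the block format.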
1676 As can be seen, the application can detect errors based on the 1677 combination of the guard pattern state and the checksum. But also, 1678 the application can detect corruption based on the state and the 1679 contents of the ADB. This last point is important in validating the 1680 minimum amount of data incorporated into the generic framework. 1681 I.e., the guard pattern is sufficient in allowing applications to 1682 design their own corruption detection. 1684 Finally, it is important to note that none of these corruption checks 1685 occur in the transport layer. The server and client components are 1686 totally unaware of the file format and might report everything as 1687 being transferred correctly even in the case the application detects 1688 corruption. 1690 8.3. Example of READ_PLUS 1692 The hypothetical application presented in Section 8.2 can be used to 1693 illustrate how READ_PLUS would return an array of results. A file is 1694 created and initialized with 100 4k ADBs in the FREE state with the 1695 WRITE_SAME operation (see Section 15.12):
1697 WRITE_SAME {0, 4k, 100, 0, 0, 8, 0xfeedface}
1699 Further, assume the application writes a single ADB at 16k, changing 1700 the guard pattern to 0xcafedead. There would then be in memory:
1702 0k -> (4k - 1) : 00 00 00 00 ... fe ed fa ce 00 00 ... 00
1703 4k -> (8k - 1) : 00 00 00 01 ... fe ed fa ce 00 00 ... 00
1704 8k -> (12k - 1) : 00 00 00 02 ... fe ed fa ce 00 00 ... 00
1705 12k -> (16k - 1) : 00 00 00 03 ... fe ed fa ce 00 00 ... 00
1706 16k -> (20k - 1) : 00 00 00 04 ... ca fe de ad 00 00 ... 00
1707 20k -> (24k - 1) : 00 00 00 05 ... fe ed fa ce 00 00 ... 00
1708 24k -> (28k - 1) : 00 00 00 06 ... fe ed fa ce 00 00 ... 00
1709 ...
1710 396k -> (400k - 1) : 00 00 00 63 ... fe ed fa ce 00 00 ... 00
1712 And when the client did a READ_PLUS of 64k at the start of the file, 1713 it could get back a result of data:
1715 0k -> (4k - 1) : 00 00 00 00 ... fe ed fa ce 00 00 ...
00
1716 4k -> (8k - 1) : 00 00 00 01 ... fe ed fa ce 00 00 ... 00
1717 8k -> (12k - 1) : 00 00 00 02 ... fe ed fa ce 00 00 ... 00
1718 12k -> (16k - 1) : 00 00 00 03 ... fe ed fa ce 00 00 ... 00
1719 16k -> (20k - 1) : 00 00 00 04 ... ca fe de ad 00 00 ... 00
1720 20k -> (24k - 1) : 00 00 00 05 ... fe ed fa ce 00 00 ... 00
1721 24k -> (28k - 1) : 00 00 00 06 ... fe ed fa ce 00 00 ... 00
1722 ...
1723 60k -> (64k - 1) : 00 00 00 0f ... fe ed fa ce 00 00 ... 00
1725 8.4. An Example of Zeroing Space 1727 A simpler use case for WRITE_SAME is applications that want to 1728 efficiently zero out a file, but do not want to modify space 1729 reservations. This can easily be achieved by a call to WRITE_SAME 1730 without ADB block numbers or a pattern, e.g.:
1732 WRITE_SAME {0, 1k, 10000, 0, 0, 0, 0}
1734 9. Labeled NFS 1736 Access control models such as Unix permissions or Access Control 1737 Lists are commonly referred to as Discretionary Access Control (DAC) 1738 models. These systems base their access decisions on user identity 1739 and resource ownership. In contrast, Mandatory Access Control (MAC) 1740 models base their access control decisions on the label on the 1741 subject (usually a process) and the object it wishes to access 1742 [RFC4949]. These labels may contain user identity information but 1743 usually contain additional information. In DAC systems users are 1744 free to specify the access rules for resources that they own. MAC 1745 models base their security decisions on a system wide policy 1746 established by an administrator or organization which the users do 1747 not have the ability to override. In this section, a MAC model is 1748 added to NFSv4.2. 1750 First, a method is provided for transporting and storing security 1751 label data on NFSv4 file objects. Security labels have several 1752 semantics that are met by NFSv4 recommended attributes such as the 1753 ability to set the label value upon object creation.
Access control 1754 on these attributes is done through a combination of two mechanisms. 1755 As with other recommended attributes on file objects, the usual DAC 1756 checks, Access Control Lists (ACLs) and permission bits, will be 1757 performed to ensure that proper file ownership is enforced. In 1758 addition, a MAC system MAY be employed on the client, server, or both 1759 to enforce additional policy on what subjects may modify security 1760 label information. 1762 Second, a method is described for the client to determine if an NFSv4 1763 file object security label has changed. A client which needs to know 1764 if a label on a file or set of files is going to change SHOULD 1765 request a delegation on each labeled file. In order to change such a 1766 security label, the server will have to recall delegations on any 1767 file affected by the label change, thereby informing clients of the label 1768 change. 1770 An additional useful feature would be modification to the RPC layer 1771 used by NFSv4 to allow RPC calls to assert client process subject 1772 security labels and enable full mode enforcement as described in 1773 Section 9.5.1. Such modifications are outside the scope of this 1774 document (see [I-D.ietf-nfsv4-rpcsec-gssv3]). 1776 9.1. Definitions 1778 Label Format Specifier (LFS): is an identifier used by the client to 1779 establish the syntactic format of the security label and the 1780 semantic meaning of its components. These specifiers exist in a 1781 registry associated with documents describing the format and 1782 semantics of the label. 1784 Label Format Registry: is the IANA registry (see [RFC7569]) 1785 containing all registered LFSes along with references to the 1786 documents that describe the syntactic format and semantics of the 1787 security label. 1789 Policy Identifier (PI): is an optional part of the definition of a 1790 Label Format Specifier which allows for clients and servers to 1791 identify specific security policies.
1793 Object: is a passive resource within the system that is to be 1794 protected. Objects can be entities such as files, directories, 1795 pipes, sockets, and many other system resources relevant to the 1796 protection of the system state. 1798 Subject: is an active entity usually a process which is requesting 1799 access to an object. 1801 MAC-Aware: is a server which can transmit and store object labels. 1803 MAC-Functional: is a client or server which is Labeled NFS enabled. 1804 Such a system can interpret labels and apply policies based on the 1805 security system. 1807 Multi-Level Security (MLS): is a traditional model where objects are 1808 given a sensitivity level (Unclassified, Secret, Top Secret, etc) 1809 and a category set (see [LB96], [RFC1108], [RFC2401], and 1810 [RFC4949]). 1812 9.2. MAC Security Attribute 1814 MAC models base access decisions on security attributes bound to 1815 subjects (usually processes) and objects (for NFS, file objects). 1816 This information can range from a user identity for an identity based 1817 MAC model, sensitivity levels for Multi-level security, or a type for 1818 Type Enforcement. These models base their decisions on different 1819 criteria but the semantics of the security attribute remain the same. 1820 The semantics required by the security attributes are listed below: 1822 o MUST provide flexibility with respect to the MAC model. 1824 o MUST provide the ability to atomically set security information 1825 upon object creation. 1827 o MUST provide the ability to enforce access control decisions both 1828 on the client and the server. 1830 o MUST NOT expose an object to either the client or server name 1831 space before its security information has been bound to it. 1833 NFSv4 implements the security attribute as a recommended attribute. 1834 These attributes have a fixed format and semantics, which conflicts 1835 with the flexible nature of the security attribute. 
To resolve this, 1836 the security attribute consists of two components. The first 1837 component is an LFS as defined in [RFC7569] to allow for 1838 interoperability between MAC mechanisms. The second component is an 1839 opaque field which is the actual security attribute data. To allow 1840 for various MAC models, NFSv4 should be used solely as a transport 1841 mechanism for the security attribute. It is the responsibility of 1842 the endpoints to consume the security attribute and make access 1843 decisions based on their respective models. In addition, creation of 1844 objects through OPEN and CREATE allows for the security attribute to 1845 be specified upon creation. By providing an atomic create and set 1846 operation for the security attribute it is possible to enforce the 1847 second and fourth requirements. The recommended attribute 1848 FATTR4_SEC_LABEL (see Section 12.2.4) will be used to satisfy this 1849 requirement. 1851 9.2.1. Delegations 1853 In the event that a security attribute is changed on the server while 1854 a client holds a delegation on the file, both the server and the 1855 client MUST follow the NFSv4.1 protocol (see Chapter 10 of [RFC5661]) 1856 with respect to attribute changes. The client SHOULD flush all changes back 1857 to the server and relinquish the delegation. 1859 9.2.2. Permission Checking 1861 It is not feasible to enumerate all possible MAC models and even 1862 levels of protection within a subset of these models. This means 1863 that the NFSv4 clients and servers cannot be expected to directly make 1864 access control decisions based on the security attribute. Instead, 1865 NFSv4 should defer permission checking on this attribute to the host 1866 system. These checks are performed in addition to existing DAC and 1867 ACL checks outlined in the NFSv4 protocol. Section 9.5 gives a 1868 specific example of how the security attribute is handled under a 1869 particular MAC model. 1871 9.2.3.
Object Creation 1873 When creating files in NFSv4 the OPEN and CREATE operations are used. 1874 One of the parameters to these operations is an fattr4 structure 1875 containing the attributes the file is to be created with. This 1876 allows NFSv4 to atomically set the security attribute of files upon 1877 creation. When a client is MAC-Functional it must always provide the 1878 initial security attribute upon file creation. In the event that the 1879 server is MAC-Functional as well, it should determine by policy 1880 whether it will accept the attribute from the client or instead make 1881 the determination itself. If the client is not MAC-Functional, then 1882 the MAC-Functional server must decide on a default label. A more in 1883 depth explanation can be found in Section 9.5. 1885 9.2.4. Existing Objects 1887 Note that under the MAC model, all objects must have labels. 1888 Therefore, if an existing server is upgraded to include Labeled NFS 1889 support, then it is the responsibility of the security system to 1890 define the behavior for existing objects. 1892 9.2.5. Label Changes 1894 Consider a guest mode system (Section 9.5.2) in which the clients 1895 enforce MAC checks and the server has only a DAC security system 1896 which stores the labels along with the file data. In this type of 1897 system, a user with the appropriate DAC credentials on a client with 1898 poorly configured or disabled MAC labeling enforcement is allowed 1899 access to the file label (and data) on the server and can change the 1900 label. 1902 Clients which need to know if a label on a file or set of files has 1903 changed SHOULD request a delegation on each labeled file so that a 1904 label change by another client will be known via the process 1905 described in Section 9.2.1 which must be followed: the delegation 1906 will be recalled, which effectively notifies the client of the 1907 change. 
1909 Note that the MAC security policies on a client can be such that the 1910 client does not have access to the file unless it has a delegation. 1912 9.3. pNFS Considerations 1914 The new FATTR4_SEC_LABEL attribute is metadata information and as 1915 such the storage device is not aware of the value contained on the 1916 metadata server. Fortunately, the NFSv4.1 protocol [RFC5661] already 1917 has provisions for doing access level checks from the storage device 1918 to the metadata server. In order for the storage device to validate 1919 the subject label presented by the client, it SHOULD utilize this 1920 mechanism. 1922 9.4. Discovery of Server Labeled NFS Support 1924 The server can easily determine that a client supports Labeled NFS 1925 when it queries for the FATTR4_SEC_LABEL label for an object. The 1926 client might need to discover which LFS the server supports. 1928 The following compound MUST NOT be denied by any MAC label check: 1930 PUTROOTFH, GETATTR {FATTR4_SEC_LABEL} 1932 Note that the server might have imposed a security flavor on the root 1933 that precludes such access. I.e., if the server requires kerberized 1934 access and the client presents a compound with AUTH_SYS, then the 1935 server is allowed to return NFS4ERR_WRONGSEC in this case. But if 1936 the client presents a correct security flavor, then the server MUST 1937 return the FATTR4_SEC_LABEL attribute with the supported LFS filled 1938 in. 1940 9.5. MAC Security NFS Modes of Operation 1942 A system using Labeled NFS may operate in two modes. The first mode 1943 provides the most protection and is called "full mode". In this mode 1944 both the client and server implement a MAC model allowing each end to 1945 make an access control decision. The remaining mode is called the 1946 "guest mode" and in this mode one end of the connection is not 1947 implementing a MAC model and thus offers less protection than full 1948 mode. 1950 9.5.1. 
Full Mode 1952 Full mode environments consist of MAC-Functional NFSv4 servers and 1953 clients and may be composed of mixed MAC models and policies. The 1954 system requires that both the client and server have an opportunity 1955 to perform an access control check based on all relevant information 1956 within the network. The file object security attribute is provided 1957 using the mechanism described in Section 9.2. 1959 Fully MAC-Functional NFSv4 servers are not possible in the absence of 1960 RPCSEC_GSSv3 [I-D.ietf-nfsv4-rpcsec-gssv3] support for client process 1961 subject label assertion. However, servers may make decisions based 1962 on the RPC credential information available. 1964 9.5.1.1. Initial Labeling and Translation 1966 The ability to create a file is an action that a MAC model may wish 1967 to mediate. The client is given the responsibility to determine the 1968 initial security attribute to be placed on a file. This allows the 1969 client to make a decision as to the acceptable security attributes to 1970 create a file with before sending the request to the server. Once 1971 the server receives the creation request from the client it may 1972 choose to evaluate if the security attribute is acceptable. 1974 Security attributes on the client and server may vary based on MAC 1975 model and policy. To handle this the security attribute field has an 1976 LFS component. This component is a mechanism for the host to 1977 identify the format and meaning of the opaque portion of the security 1978 attribute. A full mode environment may contain hosts operating in 1979 several different LFSes. In this case a mechanism for translating 1980 the opaque portion of the security attribute is needed. The actual 1981 translation function will vary based on MAC model and policy and is 1982 out of the scope of this document. If a translation is unavailable 1983 for a given LFS then the request MUST be denied. 
   Another recourse is to allow the host to provide a fallback mapping
   for unknown security attributes.

9.5.1.2.  Policy Enforcement

   In full mode, access control decisions are made by both the clients
   and servers.  When a client makes a request, it takes the security
   attribute from the requesting process and makes an access control
   decision based on that attribute and the security attribute of the
   object it is trying to access.  If the client denies that access, an
   RPC call to the server is never made.  If, however, the access is
   allowed, the client will make a call to the NFS server.

   When the server receives the request from the client, it uses any
   credential information conveyed in the RPC request and the attributes
   of the object the client is trying to access to make an access
   control decision.  If the server's policy allows this access, it will
   fulfill the client's request; otherwise, it will return
   NFS4ERR_ACCESS.

   Future protocol extensions may also allow the server to factor into
   the decision a security label extracted from the RPC request.

   Implementations MAY validate security attributes supplied over the
   network to ensure that they are within a set of attributes permitted
   from a specific peer, and if not, reject them.  Note that a system
   may permit a different set of attributes to be accepted from each
   peer.

9.5.1.3.  Limited Server

   A Limited Server mode (see Section 4.2 of [RFC7204]) consists of a
   server that is label aware but does not enforce policies.  Such a
   server will store and retrieve all object labels presented by
   clients and utilize the methods described in Section 9.2.5 to allow
   the clients to detect changing labels, but may not factor the label
   into access decisions.  Instead, it will expect the clients to
   enforce all such access locally.

9.5.2.  Guest Mode

   Guest mode implies that either the client or the server does not
   handle labels.  If the client is not Labeled NFS aware, then it will
   not offer subject labels to the server.  The server is the only
   entity enforcing policy, and may selectively provide standard NFS
   services to clients based on their authentication credentials and/or
   associated network attributes (e.g., IP address, network interface).
   The level of trust and access extended to a client in this mode is
   configuration-specific.  If the server is not Labeled NFS aware, then
   it will not return object labels to the client.  Clients in this
   environment may consist of groups implementing different MAC model
   policies.  The system requires that all clients in the environment
   be responsible for access control checks.

9.6.  Security Considerations for Labeled NFS

   This entire chapter deals with security issues.

   Depending on the level of protection the MAC system offers, there may
   be a requirement to tightly bind the security attribute to the data.

   When only one of the client or server enforces labels, it is
   important to realize that the other side is not enforcing MAC
   protections.  Alternate methods might be in use to handle the lack of
   MAC support, and care should be taken to identify and mitigate
   threats from possible tampering outside of these methods.

   An example of this is that a server that modifies READDIR or LOOKUP
   results based on the client's subject label might want to always
   construct the same subject label for a client that does not present
   one.  This will prevent a non-Labeled NFS client from mixing entries
   in the directory cache.

10.  Sharing change attribute implementation characteristics with NFSv4
     clients

   Although both the NFSv4 [RFC7530] and NFSv4.1 [RFC5661] protocols
   define the change attribute as being mandatory to implement, there is
   little in the way of guidance as to its construction.  The only
   mandated constraint is that the value must change whenever the file
   data or metadata change.

   While this allows for a wide range of implementations, it also leaves
   the client with no way to determine which is the most recent value
   for the change attribute in a case where several RPC calls have been
   issued in parallel.  In other words, if two COMPOUNDs, both
   containing WRITE and GETATTR requests for the same file, have been
   issued in parallel, how does the client determine which of the two
   change attribute values returned in the replies to the GETATTR
   requests corresponds to the most recent state of the file?  In some
   cases, the only recourse may be to send another COMPOUND containing a
   third GETATTR that is fully serialized with the first two.

   NFSv4.2 avoids this kind of inefficiency by allowing the server to
   share details about how the change attribute is expected to evolve,
   so that the client may immediately determine which, out of the
   several change attribute values returned by the server, is the most
   recent.  change_attr_type is defined as a new recommended attribute
   (see Section 12.2.3) and is a per-file-system attribute.

11.  Error Values

   NFS error numbers are assigned to failed operations within a Compound
   (COMPOUND or CB_COMPOUND) request.  A Compound request contains a
   number of NFS operations that have their results encoded in sequence
   in a Compound reply.  The results of successful operations will
   consist of an NFS4_OK status followed by the encoded results of the
   operation.
   If an NFS operation fails, an error status will be entered in the
   reply and the Compound request will be terminated.

11.1.  Error Definitions

                        Protocol Error Definitions

   +-------------------------+--------+------------------+
   | Error                   | Number | Description      |
   +-------------------------+--------+------------------+
   | NFS4ERR_BADLABEL        | 10093  | Section 11.1.3.1 |
   | NFS4ERR_OFFLOAD_DENIED  | 10091  | Section 11.1.2.1 |
   | NFS4ERR_OFFLOAD_NO_REQS | 10094  | Section 11.1.2.2 |
   | NFS4ERR_PARTNER_NO_AUTH | 10089  | Section 11.1.2.3 |
   | NFS4ERR_PARTNER_NOTSUPP | 10088  | Section 11.1.2.4 |
   | NFS4ERR_UNION_NOTSUPP   | 10090  | Section 11.1.1.1 |
   | NFS4ERR_WRONG_LFS       | 10092  | Section 11.1.3.2 |
   +-------------------------+--------+------------------+

                                  Table 1

11.1.1.  General Errors

   This section deals with errors that are applicable to a broad set of
   different purposes.

11.1.1.1.  NFS4ERR_UNION_NOTSUPP (Error Code 10090)

   One of the arguments to the operation is a discriminated union and
   while the server supports the given operation, it does not support
   the selected arm of the discriminated union.

11.1.2.  Server to Server Copy Errors

   These errors deal with the interaction between servers during
   server-to-server copies.

11.1.2.1.  NFS4ERR_OFFLOAD_DENIED (Error Code 10091)

   The copy offload operation is supported by both the source and the
   destination, but the destination is not allowing it for this file.
   If the client sees this error, it should fall back to the normal copy
   semantics.

11.1.2.2.  NFS4ERR_OFFLOAD_NO_REQS (Error Code 10094)

   The copy offload operation is supported by both the source and the
   destination, but the destination cannot meet the client requirements
   for either consecutive byte copy or synchronous copy.  If the client
   sees this error, it should either relax the requirements (if any) or
   fall back to the normal copy semantics.

11.1.2.3.  NFS4ERR_PARTNER_NO_AUTH (Error Code 10089)

   The source server does not authorize a server-to-server copy offload
   operation.  This may be due to the client's failure to send the
   COPY_NOTIFY operation to the source server, the source server
   receiving a server-to-server copy offload request after the copy
   lease time expired, or some other permission problem.

   The destination server does not authorize a server-to-server copy
   offload operation.  This may be due to an inter-server COPY request
   where the destination server requires RPCSEC_GSSv3 and it is not
   used, or some other permission problem.

11.1.2.4.  NFS4ERR_PARTNER_NOTSUPP (Error Code 10088)

   The remote server does not support the server-to-server copy offload
   protocol.

11.1.3.  Labeled NFS Errors

   These errors are used in Labeled NFS.

11.1.3.1.  NFS4ERR_BADLABEL (Error Code 10093)

   The label specified is invalid in some manner.

11.1.3.2.  NFS4ERR_WRONG_LFS (Error Code 10092)

   The LFS specified in the subject label is not compatible with the LFS
   in the object label.

11.2.  New Operations and Their Valid Errors

   This section contains a table that gives the valid error returns for
   each new NFSv4.2 protocol operation.  The error code NFS4_OK
   (indicating no error) is not listed but should be understood to be
   returnable by all new operations.  The error values for all other
   operations are defined in Section 15.2 of [RFC5661].
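   As an illustration of the client guidance given in Sections 11.1.2.1
   and 11.1.2.2, the following C sketch maps a COPY status to a
   follow-up action.  It is a minimal sketch, not part of the protocol:
   the copy_action type, the copy_next_action() helper, and the
   had_requirements flag are hypothetical names introduced here.  Only
   the error numbers come from Table 1.  Errors for which this document
   gives no client guidance are simply reported to the caller.

```c
/* Error numbers from Table 1 of this document. */
enum {
    NFS4_OK                 = 0,
    NFS4ERR_PARTNER_NOTSUPP = 10088,
    NFS4ERR_PARTNER_NO_AUTH = 10089,
    NFS4ERR_OFFLOAD_DENIED  = 10091,
    NFS4ERR_OFFLOAD_NO_REQS = 10094
};

/* Hypothetical client-side actions; not defined by the protocol. */
enum copy_action {
    COPY_DONE,           /* the offload request was accepted */
    COPY_RETRY_RELAXED,  /* resend COPY with the requirements relaxed */
    COPY_FALLBACK_RW,    /* fall back to normal copy semantics */
    COPY_FAIL            /* no guidance modeled; report to the caller */
};

/*
 * had_requirements is nonzero when the client asked for consecutive
 * byte copy or synchronous copy in the failed COPY request.
 */
enum copy_action copy_next_action(int status, int had_requirements)
{
    switch (status) {
    case NFS4_OK:
        return COPY_DONE;
    case NFS4ERR_OFFLOAD_NO_REQS:
        /* Relax the requirements if there were any, else fall back. */
        return had_requirements ? COPY_RETRY_RELAXED : COPY_FALLBACK_RW;
    case NFS4ERR_OFFLOAD_DENIED:
        return COPY_FALLBACK_RW;
    default:
        return COPY_FAIL;
    }
}
```

   A caller would typically loop at most once on COPY_RETRY_RELAXED,
   since the retried request carries no requirements to relax further.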
2182 Valid Error Returns for Each New Protocol Operation 2184 +----------------+--------------------------------------------------+ 2185 | Operation | Errors | 2186 +----------------+--------------------------------------------------+ 2187 | ALLOCATE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 2188 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 2189 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 2190 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, | 2191 | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | 2192 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | 2193 | | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_MOVED, | 2194 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 2195 | | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, | 2196 | | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, | 2197 | | NFS4ERR_REP_TOO_BIG, | 2198 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 2199 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | 2200 | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | 2201 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 2202 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE | 2203 +----------------+--------------------------------------------------+ 2204 | CLONE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 2205 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 2206 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 2207 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, | 2208 | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | 2209 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | 2210 | | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_MOVED, | 2211 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | 2212 | | NFS4ERR_NOSPC, NFS4ERR_OLD_STATEID, | 2213 | | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, | 2214 | | NFS4ERR_REP_TOO_BIG, | 2215 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 2216 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | 2217 | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | 2218 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 2219 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE, | 2220 | | NFS4ERR_XDEV | 2221 +----------------+--------------------------------------------------+ 2222 | COPY | 
NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 2223 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 2224 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 2225 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, | 2226 | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | 2227 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | 2228 | | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_LOCKED, | 2229 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 2230 | | NFS4ERR_NOSPC, NFS4ERR_OFFLOAD_DENIED, | 2231 | | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, | 2232 | | NFS4ERR_OP_NOT_IN_SESSION, | 2233 | | NFS4ERR_PARTNER_NO_AUTH, | 2234 | | NFS4ERR_PARTNER_NOTSUPP, NFS4ERR_PNFS_IO_HOLE, | 2235 | | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG, | 2236 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 2237 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | 2238 | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | 2239 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 2240 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE | 2241 +----------------+--------------------------------------------------+ 2242 | COPY_NOTIFY | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 2243 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 2244 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 2245 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, | 2246 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | 2247 | | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED, | 2248 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 2249 | | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, | 2250 | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_PNFS_IO_HOLE, | 2251 | | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG, | 2252 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 2253 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | 2254 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 2255 | | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, | 2256 | | NFS4ERR_WRONG_TYPE | 2257 +----------------+--------------------------------------------------+ 2258 | DEALLOCATE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 2259 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 2260 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 2261 | | 
NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, | 2262 | | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 2263 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, | 2264 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 2265 | | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, | 2266 | | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, | 2267 | | NFS4ERR_REP_TOO_BIG, | 2268 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 2269 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | 2270 | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | 2271 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 2272 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE | 2273 +----------------+--------------------------------------------------+ 2274 | GETDEVICELIST | NFS4ERR_NOTSUPP | 2275 +----------------+--------------------------------------------------+ 2276 | IO_ADVISE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 2277 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 2278 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 2279 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, | 2280 | | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 2281 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, | 2282 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 2283 | | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, | 2284 | | NFS4ERR_OP_NOT_IN_SESSION, | 2285 | | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SERVERFAULT, | 2286 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 2287 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE | 2288 +----------------+--------------------------------------------------+ 2289 | LAYOUTERROR | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | 2290 | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | 2291 | | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, | 2292 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 2293 | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_ISDIR, | 2294 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 2295 | | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE, | 2296 | | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, | 2297 | | NFS4ERR_REP_TOO_BIG, | 2298 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 2299 | | NFS4ERR_REQ_TOO_BIG, 
NFS4ERR_RETRY_UNCACHED_REP, | 2300 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 2301 | | NFS4ERR_TOO_MANY_OPS, | 2302 | | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED, | 2303 | | NFS4ERR_WRONG_TYPE | 2304 +----------------+--------------------------------------------------+ 2305 | LAYOUTSTATS | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | 2306 | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | 2307 | | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, | 2308 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 2309 | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_ISDIR, | 2310 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 2311 | | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE, | 2312 | | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, | 2313 | | NFS4ERR_REP_TOO_BIG, | 2314 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 2315 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | 2316 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 2317 | | NFS4ERR_TOO_MANY_OPS, | 2318 | | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED, | 2319 | | NFS4ERR_WRONG_TYPE | 2320 +----------------+--------------------------------------------------+ 2321 | OFFLOAD_CANCEL | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | 2322 | | NFS4ERR_BAD_STATEID, NFS4ERR_COMPLETE_ALREADY, | 2323 | | NFS4ERR_DEADSESSION, NFS4ERR_EXPIRED, | 2324 | | NFS4ERR_DELAY, NFS4ERR_GRACE, NFS4ERR_NOTSUPP, | 2325 | | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, | 2326 | | NFS4ERR_SERVERFAULT, NFS4ERR_TOO_MANY_OPS | 2327 +----------------+--------------------------------------------------+ 2328 | OFFLOAD_STATUS | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | 2329 | | NFS4ERR_BAD_STATEID, NFS4ERR_COMPLETE_ALREADY, | 2330 | | NFS4ERR_DEADSESSION, NFS4ERR_EXPIRED, | 2331 | | NFS4ERR_DELAY, NFS4ERR_GRACE, NFS4ERR_NOTSUPP, | 2332 | | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, | 2333 | | NFS4ERR_SERVERFAULT, NFS4ERR_TOO_MANY_OPS | 2334 +----------------+--------------------------------------------------+ 2335 | READ_PLUS | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 2336 | | NFS4ERR_BADXDR, 
NFS4ERR_BAD_STATEID, | 2337 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 2338 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, | 2339 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | 2340 | | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED, | 2341 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 2342 | | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, | 2343 | | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, | 2344 | | NFS4ERR_PARTNER_NO_AUTH, NFS4ERR_PNFS_IO_HOLE, | 2345 | | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG, | 2346 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 2347 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | 2348 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 2349 | | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, | 2350 | | NFS4ERR_WRONG_TYPE | 2351 +----------------+--------------------------------------------------+ 2352 | SEEK | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 2353 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 2354 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 2355 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, | 2356 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | 2357 | | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED, | 2358 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 2359 | | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, | 2360 | | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, | 2361 | | NFS4ERR_PNFS_IO_HOLE, NFS4ERR_PNFS_NO_LAYOUT, | 2362 | | NFS4ERR_REP_TOO_BIG, | 2363 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 2364 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | 2365 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 2366 | | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, | 2367 | | NFS4ERR_UNION_NOTSUPP, NFS4ERR_WRONG_TYPE | 2368 +----------------+--------------------------------------------------+ 2369 | WRITE_SAME | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 2370 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 2371 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 2372 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, | 2373 | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | 2374 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | 2375 
| | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_LOCKED, | 2376 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 2377 | | NFS4ERR_NOSPC, NFS4ERR_NOTSUPP, | 2378 | | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, | 2379 | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_PNFS_IO_HOLE, | 2380 | | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG, | 2381 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 2382 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | 2383 | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | 2384 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 2385 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE | 2386 +----------------+--------------------------------------------------+ 2388 Table 2 2390 11.3. New Callback Operations and Their Valid Errors 2392 This section contains a table that gives the valid error returns for 2393 each new NFSv4.2 callback operation. The error code NFS4_OK 2394 (indicating no error) is not listed but should be understood to be 2395 returnable by all new callback operations. The error values for all 2396 other callback operations are defined in Section 15.3 of [RFC5661]. 2398 Valid Error Returns for Each New Protocol Callback Operation 2400 +------------+------------------------------------------------------+ 2401 | Callback | Errors | 2402 | Operation | | 2403 +------------+------------------------------------------------------+ 2404 | CB_OFFLOAD | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | 2405 | | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY, | 2406 | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_REP_TOO_BIG, | 2407 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, NFS4ERR_REQ_TOO_BIG, | 2408 | | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SERVERFAULT, | 2409 | | NFS4ERR_TOO_MANY_OPS | 2410 +------------+------------------------------------------------------+ 2412 Table 3 2414 12. New File Attributes 2416 12.1. New RECOMMENDED Attributes - List and Definition References 2418 The list of new RECOMMENDED attributes appears in Table 4. The 2419 meaning of the columns of the table are: 2421 Name: The name of the attribute. 
   Id:  The number assigned to the attribute.  In the event of conflicts
      between the assigned number and
      [I-D.ietf-nfsv4-minorversion2-dot-x], the latter is authoritative,
      but in such an event, it should be resolved with Errata to this
      document and/or [I-D.ietf-nfsv4-minorversion2-dot-x].  See
      [IESG08] for the Errata process.

   Data Type:  The XDR data type of the attribute.

   Acc:  Access allowed to the attribute.

      R  means read-only (GETATTR may retrieve, SETATTR may not set).

      W  means write-only (SETATTR may set, GETATTR may not retrieve).

      R W  means read/write (GETATTR may retrieve, SETATTR may set).

   Defined in:  The section of this specification that describes the
      attribute.

   +------------------+----+-------------------+-----+----------------+
   | Name             | Id | Data Type         | Acc | Defined in     |
   +------------------+----+-------------------+-----+----------------+
   | clone_blksize    | 77 | uint32_t          | R   | Section 12.2.1 |
   | space_freed      | 78 | length4           | R   | Section 12.2.2 |
   | change_attr_type | 79 | change_attr_type4 | R   | Section 12.2.3 |
   | sec_label        | 80 | sec_label4        | R W | Section 12.2.4 |
   +------------------+----+-------------------+-----+----------------+

                                  Table 4

12.2.  Attribute Definitions

12.2.1.  Attribute 77: clone_blksize

   The clone_blksize attribute indicates the granularity of a CLONE
   operation.

12.2.2.  Attribute 78: space_freed

   space_freed gives the number of bytes freed if the file is deleted.
   This attribute is read-only and is of type length4.  It is a per-file
   attribute.

12.2.3.  Attribute 79: change_attr_type

   <CODE BEGINS>

   enum change_attr_type4 {
              NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR         = 0,
              NFS4_CHANGE_TYPE_IS_VERSION_COUNTER        = 1,
              NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2,
              NFS4_CHANGE_TYPE_IS_TIME_METADATA          = 3,
              NFS4_CHANGE_TYPE_IS_UNDEFINED              = 4
   };

   <CODE ENDS>

   change_attr_type is a per-file-system attribute that enables the
   NFSv4.2 server to provide additional information about how it expects
   the change attribute value to evolve after the file data or metadata
   has changed.  While Section 5.4 of [RFC5661] discusses per-file-
   system attributes, the value of change_attr_type is expected not to
   depend on the value of "homogeneous" and to change only in the event
   of a migration.

   NFS4_CHANGE_TYPE_IS_UNDEFINED:  The change attribute does not take
      values that fit into any of these categories.

   NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR:  The change attribute value MUST
      monotonically increase for every atomic change to the file
      attributes, data, or directory contents.

   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER:  The change attribute value MUST
      be incremented by one unit for every atomic change to the file
      attributes, data, or directory contents.  This property is
      preserved when writing to pNFS data servers.

   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS:  The change attribute
      value MUST be incremented by one unit for every atomic change to
      the file attributes, data, or directory contents.  In the case
      where the client is writing to pNFS data servers, the number of
      increments is not guaranteed to exactly match the number of
      writes.

   NFS4_CHANGE_TYPE_IS_TIME_METADATA:  The change attribute is
      implemented as suggested in [RFC7530] in terms of the
      time_metadata attribute.
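   As an illustration of how a client might act on these definitions,
   the following C sketch orders two change attribute values and
   predicts the value after a number of atomic changes.  It is a
   minimal sketch under stated assumptions, not a definitive client
   implementation: the helper names change_attr_latest() and
   change_attr_predict() and the doing_pnfs_writes flag are
   hypothetical, the change attribute is taken to be a 64-bit unsigned
   value, counter wraparound is ignored, and
   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS is conservatively treated
   as giving no ordering guarantee.

```c
#include <stdint.h>

/* Values as defined in the change_attr_type4 enumeration above. */
enum change_attr_type4 {
    NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR         = 0,
    NFS4_CHANGE_TYPE_IS_VERSION_COUNTER        = 1,
    NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2,
    NFS4_CHANGE_TYPE_IS_TIME_METADATA          = 3,
    NFS4_CHANGE_TYPE_IS_UNDEFINED              = 4
};

/*
 * Hypothetical helper: pick the more recent of two change attribute
 * values.  Returns 1 and stores the result in *latest when the type
 * is one this sketch treats as monotonically increasing; returns 0
 * otherwise, in which case the client would need a further GETATTR
 * serialized after the earlier requests.
 */
int change_attr_latest(enum change_attr_type4 type,
                       uint64_t a, uint64_t b, uint64_t *latest)
{
    switch (type) {
    case NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR:
    case NFS4_CHANGE_TYPE_IS_VERSION_COUNTER:
    case NFS4_CHANGE_TYPE_IS_TIME_METADATA:
        *latest = (a > b) ? a : b;
        return 1;
    default:
        return 0;
    }
}

/*
 * Hypothetical helper: predict the change attribute after nchanges
 * atomic changes made by this client.  Returns 1 and stores the
 * prediction in *after when the type increments by one unit per
 * change; returns 0 when no prediction is possible.
 */
int change_attr_predict(enum change_attr_type4 type, int doing_pnfs_writes,
                        uint64_t before, uint64_t nchanges, uint64_t *after)
{
    if (type == NFS4_CHANGE_TYPE_IS_VERSION_COUNTER ||
        (type == NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS &&
         !doing_pnfs_writes)) {
        *after = before + nchanges;
        return 1;
    }
    return 0;
}
```

   A mismatch between the predicted value and the value later returned
   by the server would suggest that another client changed the file in
   the meantime.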
   If any of NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR,
   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, or
   NFS4_CHANGE_TYPE_IS_TIME_METADATA is set, then the client knows at
   the very least that the change attribute is monotonically increasing,
   which is sufficient to resolve the question of which value is the
   most recent.

   If the client sees the value NFS4_CHANGE_TYPE_IS_TIME_METADATA, then
   by inspecting the value of the 'time_delta' attribute it additionally
   has the option of detecting rogue server implementations that use
   time_metadata in violation of the spec.

   If the client sees NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, it has the
   ability to predict what the resulting change attribute value should
   be after a COMPOUND containing a SETATTR, WRITE, or CREATE.  This
   again allows it to detect changes made in parallel by another client.

   The value NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS permits the
   same, but only if the client is not doing pNFS WRITEs.

   Finally, if the server does not support change_attr_type or if
   NFS4_CHANGE_TYPE_IS_UNDEFINED is set, then the server SHOULD make an
   effort to implement the change attribute in terms of the
   time_metadata attribute.

12.2.4.  Attribute 80: sec_label

   <CODE BEGINS>

   typedef uint32_t  policy4;

   struct labelformat_spec4 {
           policy4 lfs_lfs;
           policy4 lfs_pi;
   };

   struct sec_label4 {
           labelformat_spec4       slai_lfs;
           opaque                  slai_data<>;
   };

   <CODE ENDS>

   The FATTR4_SEC_LABEL contains an array of two components with the
   first component being an LFS.  It serves to provide the receiving end
   with the information necessary to translate the security attribute
   into a form that is usable by the endpoint.  Label Formats assigned
   an LFS may optionally choose to include a Policy Identifier field to
   allow for complex policy deployments.  The LFS and Label Format
   Registry are described in detail in [RFC7569].  The translation used
   to interpret the security attribute is not specified as part of the
   protocol as it may depend on various factors.  The second component
   is an opaque section that contains the data of the attribute.  This
   component is dependent on the MAC model to interpret and enforce.

   In particular, it is the responsibility of the LFS specification to
   define a maximum size for the opaque section, slai_data<>.  When
   creating or modifying a label for an object, the client needs to be
   guaranteed that the server will accept a label that is sized
   correctly.  By both client and server being part of a specific MAC
   model, the client will be aware of the size.

13.  Operations: REQUIRED, RECOMMENDED, or OPTIONAL

   The following tables summarize the operations of the NFSv4.2 protocol
   and the corresponding designation of REQUIRED, RECOMMENDED, and
   OPTIONAL to implement or MUST NOT implement.  The designation of MUST
   NOT implement is reserved for those operations that were defined in
   either NFSv4.0 or NFSv4.1 and MUST NOT be implemented in NFSv4.2.

   For the most part, the REQUIRED, RECOMMENDED, or OPTIONAL designation
   for operations sent by the client is for the server implementation.
   The client is generally required to implement the operations needed
   for the operating environment that it serves.  For example, a
   read-only NFSv4.2 client would have no need to implement the WRITE
   operation and is not required to do so.

   The REQUIRED or OPTIONAL designation for callback operations sent by
   the server is for both the client and server.  Generally, the client
   has the option of creating the backchannel and sending the operations
   on the fore channel that will be a catalyst for the server sending
   callback operations.  A partial exception is CB_RECALL_SLOT; the only
   way the client can avoid supporting this operation is by not creating
   a backchannel.

   Since this is a summary of the operations and their designation,
   there are subtleties that are not presented here.  Therefore, if
   there is a question of the requirements of implementation, the
   operation descriptions themselves must be consulted along with other
   relevant explanatory text within either this specification or that of
   NFSv4.1 [RFC5661].

   The abbreviations used in the second and third columns of the table
   are defined as follows.

   REQ:  REQUIRED to implement

   REC:  RECOMMENDED to implement

   OPT:  OPTIONAL to implement

   MNI:  MUST NOT implement

   For the NFSv4.2 features that are OPTIONAL, the operations that
   support those features are OPTIONAL, and the server MUST return
   NFS4ERR_NOTSUPP in response to the client's use of those operations
   when those operations are not implemented by the server.  If an
   OPTIONAL feature is supported, it is possible that a set of
   operations related to the feature becomes REQUIRED to implement.  The
   third column of the table designates the feature(s) and whether the
   operation is REQUIRED or OPTIONAL in the presence of support for the
   feature.
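   The NFS4ERR_NOTSUPP rule for unimplemented OPTIONAL operations can
   be sketched as a table-driven check on the server.  This is an
   illustrative sketch only: the op_entry structure, the op_precheck()
   helper, and the choice of which operations the server implements
   are hypothetical.  The operation numbers are those given in the
   section titles of this document, and the value of NFS4ERR_NOTSUPP is
   the one assigned in [RFC5661].  Operation numbers outside the toy
   table are not modeled.

```c
#include <stddef.h>

enum {
    NFS4_OK         = 0,
    NFS4ERR_NOTSUPP = 10004   /* value assigned in RFC 5661 */
};

/* Designations used in the operation tables of this section. */
enum op_class { OP_REQ, OP_REC, OP_OPT, OP_MNI };

/* Hypothetical per-operation record kept by a server. */
struct op_entry {
    unsigned      opnum;        /* operation number */
    enum op_class cls;          /* designation from the tables */
    int           implemented;  /* nonzero if this server implements it */
};

static const struct op_entry op_table[] = {
    { 42, OP_REQ, 1 },  /* EXCHANGE_ID: REQUIRED */
    { 48, OP_MNI, 0 },  /* GETDEVICELIST: MUST NOT implement */
    { 59, OP_OPT, 0 },  /* ALLOCATE: OPTIONAL, assumed unimplemented */
};

/*
 * Returns NFS4_OK when the operation may be dispatched,
 * NFS4ERR_NOTSUPP when it is listed but not implemented, and -1 for
 * operation numbers this sketch does not model.
 */
int op_precheck(unsigned opnum)
{
    size_t i;

    for (i = 0; i < sizeof(op_table) / sizeof(op_table[0]); i++) {
        if (op_table[i].opnum == opnum)
            return op_table[i].implemented ? NFS4_OK : NFS4ERR_NOTSUPP;
    }
    return -1;
}
```

   Keeping the designation next to the implemented flag lets a build-
   time or start-up check confirm that every REQUIRED operation is in
   fact implemented.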
2624 The OPTIONAL features identified and their abbreviations are as 2625 follows: 2627 pNFS: Parallel NFS 2629 FDELG: File Delegations 2631 DDELG: Directory Delegations 2633 COPYra: Intra-server Server Side Copy 2635 COPYer: Inter-server Server Side Copy 2637 ADB: Application Data Blocks 2639 Operations 2641 +----------------------+--------------------+-----------------------+ 2642 | Operation | REQ, REC, OPT, or | Feature (REQ, REC, or | 2643 | | MNI | OPT) | 2644 +----------------------+--------------------+-----------------------+ 2645 | ALLOCATE | OPT | | 2646 | ACCESS | REQ | | 2647 | BACKCHANNEL_CTL | REQ | | 2648 | BIND_CONN_TO_SESSION | REQ | | 2649 | CLONE | OPT | | 2650 | CLOSE | REQ | | 2651 | COMMIT | REQ | | 2652 | COPY | OPT | COPYer (REQ), COPYra | 2653 | | | (REQ) | 2654 | COPY_NOTIFY | OPT | COPYer (REQ) | 2655 | DEALLOCATE | OPT | | 2656 | CREATE | REQ | | 2657 | CREATE_SESSION | REQ | | 2658 | DELEGPURGE | OPT | FDELG (REQ) | 2659 | DELEGRETURN | OPT | FDELG, DDELG, pNFS | 2660 | | | (REQ) | 2661 | DESTROY_CLIENTID | REQ | | 2662 | DESTROY_SESSION | REQ | | 2663 | EXCHANGE_ID | REQ | | 2664 | FREE_STATEID | REQ | | 2665 | GETATTR | REQ | | 2666 | GETDEVICEINFO | OPT | pNFS (REQ) | 2667 | GETDEVICELIST | MNI | pNFS (MNI) | 2668 | GETFH | REQ | | 2669 | GET_DIR_DELEGATION | OPT | DDELG (REQ) | 2670 | ILLEGAL | REQ | | 2671 | IO_ADVISE | OPT | | 2672 | LAYOUTCOMMIT | OPT | pNFS (REQ) | 2673 | LAYOUTGET | OPT | pNFS (REQ) | 2674 | LAYOUTRETURN | OPT | pNFS (REQ) | 2675 | LAYOUTERROR | OPT | pNFS (OPT) | 2676 | LAYOUTSTATS | OPT | pNFS (OPT) | 2677 | LINK | OPT | | 2678 | LOCK | REQ | | 2679 | LOCKT | REQ | | 2680 | LOCKU | REQ | | 2681 | LOOKUP | REQ | | 2682 | LOOKUPP | REQ | | 2683 | NVERIFY | REQ | | 2684 | OFFLOAD_CANCEL | OPT | COPYer (OPT), COPYra | 2685 | | | (OPT) | 2686 | OFFLOAD_STATUS | OPT | COPYer (OPT), COPYra | 2687 | | | (OPT) | 2688 | OPEN | REQ | | 2689 | OPENATTR | OPT | | 2690 | OPEN_CONFIRM | MNI | | 2691 | OPEN_DOWNGRADE | REQ 
| | 2692 | PUTFH | REQ | | 2693 | PUTPUBFH | REQ | | 2694 | PUTROOTFH | REQ | | 2695 | READ | REQ | | 2696 | READDIR | REQ | | 2697 | READLINK | OPT | | 2698 | READ_PLUS | OPT | | 2699 | RECLAIM_COMPLETE | REQ | | 2700 | RELEASE_LOCKOWNER | MNI | | 2701 | REMOVE | REQ | | 2702 | RENAME | REQ | | 2703 | RENEW | MNI | | 2704 | RESTOREFH | REQ | | 2705 | SAVEFH | REQ | | 2706 | SECINFO | REQ | | 2707 | SECINFO_NO_NAME | REC | pNFS file layout | 2708 | | | (REQ) | 2709 | SEEK | OPT | | 2710 | SEQUENCE | REQ | | 2711 | SETATTR | REQ | | 2712 | SETCLIENTID | MNI | | 2713 | SETCLIENTID_CONFIRM | MNI | | 2714 | SET_SSV | REQ | | 2715 | TEST_STATEID | REQ | | 2716 | VERIFY | REQ | | 2717 | WANT_DELEGATION | OPT | FDELG (OPT) | 2718 | WRITE | REQ | | 2719 | WRITE_SAME | OPT | ADB (REQ) | 2720 +----------------------+--------------------+-----------------------+ 2722 Table 5 2724 Callback Operations 2726 +-------------------------+------------------+----------------------+ 2727 | Operation | REQ, REC, OPT, | Feature (REQ, REC, | 2728 | | or MNI | or OPT) | 2729 +-------------------------+------------------+----------------------+ 2730 | CB_GETATTR | OPT | FDELG (REQ) | 2731 | CB_ILLEGAL | REQ | | 2732 | CB_LAYOUTRECALL | OPT | pNFS (REQ) | 2733 | CB_NOTIFY | OPT | DDELG (REQ) | 2734 | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | 2735 | CB_NOTIFY_LOCK | OPT | | 2736 | CB_OFFLOAD | OPT | COPYer (REQ), COPYra | 2737 | | | (REQ) | 2738 | CB_PUSH_DELEG | OPT | FDELG (OPT) | 2739 | CB_RECALL | OPT | FDELG, DDELG, pNFS | 2740 | | | (REQ) | 2741 | CB_RECALL_ANY | OPT | FDELG, DDELG, pNFS | 2742 | | | (REQ) | 2743 | CB_RECALL_SLOT | REQ | | 2744 | CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS (REQ) | 2745 | CB_SEQUENCE | OPT | FDELG, DDELG, pNFS | 2746 | | | (REQ) | 2747 | CB_WANTS_CANCELLED | OPT | FDELG, DDELG, pNFS | 2748 | | | (REQ) | 2749 +-------------------------+------------------+----------------------+ 2751 Table 6 2753 14. Modifications to NFSv4.1 Operations 2755 14.1. 
Operation 42: EXCHANGE_ID - Instantiate Client ID

14.1.1.  ARGUMENT

   /* new */
   const EXCHGID4_FLAG_SUPP_FENCE_OPS      = 0x00000004;

14.1.2.  RESULT

   Unchanged

14.1.3.  MOTIVATION

   Enterprise applications require guarantees that an operation has
   either aborted or completed.  NFSv4.1 provides this guarantee as long
   as the session is alive: simply send a SEQUENCE operation on the same
   slot with a new sequence number, and the successful return of
   SEQUENCE indicates the previous operation has completed.  However, if
   the session is lost, there is no way to know when any in-progress
   operations have aborted or completed.  In hindsight, the NFSv4.1
   specification should have mandated that DESTROY_SESSION either abort
   or complete all outstanding operations.

14.1.4.  DESCRIPTION

   A client SHOULD request the EXCHGID4_FLAG_SUPP_FENCE_OPS capability
   when it sends an EXCHANGE_ID operation.  The server SHOULD set this
   capability in the EXCHANGE_ID reply whether the client requests it or
   not.  It is the server's return that determines whether this
   capability is in effect.  When it is in effect, the following will
   occur:

   o  The server will not reply to any DESTROY_SESSION invoked with the
      client ID until all operations in progress are completed or
      aborted.

   o  The server will not reply to subsequent EXCHANGE_ID invoked on the
      same client owner with a new verifier until all operations in
      progress on the client ID's session are completed or aborted.

   o  If the NFS server is deployed as a cluster, it supports client ID
      trunking, and the EXCHGID4_FLAG_SUPP_FENCE_OPS capability is
      enabled, then a session ID created on one node of the storage
      cluster MUST be destroyable via DESTROY_SESSION.  In addition,
      DESTROY_CLIENTID and an EXCHANGE_ID with a new verifier affect all
      sessions, regardless of which node the sessions were created on.

14.2.  Operation 48: GETDEVICELIST - Get All Device Mappings for a File
       System

14.2.1.  ARGUMENT

   struct GETDEVICELIST4args {
           /* CURRENT_FH: object belonging to the file system */
           layouttype4     gdla_layout_type;

           /* number of deviceIDs to return */
           count4          gdla_maxdevices;

           nfs_cookie4     gdla_cookie;
           verifier4       gdla_cookieverf;
   };

14.2.2.  RESULT

   struct GETDEVICELIST4resok {
           nfs_cookie4             gdlr_cookie;
           verifier4               gdlr_cookieverf;
           deviceid4               gdlr_deviceid_list<>;
           bool                    gdlr_eof;
   };

   union GETDEVICELIST4res switch (nfsstat4 gdlr_status) {
   case NFS4_OK:
           GETDEVICELIST4resok     gdlr_resok4;
   default:
           void;
   };

14.2.3.  MOTIVATION

   The GETDEVICELIST operation was introduced in [RFC5661] specifically
   to request a list of devices at filesystem mount time from block
   layout type servers.  However, use of the GETDEVICELIST operation
   introduces a race condition versus notification about changes to pNFS
   device IDs as provided by CB_NOTIFY_DEVICEID.  Implementation
   experience with block layout servers has shown there is no need for
   GETDEVICELIST.  Clients have to be able to request new devices using
   GETDEVICEINFO at any time in response either to a new deviceid in
   LAYOUTGET results or to the CB_NOTIFY_DEVICEID callback operation.

14.2.4.  DESCRIPTION

   Clients and servers MUST NOT implement the GETDEVICELIST operation.

15.  NFSv4.2 Operations

15.1.  Operation 59: ALLOCATE - Reserve Space in a Region of a File

15.1.1.  ARGUMENT

   struct ALLOCATE4args {
           /* CURRENT_FH: file */
           stateid4        aa_stateid;
           offset4         aa_offset;
           length4         aa_length;
   };

15.1.2.
RESULT

   struct ALLOCATE4res {
           nfsstat4        ar_status;
   };

15.1.3.  DESCRIPTION

   Whenever a client wishes to reserve space for a region in a file it
   calls the ALLOCATE operation with the current filehandle set to the
   filehandle of the file in question, and the start offset and length
   in bytes of the region set in aa_offset and aa_length respectively.

   CURRENT_FH must be a regular file.  If CURRENT_FH is not a regular
   file, the operation MUST fail and return NFS4ERR_WRONG_TYPE.

   The aa_stateid MUST refer to a stateid that is valid for a WRITE
   operation and follows the rules for stateids in Sections 8.2.5 and
   18.32.3 of [RFC5661].

   The server will ensure that backing blocks are reserved to the region
   specified by aa_offset and aa_length, and that no future writes into
   this region will return NFS4ERR_NOSPC.  If the region lies partially
   or fully outside the current file size, the file size will be set to
   aa_offset + aa_length implicitly.  If the server cannot guarantee
   this, it must return NFS4ERR_NOSPC.

   The ALLOCATE operation can also be used to extend the size of a file
   if the region specified by aa_offset and aa_length extends beyond the
   current file size.  In that case any data outside of the previous
   file size will return zeroes when read before data is written to it.

   It is not required that the server allocate the space to the file
   before returning success.  The allocation can be deferred; however,
   it must be guaranteed that it will not fail for lack of space.  The
   deferral does not result in an asynchronous reply.

   The ALLOCATE operation will result in the space_used and space_freed
   attributes being increased by the number of bytes reserved, unless
   they were previously reserved or written and not shared.

15.2.  Operation 60: COPY - Initiate a server-side copy

15.2.1.
ARGUMENT

   struct COPY4args {
           /* SAVED_FH: source file */
           /* CURRENT_FH: destination file */
           stateid4        ca_src_stateid;
           stateid4        ca_dst_stateid;
           offset4         ca_src_offset;
           offset4         ca_dst_offset;
           length4         ca_count;
           bool            ca_consecutive;
           bool            ca_synchronous;
           netloc4         ca_source_server<>;
   };

15.2.2.  RESULT

   struct write_response4 {
           stateid4        wr_callback_id<1>;
           length4         wr_count;
           stable_how4     wr_committed;
           verifier4       wr_writeverf;
   };

   struct copy_requirements4 {
           bool            cr_consecutive;
           bool            cr_synchronous;
   };

   struct COPY4resok {
           write_response4         cr_response;
           copy_requirements4      cr_requirements;
   };

   union COPY4res switch (nfsstat4 cr_status) {
   case NFS4_OK:
           COPY4resok              cr_resok4;
   case NFS4ERR_OFFLOAD_NO_REQS:
           copy_requirements4      cr_requirements;
   default:
           void;
   };

15.2.3.  DESCRIPTION

   The COPY operation is used for both intra-server and inter-server
   copies.  In both cases, the COPY is always sent from the client to
   the destination server of the file copy.  The COPY operation requests
   that a range in the file specified by SAVED_FH is copied to a range
   in the file specified by CURRENT_FH.

   Both SAVED_FH and CURRENT_FH must be regular files.  If either
   SAVED_FH or CURRENT_FH is not a regular file, the operation MUST fail
   and return NFS4ERR_WRONG_TYPE.

   SAVED_FH and CURRENT_FH must be different files.  If SAVED_FH and
   CURRENT_FH refer to the same file, the operation MUST fail with
   NFS4ERR_INVAL.

   If the request is for an inter-server-to-server copy, the source-fh
   is a filehandle from the source server and the compound procedure is
   being executed on the destination server.  In this case, the
   source-fh is a foreign filehandle on the server receiving the COPY
   request.  If either PUTFH or SAVEFH checked the validity of the
   filehandle, the operation would likely fail and return NFS4ERR_STALE.

   If a server supports the inter-server-to-server COPY feature, a PUTFH
   followed by a SAVEFH MUST NOT return NFS4ERR_STALE for either
   operation.  These restrictions do not pose substantial difficulties
   for servers.  CURRENT_FH and SAVED_FH may be validated in the context
   of the operation referencing them and an NFS4ERR_STALE error returned
   for an invalid file handle at that point.

   The ca_dst_stateid MUST refer to a stateid that is valid for a WRITE
   operation and follows the rules for stateids in Sections 8.2.5 and
   18.32.3 of [RFC5661].  For an inter-server copy, the ca_src_stateid
   MUST be the cnr_stateid returned from the earlier COPY_NOTIFY
   operation, while for an intra-server copy ca_src_stateid MUST refer
   to a stateid that is valid for a READ operation and follows the
   rules for stateids in Sections 8.2.5 and 18.22.3 of [RFC5661].  If
   either stateid is invalid, then the operation MUST fail.

   The ca_src_offset is the offset within the source file from which the
   data will be read, the ca_dst_offset is the offset within the
   destination file to which the data will be written, and the ca_count
   is the number of bytes that will be copied.  An offset of 0 (zero)
   specifies the start of the file.  A count of 0 (zero) requests that
   all bytes from ca_src_offset through EOF be copied to the
   destination.  If concurrent modifications to the source file overlap
   with the source file region being copied, the data copied may include
   all, some, or none of the modifications.  The client can use standard
   NFS operations (e.g., OPEN with OPEN4_SHARE_DENY_WRITE or mandatory
   byte range locks) to protect against concurrent modifications if the
   client is concerned about this.  If the source file's end of file is
   being modified in parallel with a copy that specifies a count of 0
   (zero) bytes, the amount of data copied is implementation dependent
   (clients may guard against this case by specifying a non-zero count
   value or preventing modification of the source file as mentioned
   above).

   If the source offset or the source offset plus count is greater than
   the size of the source file, the operation MUST fail with
   NFS4ERR_INVAL.  The destination offset or destination offset plus
   count may be greater than the size of the destination file.  This
   allows for the client to issue parallel copies to implement
   operations such as

      % cat file1 file2 file3 file4 > dest

   If the ca_source_server list is specified, then this is an
   inter-server copy operation and the source file is on a remote
   server.  The client is expected to have previously issued a
   successful COPY_NOTIFY request to the remote source server.  The
   ca_source_server list MUST be the same as the COPY_NOTIFY response's
   cnr_source_server list.  If the client includes the entries from the
   COPY_NOTIFY response's cnr_source_server list in the ca_source_server
   list, the source server can indicate a specific copy protocol for the
   destination server to use by returning a URL, which specifies both a
   protocol service and server name.  Server-to-server copy protocol
   considerations are described in Section 4.6 and Section 4.9.1.

   If ca_consecutive is set, then the client has specified that the copy
   protocol selected MUST copy bytes in consecutive order from
   ca_src_offset to ca_count.  If the destination server cannot meet
   this requirement, then it MUST return an error of
   NFS4ERR_OFFLOAD_NO_REQS and set cr_consecutive to be false.
   Likewise, if ca_synchronous is set, then the client has required that
   the copy protocol selected MUST perform a synchronous copy.  If the
   destination server cannot meet this requirement, then it MUST return
   an error of NFS4ERR_OFFLOAD_NO_REQS and set cr_synchronous to be
   false.

   If both are set by the client, then the destination SHOULD try to
   determine if it can respond to both requirements at the same time.
   If it cannot make that determination, it must set to true the one it
   can and set to false the other.  The client, upon getting an
   NFS4ERR_OFFLOAD_NO_REQS error, has to examine both cr_consecutive and
   cr_synchronous against the respective values of ca_consecutive and
   ca_synchronous to determine the possible requirement not met.  It
   MUST be prepared for the destination server not being able to
   determine both requirements at the same time.

   Upon receiving the NFS4ERR_OFFLOAD_NO_REQS error, the client has to
   determine if it wants to either re-request the copy with a relaxed
   set of requirements or if it wants to revert to manually copying the
   data.  If it decides to manually copy the data and this is a remote
   copy, then the client is responsible for informing the source that
   the earlier COPY_NOTIFY is no longer valid by sending it an
   OFFLOAD_CANCEL.

   If the operation does not result in an immediate failure, the server
   will return NFS4_OK.

   If the wr_callback_id is returned, this indicates that an
   asynchronous COPY operation was initiated and a CB_OFFLOAD callback
   will deliver the final results of the operation.  The wr_callback_id
   stateid is termed a copy stateid in this context.  The server is
   given the option of returning the results in a callback because the
   data may require a relatively long period of time to copy.

   If no wr_callback_id is returned, the operation completed
   synchronously and no callback will be issued by the server.  The
   completion status of the operation is indicated by cr_status.
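
   The client-side decisions described above (comparing the returned
   requirements against the requested ones, and telling a synchronous
   completion from an asynchronous one) can be sketched as follows.
   This is a minimal illustration, not part of the protocol: the
   CopyResult class, the function names, and the returned action strings
   are hypothetical, the result is assumed to be already decoded from
   XDR, and status codes are modeled as symbolic names rather than wire
   values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Symbolic status names only; numeric wire values are deliberately not modeled.
NFS4_OK = "NFS4_OK"
NFS4ERR_OFFLOAD_NO_REQS = "NFS4ERR_OFFLOAD_NO_REQS"


@dataclass
class CopyResult:
    """Hypothetical, already-decoded view of COPY4res."""
    cr_status: str
    cr_consecutive: bool = False
    cr_synchronous: bool = False
    wr_callback_id: Optional[bytes] = None  # present only for asynchronous copies


def unmet_requirements(ca_consecutive: bool, ca_synchronous: bool,
                       res: CopyResult) -> List[str]:
    """List the requested requirements that the server reported as not met."""
    unmet = []
    if ca_consecutive and not res.cr_consecutive:
        unmet.append("consecutive")
    if ca_synchronous and not res.cr_synchronous:
        unmet.append("synchronous")
    return unmet


def classify_copy_result(ca_consecutive: bool, ca_synchronous: bool,
                         res: CopyResult) -> Tuple[str, List[str]]:
    """Map a COPY result onto the next client action (labels are illustrative)."""
    if res.cr_status == NFS4ERR_OFFLOAD_NO_REQS:
        # Client must either relax its requirements or copy the data itself.
        return ("relax-or-copy-manually",
                unmet_requirements(ca_consecutive, ca_synchronous, res))
    if res.cr_status != NFS4_OK:
        return ("failed", [])
    if res.wr_callback_id is not None:
        # A copy stateid was returned: results arrive later via CB_OFFLOAD.
        return ("wait-for-CB_OFFLOAD", [])
    return ("completed-synchronously", [])
```

   A client that requested both requirements and receives
   NFS4ERR_OFFLOAD_NO_REQS with only cr_consecutive set to true would,
   under this sketch, learn that the synchronous requirement is the one
   to relax.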

   If the copy completes successfully, either synchronously or
   asynchronously, the data copied from the source file to the
   destination file MUST appear identical to the NFS client.  However,
   the NFS server's on disk representation of the data in the source
   file and destination file MAY differ.  For example, the NFS server
   might encrypt, compress, deduplicate, or otherwise represent the on
   disk data in the source and destination file differently.

   If a failure does occur for a synchronous copy, wr_count will be set
   to the number of bytes copied to the destination file before the
   error occurred.  If cr_consecutive is true, then the bytes were
   copied in order.  If the failure occurred for an asynchronous copy,
   then the client will have gotten the notification of the consecutive
   copy order when it got the copy stateid.  It will be able to
   determine the bytes copied from the coa_bytes_copied in the
   CB_OFFLOAD argument.

   In either case, if cr_consecutive was not true, there is no assurance
   as to exactly which bytes in the range were copied.  The client MUST
   assume that there exists a mixture of the original contents of the
   range and the new bytes.  If the COPY wrote past the end of the file
   on the destination, then the last byte written to will determine the
   new file size.  The contents of any block not written to and past the
   original size of the file will be as if a normal WRITE extended the
   file.

15.3.  Operation 61: COPY_NOTIFY - Notify a source server of a future
       copy

15.3.1.  ARGUMENT

   struct COPY_NOTIFY4args {
           /* CURRENT_FH: source file */
           stateid4        cna_src_stateid;
           netloc4         cna_destination_server;
   };

15.3.2.
RESULT

   struct COPY_NOTIFY4resok {
           nfstime4        cnr_lease_time;
           stateid4        cnr_stateid;
           netloc4         cnr_source_server<>;
   };

   union COPY_NOTIFY4res switch (nfsstat4 cnr_status) {
   case NFS4_OK:
           COPY_NOTIFY4resok       resok4;
   default:
           void;
   };

15.3.3.  DESCRIPTION

   This operation is used for an inter-server copy.  A client sends this
   operation in a COMPOUND request to the source server to authorize a
   destination server identified by cna_destination_server to read the
   file specified by CURRENT_FH on behalf of the given user.

   The cna_src_stateid MUST refer to either open or locking states
   provided earlier by the server.  If it is invalid, then the operation
   MUST fail.

   The cna_destination_server MUST be specified using the netloc4
   network location format.  The server is not required to resolve the
   cna_destination_server address before completing this operation.

   If this operation succeeds, the source server will allow the
   cna_destination_server to copy the specified file on behalf of the
   given user as long as both of the following conditions are met:

   o  The destination server begins reading the source file before the
      cnr_lease_time expires.  If the cnr_lease_time expires while the
      destination server is still reading the source file, the
      destination server is allowed to finish reading the file.  If the
      cnr_lease_time expires before the destination server uses READ or
      READ_PLUS to begin the transfer, the source server can use
      NFS4ERR_PARTNER_NO_AUTH to inform the destination server that the
      cnr_lease_time has expired.

   o  The client has not issued an OFFLOAD_CANCEL for the same
      combination of user, filehandle, and destination server.

   The cnr_lease_time is chosen by the source server.  A cnr_lease_time
   of 0 (zero) indicates an infinite lease.  To avoid the need for
   synchronized clocks, copy lease times are granted by the server as a
   time delta.  To renew the copy lease time the client should resend
   the same copy notification request to the source server.

   The cnr_stateid is a copy stateid which uniquely describes the state
   needed on the source server to track the proposed copy.  As defined
   in Section 8.2 of [RFC5661], a stateid is tied to the current
   filehandle and if the same stateid is presented by two different
   clients, it may refer to different state.  As the source does not
   know which netloc4 network location the destination might use to
   establish the copy operation, it can use the cnr_stateid to identify
   that the destination is operating on behalf of the client.  Thus the
   source server MUST construct copy stateids such that they are
   distinct from all other stateids handed out to clients.  These copy
   stateids MUST denote the same set of locks as each of the earlier
   delegation, locking, and open states for the client on the given file
   (see Section 4.3.1).

   A successful response will also contain a list of netloc4 network
   location formats called cnr_source_server, on which the source is
   willing to accept connections from the destination.  These might not
   be reachable from the client and might be located on networks to
   which the client has no connection.

   For a copy only involving one server (the source and destination are
   on the same server), this operation is unnecessary.

15.4.  Operation 62: DEALLOCATE - Unreserve Space in a Region of a File

15.4.1.  ARGUMENT

   struct DEALLOCATE4args {
           /* CURRENT_FH: file */
           stateid4        da_stateid;
           offset4         da_offset;
           length4         da_length;
   };

15.4.2.  RESULT

   struct DEALLOCATE4res {
           nfsstat4        dr_status;
   };

15.4.3.
DESCRIPTION

   Whenever a client wishes to unreserve space for a region in a file it
   calls the DEALLOCATE operation with the current filehandle set to the
   filehandle of the file in question, and the start offset and length
   in bytes of the region set in da_offset and da_length respectively.
   If no space was allocated or reserved for all or parts of the region,
   the DEALLOCATE operation will have no effect for the region that
   is already in an unreserved state.  All further reads from the region
   passed to DEALLOCATE MUST return zeros until overwritten.

   CURRENT_FH must be a regular file.  If CURRENT_FH is not a regular
   file, the operation MUST fail and return NFS4ERR_WRONG_TYPE.

   The da_stateid MUST refer to a stateid that is valid for a WRITE
   operation and follows the rules for stateids in Sections 8.2.5 and
   18.32.3 of [RFC5661].

   Situations may arise where da_offset and/or da_offset + da_length
   will not be aligned to a boundary for which the server does
   allocations or deallocations.  For most file systems, this is the
   block size of the file system.  In such a case, the server can
   deallocate as many bytes as it can in the region.  The blocks that
   cannot be deallocated MUST be zeroed.

   DEALLOCATE will result in the space_used attribute being decreased by
   the number of bytes that were deallocated.  The space_freed attribute
   may or may not decrease, depending on the support and whether the
   blocks backing the specified range were shared or not.  The size
   attribute will remain unchanged.

15.5.  Operation 63: IO_ADVISE - Application I/O access pattern hints

15.5.1.
ARGUMENT

   enum IO_ADVISE_type4 {
           IO_ADVISE4_NORMAL                       = 0,
           IO_ADVISE4_SEQUENTIAL                   = 1,
           IO_ADVISE4_SEQUENTIAL_BACKWARDS         = 2,
           IO_ADVISE4_RANDOM                       = 3,
           IO_ADVISE4_WILLNEED                     = 4,
           IO_ADVISE4_WILLNEED_OPPORTUNISTIC       = 5,
           IO_ADVISE4_DONTNEED                     = 6,
           IO_ADVISE4_NOREUSE                      = 7,
           IO_ADVISE4_READ                         = 8,
           IO_ADVISE4_WRITE                        = 9,
           IO_ADVISE4_INIT_PROXIMITY               = 10
   };

   struct IO_ADVISE4args {
           /* CURRENT_FH: file */
           stateid4        iaa_stateid;
           offset4         iaa_offset;
           length4         iaa_count;
           bitmap4         iaa_hints;
   };

15.5.2.  RESULT

   struct IO_ADVISE4resok {
           bitmap4 ior_hints;
   };

   union IO_ADVISE4res switch (nfsstat4 ior_status) {
   case NFS4_OK:
           IO_ADVISE4resok resok4;
   default:
           void;
   };

15.5.3.  DESCRIPTION

   The IO_ADVISE operation sends an I/O access pattern hint to the
   server for the owner of the stateid for a given byte range specified
   by iaa_offset and iaa_count.  The byte range specified by iaa_offset
   and iaa_count need not currently exist in the file, but the iaa_hints
   will apply to the byte range when it does exist.  If iaa_count is 0,
   all data following iaa_offset is specified.  The server MAY ignore
   the advice.

   The following are the allowed hints for a stateid holder:

   IO_ADVISE4_NORMAL  There is no advice to give; this is the default
      behavior.

   IO_ADVISE4_SEQUENTIAL  Expects to access the specified data
      sequentially from lower offsets to higher offsets.

   IO_ADVISE4_SEQUENTIAL_BACKWARDS  Expects to access the specified data
      sequentially from higher offsets to lower offsets.

   IO_ADVISE4_RANDOM  Expects to access the specified data in a random
      order.

   IO_ADVISE4_WILLNEED  Expects to access the specified data in the near
      future.

   IO_ADVISE4_WILLNEED_OPPORTUNISTIC  Expects to possibly access the
      data in the near future.  This is a speculative hint, and
      therefore the server should prefetch data or indirect blocks only
      if it can be done at a marginal cost.

   IO_ADVISE4_DONTNEED  Expects that it will not access the specified
      data in the near future.

   IO_ADVISE4_NOREUSE  Expects to access the specified data once and
      then not reuse it thereafter.

   IO_ADVISE4_READ  Expects to read the specified data in the near
      future.

   IO_ADVISE4_WRITE  Expects to write the specified data in the near
      future.

   IO_ADVISE4_INIT_PROXIMITY  Informs the server that the data in the
      byte range remains important to the client.

   Since IO_ADVISE is a hint, a server SHOULD NOT return an error and
   invalidate an entire COMPOUND request if one of the sent hints in
   iaa_hints is not supported by the server.  Also, the server MUST NOT
   return an error if the client sends contradictory hints to the
   server, e.g., IO_ADVISE4_SEQUENTIAL and IO_ADVISE4_RANDOM in a single
   IO_ADVISE operation.  In these cases, the server MUST return success
   and an ior_hints value that indicates the hint it intends to
   implement.  This may mean simply returning IO_ADVISE4_NORMAL.

   The ior_hints returned by the server is primarily for debugging
   purposes since the server is under no obligation to carry out the
   hints that it describes in the ior_hints result.  In addition, while
   the server may have intended to implement the hints returned in
   ior_hints, as time progresses, the server may need to change its
   handling of a given file due to several reasons including, but not
   limited to, memory pressure, additional IO_ADVISE hints sent by other
   clients, and heuristically detected file access patterns.

   The server MAY return different advice than what the client
   requested.  If it does, then this might be due to one of several
   conditions, including, but not limited to, another client advising of
   a different I/O access pattern; a different I/O access pattern from
   another client that the server has heuristically detected; or the
   server not being able to support the requested I/O access pattern,
   perhaps due to a temporary resource limitation.

   Each issuance of the IO_ADVISE operation overrides all previous
   issuances of IO_ADVISE for a given byte range.  This effectively
   follows a strategy of last hint wins for a given stateid and byte
   range.

   Clients should assume that hints included in an IO_ADVISE operation
   will be forgotten once the file is closed.

15.5.4.  IMPLEMENTATION

   The NFS client may choose to issue an IO_ADVISE operation to the
   server in several different instances.

   The most obvious is in direct response to an application's execution
   of posix_fadvise().  In this case, IO_ADVISE4_WRITE and
   IO_ADVISE4_READ may be set based upon the type of file access
   specified when the file was opened.

15.5.5.  IO_ADVISE4_INIT_PROXIMITY

   The IO_ADVISE4_INIT_PROXIMITY hint is non-POSIX in origin and can be
   used to convey that the client has recently accessed the byte range
   in its own cache.  I.e., it has not accessed it on the server, but it
   has locally.  When the server reaches resource exhaustion, knowing
   which data is more important allows the server to make better choices
   about which data to, for example, purge from a cache or move to
   secondary storage.  It also informs the server which delegations are
   more important, since if delegations are working correctly, once
   delegated to a client and the client has read the content for that
   byte range, a server might never receive another read request for
   that byte range.
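
   As an illustration of how hints such as IO_ADVISE4_INIT_PROXIMITY
   travel in iaa_hints and ior_hints, the sketch below packs hint values
   into a list of 32-bit words and shows one possible server reply
   policy.  It assumes that each IO_ADVISE_type4 value is used as a bit
   position, following the bitmap4 convention of [RFC5661]; the class
   and function names are hypothetical, and the policy of answering
   contradictory or unsupported hints with IO_ADVISE4_NORMAL is only one
   of the behaviors the text permits.

```python
from enum import IntEnum
from typing import Iterable, List, Set


class IOAdvise(IntEnum):
    """Hint values copied from the IO_ADVISE_type4 enum."""
    NORMAL = 0
    SEQUENTIAL = 1
    SEQUENTIAL_BACKWARDS = 2
    RANDOM = 3
    WILLNEED = 4
    WILLNEED_OPPORTUNISTIC = 5
    DONTNEED = 6
    NOREUSE = 7
    READ = 8
    WRITE = 9
    INIT_PROXIMITY = 10


def encode_hints(hints: Iterable[int]) -> List[int]:
    """Pack hint values into bitmap4-style 32-bit words (bit n lives in word n // 32)."""
    words: List[int] = []
    for hint in hints:
        word, bit = divmod(int(hint), 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def decode_hints(words: Iterable[int]) -> List[int]:
    """Return the bit positions that are set, in ascending order."""
    return [i * 32 + bit
            for i, word in enumerate(words)
            for bit in range(32)
            if word & (1 << bit)]


def server_reply(iaa_hints: List[int], supported: Set[int]) -> List[int]:
    """Hypothetical policy: never fail, and fall back to NORMAL when unsure."""
    requested = {h for h in decode_hints(iaa_hints) if h in supported}
    contradictory = {IOAdvise.SEQUENTIAL, IOAdvise.RANDOM} <= requested
    if not requested or contradictory:
        return encode_hints([IOAdvise.NORMAL])
    return encode_hints(sorted(requested))
```

   Under these assumptions, a client flagging a cached range as still
   important would send encode_hints([IOAdvise.INIT_PROXIMITY]), and a
   server that does not support the hint would still return success with
   only the IO_ADVISE4_NORMAL bit set.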

   The IO_ADVISE4_INIT_PROXIMITY hint can also be used in a pNFS setting
   to let the client inform the metadata server as to the I/O statistics
   between the client and the storage devices.  The metadata server is
   then free to use this information about client I/O to optimize the
   data storage location.

   This hint is also useful in the case of NFS clients which are network
   booting from a server.  If the first client to be booted sends this
   hint, then it keeps the cache warm for the remaining clients.

15.5.6.  pNFS File Layout Data Type Considerations

   The IO_ADVISE considerations for pNFS are very similar to the COMMIT
   considerations for pNFS (see Section 13.7 of [RFC5661]).  That is, as
   with COMMIT, some NFS server implementations prefer IO_ADVISE be done
   on the storage device, and some prefer it be done on the metadata
   server.

   For the file's layout type, NFSv4.2 includes an additional hint
   NFL42_UFLG_IO_ADVISE_THRU_MDS which is valid only on metadata servers
   running NFSv4.2 or higher.  Any file's layout obtained from an
   NFSv4.1 metadata server MUST NOT have NFL42_UFLG_IO_ADVISE_THRU_MDS
   set.  Any file's layout obtained with an NFSv4.2 metadata server MAY
   have NFL42_UFLG_IO_ADVISE_THRU_MDS set.  However, if the layout
   utilizes NFSv4.1 storage devices, the IO_ADVISE operation cannot be
   sent to them.

   If NFL42_UFLG_IO_ADVISE_THRU_MDS is set, the client MUST send the
   IO_ADVISE operation to the metadata server in order for it to be
   honored by the storage device.  Once the metadata server receives the
   IO_ADVISE operation, it will communicate the advice to each storage
   device.

   If NFL42_UFLG_IO_ADVISE_THRU_MDS is not set, then the client SHOULD
   send an IO_ADVISE operation to the appropriate storage device for the
   specified byte range.  While the client MAY always send IO_ADVISE to
   the metadata server, if the server has not set
   NFL42_UFLG_IO_ADVISE_THRU_MDS, the client should expect that such an
   IO_ADVISE is futile.  Note that a client SHOULD use the same set of
   arguments on each IO_ADVISE sent to a storage device for the same
   open file reference.

   The server is not required to support different advice for different
   storage devices with the same open file reference.

15.5.6.1.  Dense and Sparse Packing Considerations

   The IO_ADVISE operation MUST use the iaa_offset and byte range as
   dictated by the presence or absence of NFL4_UFLG_DENSE (see
   Section 13.4.4 of [RFC5661]).

   E.g., if NFL4_UFLG_DENSE is present, and a READ or WRITE to the
   storage device for iaa_offset 0 really means iaa_offset 10000 in the
   logical file, then an IO_ADVISE for iaa_offset 0 means iaa_offset
   10000.

   E.g., if NFL4_UFLG_DENSE is absent, a READ or WRITE to the storage
   device for iaa_offset 0 really means iaa_offset 0 in the logical
   file, so an IO_ADVISE for iaa_offset 0 means iaa_offset 0 in the
   logical file.

   E.g., suppose NFL4_UFLG_DENSE is present, the stripe unit is 1000
   bytes, the stripe count is 10, and the dense storage device file is
   serving iaa_offset 0.  If a READ or WRITE to the storage device for
   iaa_offsets 0, 1000, 2000, and 3000 really means iaa_offsets 10000,
   20000, 30000, and 40000 in the logical file (implying a stripe count
   of 10 and a stripe unit of 1000), then an IO_ADVISE sent to the same
   storage device with an iaa_offset of 500 and an iaa_count of 3000
   applies to these byte ranges of the dense storage device file:

     - 500 to 999
     - 1000 to 1999
     - 2000 to 2999
     - 3000 to 3499

   I.e., the contiguous range 500 to 3499 as specified in IO_ADVISE.

   It also applies to these byte ranges of the logical file:

     - 10500 to 10999 (500 bytes)
     - 20000 to 20999 (1000 bytes)
     - 30000 to 30999 (1000 bytes)
     - 40000 to 40499 (500 bytes)
     (total            3000 bytes)

   E.g., suppose NFL4_UFLG_DENSE is absent, the stripe unit is 250
   bytes, the stripe count is 4, and the sparse storage device file is
   serving iaa_offset 0.  Then a READ or WRITE to the storage device for
   iaa_offsets 0, 1000, 2000, and 3000 really means iaa_offsets 0,
   1000, 2000, and 3000 in the logical file, keeping in mind that on the
   storage device file, byte ranges 250 to 999, 1250 to 1999, 2250 to
   2999, and 3250 to 3999 are not accessible.  An IO_ADVISE sent to the
   same storage device with an iaa_offset of 500 and an iaa_count of
   3000 then applies to these byte ranges of the logical file and the
   sparse storage device file:

     - 500 to 999 (500 bytes)      - no effect
     - 1000 to 1249 (250 bytes)    - effective
     - 1250 to 1999 (750 bytes)    - no effect
     - 2000 to 2249 (250 bytes)    - effective
     - 2250 to 2999 (750 bytes)    - no effect
     - 3000 to 3249 (250 bytes)    - effective
     - 3250 to 3499 (250 bytes)    - no effect
     (subtotal      2250 bytes)    - no effect
     (subtotal       750 bytes)    - effective
     (grand total   3000 bytes)    - no effect + effective

   If neither of the flags NFL42_UFLG_IO_ADVISE_THRU_MDS and
   NFL4_UFLG_DENSE is set in the layout, then any IO_ADVISE request
   sent to the data server with a byte range that overlaps a stripe unit
   that the data server does not serve MUST NOT result in the status
   NFS4ERR_PNFS_IO_HOLE.  Instead, the response SHOULD be successful and
   if the server applies IO_ADVISE hints on any stripe units that
   overlap with the specified range, those hints SHOULD be indicated in
   the response.

15.6.  Operation 64: LAYOUTERROR - Provide Errors for the Layout

15.6.1.
ARGUMENT 3552 3554 struct device_error4 { 3555 deviceid4 de_deviceid; 3556 nfsstat4 de_status; 3557 nfs_opnum4 de_opnum; 3558 }; 3560 struct LAYOUTERROR4args { 3561 /* CURRENT_FH: file */ 3562 offset4 lea_offset; 3563 length4 lea_length; 3564 stateid4 lea_stateid; 3565 device_error4 lea_errors<>; 3566 }; 3568 3570 15.6.2. RESULT 3572 3574 struct LAYOUTERROR4res { 3575 nfsstat4 ler_status; 3576 }; 3578 3580 15.6.3. DESCRIPTION 3582 The client can use LAYOUTERROR to inform the metadata server about 3583 errors in its interaction with the layout (see Section 12 of 3584 [RFC5661]) represented by the current filehandle, client ID (derived 3585 from the session ID in the preceding SEQUENCE operation), byte-range 3586 (lea_offset + lea_length), and lea_stateid. 3588 Each individual device_error4 describes a single error associated 3589 with a storage device, which is identified via de_deviceid. If the 3590 Layout Type (see Section 12.2.7 of [RFC5661]) supports NFSv4 3591 operations, then the operation which returned the error is identified 3592 via de_opnum. If the Layout Type does not support NFSv4 operations, 3593 then it MAY choose to either map the operation onto one of the allowed 3594 operations which can be sent to a storage device with the File Layout 3595 Type (see Section 3.3) or it can signal no support for operations by 3596 marking de_opnum with the ILLEGAL operation. Finally, the NFS error 3597 value (nfsstat4) encountered is provided via de_status and may 3598 consist of the following error codes: 3600 NFS4ERR_NXIO: The client was unable to establish any communication 3601 with the storage device. 3603 NFS4ERR_*: The client was able to establish communication with the 3604 storage device and is returning one of the allowed error codes for 3605 the operation denoted by de_opnum. 3607 Note that while the metadata server may return an error associated 3608 with the layout stateid or the open file, it MUST NOT return an error 3609 in the processing of the errors.
If LAYOUTERROR is in a compound 3610 before LAYOUTRETURN, it MUST NOT introduce an error other than what 3611 LAYOUTRETURN would already encounter. 3613 15.6.4. IMPLEMENTATION 3615 There are two broad classes of errors, transient and persistent. The 3616 client SHOULD strive to only use this new mechanism to report 3617 persistent errors. It MUST be able to deal with transient issues by 3618 itself. Also, while the client might consider an issue to be 3619 persistent, it MUST be prepared for the metadata server to consider 3620 such issues to be transient. A prime example of this is if the 3621 metadata server fences off a client from either a stateid or a 3622 filehandle. The client will get an error from the storage device and 3623 might relay either NFS4ERR_ACCESS or NFS4ERR_BAD_STATEID back to the 3624 metadata server, with the belief that this is a hard error. If the 3625 metadata server is informed by the client that there is an error, it 3626 can safely ignore that. For it, the mission is accomplished in that 3627 the client has returned a layout that the metadata server had most 3628 likely recalled. 3630 The client might also need to inform the metadata server that it 3631 cannot reach one or more of the storage devices. While the metadata 3632 server can detect the connectivity of both of these paths: 3634 o metadata server to storage device 3636 o metadata server to client 3637 it cannot determine if the client and storage device path is working. 3638 As with the case of the storage device passing errors to the client, 3639 it must be prepared for the metadata server to consider such outages 3640 as being transitory. 3642 Clients are expected to tolerate transient storage device errors, and 3643 hence clients SHOULD NOT use the LAYOUTERROR error handling for 3644 device access problems that may be transient. 
The methods by which a 3645 client decides whether a device access problem is transient vs 3646 persistent are implementation-specific, but may include retrying I/Os 3647 to a data server under appropriate conditions. 3649 When an I/O fails to a storage device, the client SHOULD retry the 3650 failed I/O via the metadata server. In this situation, before 3651 retrying the I/O, the client SHOULD return the layout, or the 3652 affected portion thereof, and SHOULD indicate which storage device or 3653 devices were problematic. The client needs to do this when the 3654 storage device is being unresponsive in order to fence off any failed 3655 write attempts, and ensure that they do not end up overwriting any 3656 later data being written through the metadata server. If the client 3657 does not do this, the metadata server MAY issue a layout recall 3658 callback in order to perform the retried I/O. 3660 The client needs to be cognizant that since this error handling is 3661 optional in the metadata server, the metadata server may silently 3662 ignore this functionality. Also, as the metadata server may consider 3663 some issues the client reports to be expected, the client might find 3664 it difficult to detect a metadata server which has not implemented 3665 error handling via LAYOUTERROR. 3667 If a metadata server is aware that a storage device is proving 3668 problematic to a client, the metadata server SHOULD NOT include that 3669 storage device in any pNFS layouts sent to that client. If the 3670 metadata server is aware that a storage device is affecting many 3671 clients, then the metadata server SHOULD NOT include that storage 3672 device in any pNFS layouts sent out. If a client asks for a new 3673 layout for the file from the metadata server, it MUST be prepared for 3674 the metadata server to return that storage device in the layout.
The 3675 metadata server might not have any choice in using the storage 3676 device, i.e., there might only be one possible layout for the system. 3677 Also, in the case of existing files, the metadata server might have 3678 no choice in which storage devices to hand out to clients. 3680 The metadata server is not required to indefinitely retain per-client 3681 storage device error information. A metadata server is also not 3682 required to automatically reinstate use of a previously problematic 3683 storage device; administrative intervention may be required instead. 3685 15.7. Operation 65: LAYOUTSTATS - Provide Statistics for the Layout 3687 15.7.1. ARGUMENT 3689 3691 struct layoutupdate4 { 3692 layouttype4 lou_type; 3693 opaque lou_body<>; 3694 }; 3696 struct io_info4 { 3697 uint64_t ii_count; 3698 uint64_t ii_bytes; 3699 }; 3701 struct LAYOUTSTATS4args { 3702 /* CURRENT_FH: file */ 3703 offset4 lsa_offset; 3704 length4 lsa_length; 3705 stateid4 lsa_stateid; 3706 io_info4 lsa_read; 3707 io_info4 lsa_write; 3708 deviceid4 lsa_deviceid; 3709 layoutupdate4 lsa_layoutupdate; 3710 }; 3712 3714 15.7.2. RESULT 3716 3718 struct LAYOUTSTATS4res { 3719 nfsstat4 lsr_status; 3720 }; 3722 3724 15.7.3. DESCRIPTION 3726 The client can use LAYOUTSTATS to inform the metadata server about 3727 its interaction with the layout (see Section 12 of [RFC5661]) 3728 represented by the current filehandle, client ID (derived from the 3729 session ID in the preceding SEQUENCE operation), byte-range 3730 (lsa_offset and lsa_length), and lsa_stateid. lsa_read and lsa_write 3731 allow for non-Layout Type specific statistics to be reported. 3732 lsa_deviceid allows the client to specify to which storage device the 3733 statistics apply. The remaining information the client is presenting 3734 is specific to the Layout Type and presented in the lsa_layoutupdate 3735 field. Each Layout Type MUST define the contents of lsa_layoutupdate 3736 in their respective specifications.
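As an illustration only, the following Python sketch shows one way a client might keep per-device counters and snapshot them into a structure shaped like LAYOUTSTATS4args. The class names IoInfo, LayoutStatsArgs, and DeviceCounters are hypothetical and are not part of the protocol; only the field names are taken from the XDR above, and lsa_layoutupdate is left as an opaque byte string because its contents are Layout Type specific.

```python
from dataclasses import dataclass, replace


@dataclass
class IoInfo:
    """Mirrors io_info4: number of operations and number of bytes."""
    ii_count: int = 0
    ii_bytes: int = 0


@dataclass
class LayoutStatsArgs:
    """Mirrors the fields of LAYOUTSTATS4args (illustrative, not wire format)."""
    lsa_offset: int
    lsa_length: int
    lsa_stateid: bytes
    lsa_read: IoInfo
    lsa_write: IoInfo
    lsa_deviceid: bytes
    lsa_layoutupdate: bytes = b""  # Layout Type specific; opaque here


class DeviceCounters:
    """Hypothetical client-side bookkeeping for one storage device."""

    def __init__(self, deviceid: bytes):
        self.deviceid = deviceid
        self.read = IoInfo()
        self.write = IoInfo()

    def record_read(self, nbytes: int) -> None:
        self.read.ii_count += 1
        self.read.ii_bytes += nbytes

    def record_write(self, nbytes: int) -> None:
        self.write.ii_count += 1
        self.write.ii_bytes += nbytes

    def to_args(self, offset: int, length: int, stateid: bytes) -> LayoutStatsArgs:
        # Copy the counters so later I/O does not mutate a report
        # that has already been built.
        return LayoutStatsArgs(
            lsa_offset=offset,
            lsa_length=length,
            lsa_stateid=stateid,
            lsa_read=replace(self.read),
            lsa_write=replace(self.write),
            lsa_deviceid=self.deviceid,
        )
```

Copying the counters in to_args keeps each report a stable snapshot; how often such a report is sent remains the client's choice.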
3738 LAYOUTSTATS can be combined with IO_ADVISE (see Section 15.5) to 3739 augment the decision making process of how the metadata server 3740 handles a file. I.e., IO_ADVISE lets the server know that a byte 3741 range has a certain characteristic, but not necessarily the intensity 3742 of that characteristic. 3744 The statistics are cumulative, i.e., multiple LAYOUTSTATS updates can 3745 be in flight at the same time. The metadata server can examine the 3746 packet's timestamp to order the different calls. The first 3747 LAYOUTSTATS sent by the client SHOULD be from the opening of the 3748 file. The choice of how often to update the metadata server is made 3749 by the client. 3751 Note that while the metadata server may return an error associated 3752 with the layout stateid or the open file, it MUST NOT return an error 3753 in the processing of the statistics. 3755 15.8. Operation 66: OFFLOAD_CANCEL - Stop an Offloaded Operation 3757 15.8.1. ARGUMENT 3759 3761 struct OFFLOAD_CANCEL4args { 3762 /* CURRENT_FH: file to cancel */ 3763 stateid4 oca_stateid; 3764 }; 3766 3768 15.8.2. RESULT 3770 3772 struct OFFLOAD_CANCEL4res { 3773 nfsstat4 ocr_status; 3774 }; 3775 3777 15.8.3. DESCRIPTION 3779 OFFLOAD_CANCEL is used by the client to terminate an asynchronous 3780 operation, which is identified both by CURRENT_FH and the 3781 oca_stateid. I.e., there can be multiple offloaded operations acting 3782 on the file, the stateid will identify to the server exactly which 3783 one is to be stopped. Currently there are only two operations which 3784 can decide to be asynchronous: COPY and WRITE_SAME. 3786 In the context of server-to-server copy, the client can send 3787 OFFLOAD_CANCEL to either the source or destination server, albeit 3788 with a different stateid. The client uses OFFLOAD_CANCEL to inform 3789 the destination to stop the active transfer and uses the stateid it 3790 got back from the COPY operation. 
The client uses OFFLOAD_CANCEL and 3791 the stateid it used in the COPY_NOTIFY to inform the source to not 3792 allow any more copying from the destination. 3794 OFFLOAD_CANCEL is also useful in situations in which the source 3795 server granted a very long or infinite lease on the destination 3796 server's ability to read the source file and all copy operations on 3797 the source file have been completed. 3799 15.9. Operation 67: OFFLOAD_STATUS - Poll for Status of Asynchronous 3800 Operation 3802 15.9.1. ARGUMENT 3804 3806 struct OFFLOAD_STATUS4args { 3807 /* CURRENT_FH: destination file */ 3808 stateid4 osa_stateid; 3809 }; 3811 3813 15.9.2. RESULT 3815 3816 struct OFFLOAD_STATUS4resok { 3817 length4 osr_count; 3818 nfsstat4 osr_complete<1>; 3819 }; 3821 union OFFLOAD_STATUS4res switch (nfsstat4 osr_status) { 3822 case NFS4_OK: 3823 OFFLOAD_STATUS4resok osr_resok4; 3824 default: 3825 void; 3826 }; 3828 3830 15.9.3. DESCRIPTION 3832 OFFLOAD_STATUS can be used by the client to query the progress of an 3833 asynchronous operation, which is identified both by CURRENT_FH and 3834 the osa_stateid. If this operation is successful, the number of 3835 bytes processed are returned to the client in the osr_count field. 3837 If the optional osr_complete field is present, the asynchronous 3838 operation has completed. In this case the status value indicates the 3839 result of the asynchronous operation. In all cases, the server will 3840 also deliver the final results of the asynchronous operation in a 3841 CB_OFFLOAD operation. 3843 The failure of this operation does not indicate the result of the 3844 asynchronous operation in any way. 3846 15.10. Operation 68: READ_PLUS - READ Data or Holes from a File 3848 15.10.1. ARGUMENT 3850 3852 struct READ_PLUS4args { 3853 /* CURRENT_FH: file */ 3854 stateid4 rpa_stateid; 3855 offset4 rpa_offset; 3856 count4 rpa_count; 3857 }; 3859 3861 15.10.2. 
RESULT 3863 3865 enum data_content4 { 3866 NFS4_CONTENT_DATA = 0, 3867 NFS4_CONTENT_HOLE = 1 3868 }; 3870 struct data_info4 { 3871 offset4 di_offset; 3872 length4 di_length; 3873 }; 3875 struct data4 { 3876 offset4 d_offset; 3877 opaque d_data<>; 3878 }; 3880 union read_plus_content switch (data_content4 rpc_content) { 3881 case NFS4_CONTENT_DATA: 3882 data4 rpc_data; 3883 case NFS4_CONTENT_HOLE: 3884 data_info4 rpc_hole; 3885 default: 3886 void; 3887 }; 3889 /* 3890 * Allow a return of an array of contents. 3891 */ 3892 struct read_plus_res4 { 3893 bool rpr_eof; 3894 read_plus_content rpr_contents<>; 3895 }; 3897 union READ_PLUS4res switch (nfsstat4 rp_status) { 3898 case NFS4_OK: 3899 read_plus_res4 rp_resok4; 3900 default: 3901 void; 3902 }; 3904 3906 15.10.3. DESCRIPTION 3908 The READ_PLUS operation is based upon the NFSv4.1 READ operation (see 3909 Section 18.22 of [RFC5661]) and similarly reads data from the regular 3910 file identified by the current filehandle. 3912 The client provides a rpa_offset of where the READ_PLUS is to start 3913 and a rpa_count of how many bytes are to be read. A rpa_offset of 3914 zero means to read data starting at the beginning of the file. If 3915 rpa_offset is greater than or equal to the size of the file, the 3916 status NFS4_OK is returned with di_length (the data length) set to 3917 zero and eof set to TRUE. 3919 The READ_PLUS result is comprised of an array of rpr_contents, each 3920 of which describe a data_content4 type of data. For NFSv4.2, the 3921 allowed values are data and hole. A server MUST support both the 3922 data type and the hole if it uses READ_PLUS. If it does not want to 3923 support a hole, it MUST use READ. The array contents MUST be 3924 contiguous in the file. 3926 Holes SHOULD be returned in their entirety - clients must be prepared 3927 to get more information than they requested. Both the start and the 3928 end of the hole may exceed what was requested. 
If data to be 3929 returned is comprised entirely of zeros, then the server SHOULD 3930 return that data as a hole instead. 3932 The server may elect to return adjacent elements of the same type. 3933 For example, if the server has a range of data comprised entirely of 3934 zeros and then a hole, it might want to return two adjacent holes to 3935 the client. 3937 If the client specifies a rpa_count value of zero, the READ_PLUS 3938 succeeds and returns zero bytes of data. In all situations, the 3939 server may choose to return fewer bytes than specified by the client. 3940 The client needs to check for this condition and handle the condition 3941 appropriately. 3943 If the client specifies an rpa_offset and rpa_count value that is 3944 entirely contained within a hole of the file, then the di_offset and 3945 di_length returned MAY be for the entire hole. If the owner has a 3946 locked byte range covering rpa_offset and rpa_count entirely, the 3947 di_offset and di_length MUST NOT be extended outside the locked byte 3948 range. This result is considered valid until the file is changed 3949 (detected via the change attribute). The server MUST provide the 3950 same semantics for the hole as if the client read the region and 3951 received zeroes; the implied hole's contents lifetime MUST be exactly 3952 the same as any other read data. 3954 If the client specifies an rpa_offset and rpa_count value that begins 3955 in a non-hole of the file but extends into a hole, the server should 3956 return an array comprised of both data and a hole. The client MUST 3957 be prepared for the server to return a short read describing just the 3958 data. The client will then issue another READ_PLUS for the remaining 3959 bytes, to which the server will respond with information about the hole 3960 in the file.
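The client-side consequences of these rules (holes that may extend past the requested range on either side, and short reads) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a definitive implementation: the function name assemble_read_plus and the tuple encoding of each array element, ("data", d_offset, bytes) or ("hole", di_offset, di_length), are hypothetical stand-ins for the decoded read_plus_content arms.

```python
def assemble_read_plus(rpa_offset, rpa_count, contents):
    """Build the bytes for [rpa_offset, rpa_offset + rpa_count) from a
    decoded READ_PLUS result array.

    Each element of `contents` is assumed to be either
    ("data", d_offset, payload_bytes) or ("hole", di_offset, di_length).
    The result may be shorter than rpa_count (a short read).
    """
    end = rpa_offset + rpa_count
    buf = bytearray()
    pos = rpa_offset  # next file offset still needed

    for kind, seg_off, payload in contents:
        seg_len = payload if kind == "hole" else len(payload)
        seg_end = seg_off + seg_len

        # Clamp the segment to the part of the request not yet filled;
        # a hole may start before and end after what was asked for.
        lo = max(seg_off, pos)
        hi = min(seg_end, end)
        if lo >= hi:
            continue  # nothing useful in this segment
        if lo != pos:
            # The array contents are required to be contiguous.
            raise ValueError("non-contiguous READ_PLUS result")

        if kind == "hole":
            buf += bytes(hi - lo)  # a hole reads as zeros
        else:
            buf += payload[lo - seg_off:hi - seg_off]
        pos = hi

    return bytes(buf)
```

If the returned buffer is shorter than rpa_count and eof was not set, the caller would issue another READ_PLUS starting at rpa_offset plus the length of the buffer.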
3962 Except when special stateids are used, the stateid value for a 3963 READ_PLUS request represents a value returned from a previous byte- 3964 range lock or share reservation request or the stateid associated 3965 with a delegation. The stateid identifies the associated owners if 3966 any and is used by the server to verify that the associated locks are 3967 still valid (e.g., have not been revoked). 3969 If the read ended at the end-of-file (formally, in a correctly formed 3970 READ_PLUS operation, if rpa_offset + rpa_count is equal to the size 3971 of the file), or the READ_PLUS operation extends beyond the size of 3972 the file (if rpa_offset + rpa_count is greater than the size of the 3973 file), eof is returned as TRUE; otherwise, it is FALSE. A successful 3974 READ_PLUS of an empty file will always return eof as TRUE. 3976 If the current filehandle is not an ordinary file, an error will be 3977 returned to the client. In the case that the current filehandle 3978 represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. If 3979 the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is 3980 returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. 3982 For a READ_PLUS with a stateid value of all bits equal to zero, the 3983 server MAY allow the READ_PLUS to be serviced subject to mandatory 3984 byte-range locks or the current share deny modes for the file. For a 3985 READ_PLUS with a stateid value of all bits equal to one, the server 3986 MAY allow READ_PLUS operations to bypass locking checks at the 3987 server. 3989 On success, the current filehandle retains its value. 3991 15.10.3.1. Note on Client Support of Arms of the Union 3993 It was decided not to add a means for the client to inform the server 3994 as to which arms of READ_PLUS it would support. 
In a later minor 3995 version, it may become necessary to introduce a new 3996 operation which would allow the client to inform the server as to 3997 whether it supported the new arms of the union of data types 3998 available in READ_PLUS. 4000 15.10.4. IMPLEMENTATION 4002 In general, the IMPLEMENTATION notes for READ in Section 18.22.4 of 4003 [RFC5661] also apply to READ_PLUS. 4005 15.10.4.1. Additional pNFS Implementation Information 4007 With pNFS, the semantics of using READ_PLUS remain the same. Any 4008 data server MAY return a hole result for a READ_PLUS request that it 4009 receives. When a data server chooses to return such a result, it has 4010 the option of returning information for the data stored on that data 4011 server (as defined by the data layout), but it MUST NOT return 4012 results for a byte range that includes data managed by another data 4013 server. 4015 If mandatory locking is enforced, then the data server must also 4016 ensure that it returns only information that is within the owner's 4017 locked byte range. 4019 15.10.5. READ_PLUS with Sparse Files Example 4021 The following table describes a sparse file. For each byte range, 4022 the file contains either non-zero data or a hole. In addition, the 4023 server in this example will only create a hole if it is greater than 4024 32K. 4026 +-------------+----------+ 4027 | Byte-Range | Contents | 4028 +-------------+----------+ 4029 | 0-15999 | Hole | 4030 | 16K-31999 | Non-Zero | 4031 | 32K-255999 | Hole | 4032 | 256K-287999 | Non-Zero | 4033 | 288K-353999 | Hole | 4034 | 354K-417999 | Non-Zero | 4035 +-------------+----------+ 4037 Table 7 4039 Under the given circumstances, if a client were to read from the file 4040 with a max read size of 64K, the following will be the results for 4041 the given READ_PLUS calls. This assumes the client has already 4042 opened the file, acquired a valid stateid ('s' in the example), and 4043 just needs to issue READ_PLUS requests. 4045 1.
READ_PLUS(s, 0, 64K) --> NFS_OK, eof = false, <data[0,32K], hole[32K,224K]>. Since the first hole is less than the server's 4047 minimum hole size, the first 32K of the file is returned as data 4048 and the remaining 32K is returned as a hole which actually 4049 extends to 256K. 4051 2. READ_PLUS(s, 32K, 64K) --> NFS_OK, eof = false, <hole[32K,224K]>. 4052 The requested range was all zeros, and the current hole begins at 4053 offset 32K and is 224K in length. Note that the client should 4054 not have followed up the previous READ_PLUS request with this one 4055 as the hole information from the previous call extended past what 4056 the client was requesting. 4058 3. READ_PLUS(s, 256K, 64K) --> NFS_OK, eof = false, <data[256K, 288K], hole[288K, 354K]>. Returns an array of the 32K data and 4060 the hole which extends to 354K. 4062 4. READ_PLUS(s, 354K, 64K) --> NFS_OK, eof = true, <data[354K, 418K]>. Returns the final 64K of data and informs the client 4064 there is no more data in the file. 4066 15.11. Operation 69: SEEK - Find the Next Data or Hole 4068 15.11.1. ARGUMENT 4070 4072 enum data_content4 { 4073 NFS4_CONTENT_DATA = 0, 4074 NFS4_CONTENT_HOLE = 1 4075 }; 4077 struct SEEK4args { 4078 /* CURRENT_FH: file */ 4079 stateid4 sa_stateid; 4080 offset4 sa_offset; 4081 data_content4 sa_what; 4082 }; 4084 4086 15.11.2. RESULT 4088 4090 struct seek_res4 { 4091 bool sr_eof; 4092 offset4 sr_offset; 4093 }; 4094 union SEEK4res switch (nfsstat4 sa_status) { 4095 case NFS4_OK: 4096 seek_res4 resok4; 4097 default: 4098 void; 4099 }; 4101 4103 15.11.3. DESCRIPTION 4105 SEEK is an operation that allows a client to determine the location 4106 of the next data_content4 in a file. It allows an implementation of 4107 the emerging extension to lseek(2) to allow clients to determine the 4108 next hole whilst in data or the next data whilst in a hole. 4110 From the given sa_offset, find the next data_content4 of type sa_what 4111 in the file. If the server can not find a corresponding sa_what, 4112 then the status will still be NFS4_OK, but sr_eof would be TRUE.
If 4113 the server can find the sa_what, then the sr_offset is the start of 4114 that content. If the sa_offset is beyond the end of the file, then 4115 SEEK MUST return NFS4ERR_NXIO. 4117 All files MUST have a virtual hole at the end of the file. I.e., if 4118 a filesystem does not support sparse files, then a compound with 4119 {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;} 4120 where 'X' was the size of the file. 4122 SEEK must follow the same rules for stateids as READ_PLUS 4123 (Section 15.10.3). 4125 15.12. Operation 70: WRITE_SAME - WRITE an ADB Multiple Times to a File 4127 15.12.1. ARGUMENT 4129 4131 enum stable_how4 { 4132 UNSTABLE4 = 0, 4133 DATA_SYNC4 = 1, 4134 FILE_SYNC4 = 2 4135 }; 4136 struct app_data_block4 { 4137 offset4 adb_offset; 4138 length4 adb_block_size; 4139 length4 adb_block_count; 4140 length4 adb_reloff_blocknum; 4141 count4 adb_block_num; 4142 length4 adb_reloff_pattern; 4143 opaque adb_pattern<>; 4144 }; 4146 struct WRITE_SAME4args { 4147 /* CURRENT_FH: file */ 4148 stateid4 wsa_stateid; 4149 stable_how4 wsa_stable; 4150 app_data_block4 wsa_adb; 4151 }; 4153 4155 15.12.2. RESULT 4157 4159 struct write_response4 { 4160 stateid4 wr_callback_id<1>; 4161 length4 wr_count; 4162 stable_how4 wr_committed; 4163 verifier4 wr_writeverf; 4164 }; 4166 union WRITE_SAME4res switch (nfsstat4 wsr_status) { 4167 case NFS4_OK: 4168 write_response4 resok4; 4169 default: 4170 void; 4171 }; 4173 4175 15.12.3. DESCRIPTION 4177 The WRITE_SAME operation writes an application data block to the 4178 regular file identified by the current filehandle (see WRITE SAME 4179 (10) in [T10-SBC2]). The target file is specified by the current 4180 filehandle. The data to be written is specified by an 4181 app_data_block4 structure (Section 8.1.1). The client specifies with 4182 the wsa_stable parameter the method of how the data is to be 4183 processed by the server. 
It is treated like the stable parameter in 4184 the NFSv4.1 WRITE operation (see Section 18.2 of [RFC5661]). 4186 A successful WRITE_SAME will construct a reply for wr_count, 4187 wr_committed, and wr_writeverf as per the NFSv4.1 WRITE operation 4188 results. If wr_callback_id is set, it indicates an asynchronous 4189 reply (see Section 15.12.3.1). 4191 WRITE_SAME has to support all of the errors which are returned by 4192 WRITE plus NFS4ERR_NOTSUPP, i.e., it is an OPTIONAL operation. If 4193 the client supports WRITE_SAME, it MUST support CB_OFFLOAD. 4195 If the server supports ADBs, then it MUST support the WRITE_SAME 4196 operation. The server has no concept of the structure imposed by the 4197 application. It is only when the application writes to a section of 4198 the file does order get imposed. In order to detect corruption even 4199 before the application utilizes the file, the application will want 4200 to initialize a range of ADBs using WRITE_SAME. 4202 When the client invokes the WRITE_SAME operation, it wants to record 4203 the block structure described by the app_data_block4 on to the file. 4205 When the server receives the WRITE_SAME operation, it MUST populate 4206 adb_block_count ADBs in the file starting at adb_offset. The block 4207 size will be given by adb_block_size. The ADBN (if provided) will 4208 start at adb_reloff_blocknum and each block will be monotonically 4209 numbered starting from adb_block_num in the first block. The pattern 4210 (if provided) will be at adb_reloff_pattern of each block and will be 4211 provided in adb_pattern. 4213 The server SHOULD return an asynchronous result if it can determine 4214 the operation will be long running (see Section 15.12.3.1). Once 4215 either the WRITE_SAME finishes synchronously or the server uses 4216 CB_OFFLOAD to inform the client of the asynchronous completion of the 4217 WRITE_SAME, the server MUST return the ADBs to clients as data. 4219 15.12.3.1. 
Asynchronous Transactions 4221 ADB initialization may lead to the server deciding to service the 4222 operation asynchronously. If it decides to do so, it sets the 4223 stateid in wr_callback_id to be that of the wsa_stateid. If it does 4224 not set the wr_callback_id, then the result is synchronous. 4226 When the client determines that the reply will be given 4227 asynchronously, it should not assume anything about the contents of 4228 what it wrote until it is informed by the server that the operation 4229 is complete. It can use OFFLOAD_STATUS (Section 15.9) to monitor the 4230 operation and OFFLOAD_CANCEL (Section 15.8) to cancel the operation. 4231 An example of an asynchronous WRITE_SAME is shown in Figure 6. Note 4232 that as with the COPY operation, WRITE_SAME must provide a stateid 4233 for tracking the asynchronous operation. 4235 Client Server 4236 + + 4237 | | 4238 |--- OPEN ---------------------------->| Client opens 4239 |<------------------------------------/| the file 4240 | | 4241 |--- WRITE_SAME ----------------------->| Client initializes 4242 |<------------------------------------/| an ADB 4243 | | 4244 | | 4245 |--- OFFLOAD_STATUS ------------------>| Client may poll 4246 |<------------------------------------/| for status 4247 | | 4248 | . | Multiple OFFLOAD_STATUS 4249 | . | operations may be sent. 4250 | . | 4251 | | 4252 |<-- CB_OFFLOAD -----------------------| Server reports results 4253 |\------------------------------------>| 4254 | | 4255 |--- CLOSE --------------------------->| Client closes 4256 |<------------------------------------/| the file 4257 | | 4258 | | 4260 Figure 6: An asynchronous WRITE_SAME. 4262 When CB_OFFLOAD informs the client of the successful WRITE_SAME, the 4263 write_response4 embedded in the operation will provide the necessary 4264 information that a synchronous WRITE_SAME would have provided.
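Whether the operation completes synchronously or asynchronously, the resulting file contents follow the block-population rule given in Section 15.12.3. The Python sketch below illustrates that rule under several assumptions that this section does not state: the ADBN is encoded as an 8-byte big-endian integer (the adbn_width parameter is hypothetical), a value of None stands in for "not provided", and bytes not covered by the ADBN or the pattern are zero. The function name instantiate_adbs is likewise hypothetical.

```python
def instantiate_adbs(adb_block_size, adb_block_count,
                     adb_reloff_blocknum, adb_block_num,
                     adb_reloff_pattern, adb_pattern,
                     adbn_width=8):
    """Return the byte image of adb_block_count consecutive ADBs.

    Assumptions (not taken from the specification): the ADBN is an
    adbn_width-byte big-endian integer, None means "field not
    provided", and all other bytes of a block are zero.
    """
    if adb_reloff_blocknum is not None:
        if adb_reloff_blocknum + adbn_width > adb_block_size:
            raise ValueError("ADBN does not fit in the block")
    if adb_pattern:
        if adb_reloff_pattern + len(adb_pattern) > adb_block_size:
            raise ValueError("pattern does not fit in the block")

    image = bytearray()
    for i in range(adb_block_count):
        block = bytearray(adb_block_size)

        if adb_reloff_blocknum is not None:
            # Blocks are numbered monotonically from adb_block_num.
            number = (adb_block_num + i).to_bytes(adbn_width, "big")
            start = adb_reloff_blocknum
            block[start:start + adbn_width] = number

        if adb_pattern:
            start = adb_reloff_pattern
            block[start:start + len(adb_pattern)] = adb_pattern

        image += block

    return bytes(image)
```

The returned image corresponds to what a client would have sent with ordinary WRITE operations starting at adb_offset, which is also the appearance a partially complete WRITE_SAME is required to preserve for the blocks it did write.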
4266 Regardless of whether the operation is asynchronous or synchronous, 4267 it MUST still support the COMMIT operation semantics as outlined in 4268 Section 18.3 of [RFC5661]. I.e., COMMIT works on one or more WRITE 4269 operations and the WRITE_SAME operation can appear as several WRITE 4270 operations to the server. The client can use locking operations to 4271 control the behavior on the server with respect to long running 4272 asynchronous write operations. 4274 15.12.3.2. Error Handling of a Partially Complete WRITE_SAME 4276 WRITE_SAME will clone adb_block_count copies of the given ADB in 4277 consecutive order in the file starting at adb_offset. An error can 4278 occur after writing the Nth ADB to the file. WRITE_SAME MUST appear 4279 to populate the range of the file as if the client used WRITE to 4280 transfer the instantiated ADBs. I.e., the contents of the range will 4281 be easy for the client to determine in case of a partially complete 4282 WRITE_SAME. 4284 15.13. Operation 71: CLONE - Clone a range of file into another file 4286 15.13.1. ARGUMENT 4288 4290 struct CLONE4args { 4291 /* SAVED_FH: source file */ 4292 /* CURRENT_FH: destination file */ 4293 stateid4 cl_src_stateid; 4294 stateid4 cl_dst_stateid; 4295 offset4 cl_src_offset; 4296 offset4 cl_dst_offset; 4297 length4 cl_count; 4298 }; 4300 4302 15.13.2. RESULT 4304 4306 struct CLONE4res { 4307 nfsstat4 cl_status; 4308 }; 4310 4312 15.13.3. DESCRIPTION 4314 The CLONE operation is used to clone file content from a source file 4315 specified by the SAVED_FH value into a destination file specified by 4316 CURRENT_FH without actually copying the data, e.g., by using a copy- 4317 on-write mechanism. 4319 Both SAVED_FH and CURRENT_FH must be regular files. If either 4320 SAVED_FH or CURRENT_FH is not a regular file, the operation MUST fail 4321 and return NFS4ERR_WRONG_TYPE. 
4323 The cl_dst_stateid MUST refer to a stateid that is valid for a WRITE 4324 operation and follows the rules for stateids in Sections 8.2.5 and 4325 18.32.3 of [RFC5661]. The cl_src_stateid MUST refer to a stateid 4326 that is valid for a READ operation and follows the rules for 4327 stateids in Sections 8.2.5 and 18.22.3 of [RFC5661]. If either 4328 stateid is invalid, then the operation MUST fail. 4330 The cl_src_offset is the starting offset within the source file from 4331 which the data to be cloned will be obtained and the cl_dst_offset is 4332 the starting offset of the target region into which the cloned data 4333 will be placed. An offset of 0 (zero) indicates the start of the 4334 respective file. The number of bytes to be cloned is obtained from 4335 cl_count, except that a cl_count of 0 (zero) indicates that the 4336 number of bytes to be cloned is the count of bytes between 4337 cl_src_offset and the EOF of the source file. Both cl_src_offset and 4338 cl_dst_offset must be aligned to the clone block size (see Section 12.2.1). 4339 The number of bytes to be cloned must be a multiple of the clone 4340 block size, except in the case in which cl_src_offset plus the number 4341 of bytes to be cloned is equal to the source file size. 4343 If the source offset or the source offset plus count is greater than 4344 the size of the source file, the operation MUST fail with 4345 NFS4ERR_INVAL. The destination offset or destination offset plus 4346 count may be greater than the size of the destination file. 4348 If SAVED_FH and CURRENT_FH refer to the same file and the source and 4349 target ranges overlap, the operation MUST fail with NFS4ERR_INVAL. 4351 If the target area of the clone operation ends beyond the end of the 4352 destination file, the offset at the end of the target area will 4353 determine the new size of the destination file. The contents of any 4354 block not part of the target area will be the same as if the file 4355 size were extended by a WRITE.
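The argument checks described above can be summarized in a short Python sketch. It is illustrative only: the function name check_clone and its parameters src_size, clone_blksize, and same_file are hypothetical, and the use of NFS4ERR_INVAL for misaligned offsets or a non-multiple length is an assumption, since the text names that error only for the out-of-range and overlap cases.

```python
NFS4_OK = 0
NFS4ERR_INVAL = 22


def check_clone(src_size, cl_src_offset, cl_dst_offset, cl_count,
                clone_blksize, same_file=False):
    """Return (status, nbytes) for a CLONE request.

    nbytes is the number of bytes that would be cloned, or 0 on error.
    """
    # Source offset beyond the end of the source file.
    if cl_src_offset > src_size:
        return NFS4ERR_INVAL, 0

    # A cl_count of zero means "from cl_src_offset to EOF of the source".
    nbytes = cl_count if cl_count != 0 else src_size - cl_src_offset

    # Source offset plus count beyond the end of the source file.
    if cl_src_offset + nbytes > src_size:
        return NFS4ERR_INVAL, 0

    # Both offsets must be aligned to the clone block size.
    # Assumption: the error code for misalignment is not named here.
    if cl_src_offset % clone_blksize or cl_dst_offset % clone_blksize:
        return NFS4ERR_INVAL, 0

    # The length must be a block multiple unless the range ends at EOF.
    ends_at_eof = cl_src_offset + nbytes == src_size
    if nbytes % clone_blksize and not ends_at_eof:
        return NFS4ERR_INVAL, 0  # same assumption as above

    # Overlapping source and target ranges within a single file.
    if same_file and nbytes > 0:
        if (cl_src_offset < cl_dst_offset + nbytes
                and cl_dst_offset < cl_src_offset + nbytes):
            return NFS4ERR_INVAL, 0

    return NFS4_OK, nbytes
```

For example, with a 10000-byte source file and a 4096-byte clone block size, a request with cl_src_offset 4096 and cl_count 0 passes these checks with 5904 bytes to clone, because the unaligned tail ends exactly at the source EOF.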
4357 If the area to be cloned is not a multiple of the clone block size 4358 and the size of the destination file is past the end of the target 4359 area, the area between the end of the target area and the next 4360 multiple of the clone block size will be zeroed. 4362 The CLONE operation is atomic in that other operations may not see 4363 any intermediate states between the state of the two files before the 4364 operation and that after the operation. READs of the destination 4365 file will never see some blocks of the target area cloned without all 4366 of them being cloned. WRITEs of the source area will either have no 4367 effect on the data of the target file or be fully reflected in the 4368 target area of the destination file. 4370 The completion status of the operation is indicated by cl_status. 4372 16. NFSv4.2 Callback Operations 4374 16.1. Operation 15: CB_OFFLOAD - Report results of an asynchronous 4375 operation 4377 16.1.1. ARGUMENT 4379 4381 struct write_response4 { 4382 stateid4 wr_callback_id<1>; 4383 length4 wr_count; 4384 stable_how4 wr_committed; 4385 verifier4 wr_writeverf; 4386 }; 4388 union offload_info4 switch (nfsstat4 coa_status) { 4389 case NFS4_OK: 4390 write_response4 coa_resok4; 4391 default: 4392 length4 coa_bytes_copied; 4393 }; 4395 struct CB_OFFLOAD4args { 4396 nfs_fh4 coa_fh; 4397 stateid4 coa_stateid; 4398 offload_info4 coa_offload_info; 4399 }; 4401 4403 16.1.2. RESULT 4405 4407 struct CB_OFFLOAD4res { 4408 nfsstat4 cor_status; 4409 }; 4411 4413 16.1.3. DESCRIPTION 4415 CB_OFFLOAD is used to report to the client the results of an 4416 asynchronous operation, e.g., Server Side Copy or WRITE_SAME. The 4417 coa_fh and coa_stateid identify the transaction and the coa_status 4418 indicates success or failure. The coa_resok4.wr_callback_id MUST NOT 4419 be set. If the transaction failed, then the coa_bytes_copied 4420 contains the number of bytes copied before the failure occurred.
The coa_bytes_copied value indicates the number of bytes copied but not
which specific bytes have been copied.

If the client supports any of the following operations:

   COPY: for both intra-server and inter-server asynchronous copies

   WRITE_SAME: for ADB initialization

then the client is REQUIRED to support the CB_OFFLOAD operation.

There is a potential race between the reply to the original
transaction on the forechannel and the CB_OFFLOAD callback on the
backchannel. Sections 2.10.6.3 and 20.9.3 of [RFC5661] describe how
to handle this type of issue.

Upon success, coa_resok4.wr_count reports the following for each
operation:

   COPY: the total number of bytes copied

   WRITE_SAME: the same information that a synchronous WRITE_SAME would
   provide

17. Security Considerations

NFSv4.2 has all of the security concerns present in NFSv4.1 (see
Section 21 of [RFC5661]) and those present in the Server Side Copy
(see Section 4.9) and in Labeled NFS (see Section 9.6).

18. IANA Considerations

The IANA Considerations for Labeled NFS are addressed in [RFC7569].

19. References

19.1. Normative References

[I-D.ietf-nfsv4-minorversion2-dot-x]
           Haynes, T., "NFSv4 Minor Version 2 Protocol External Data
           Representation Standard (XDR) Description",
           draft-ietf-nfsv4-minorversion2-dot-x-40 (work in progress),
           January 2016.

[I-D.ietf-nfsv4-rpcsec-gssv3]
           Adamson, A. and N. Williams, "Remote Procedure Call (RPC)
           Security Version 3", draft-ietf-nfsv4-rpcsec-gssv3-17
           (work in progress), January 2016.

[RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
           Resource Identifier (URI): Generic Syntax", STD 66,
           RFC 3986, January 2005.

[RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
           System (NFS) Version 4 Minor Version 1 Protocol",
           RFC 5661, January 2010.

[RFC5662]  Shepler, S., Eisler, M., and D.
           Noveck, "Network File
           System (NFS) Version 4 Minor Version 1 External Data
           Representation Standard (XDR) Description", RFC 5662,
           January 2010.

[RFC7569]  Quigley, D., Lu, J., and T. Haynes, "Registry
           Specification for Mandatory Access Control (MAC) Security
           Label Formats", RFC 7569, July 2015.

[posix_fadvise]
           The Open Group, "Section 'posix_fadvise()' of System
           Interfaces of The Open Group Base Specifications Issue 6,
           IEEE Std 1003.1, 2004 Edition", 2004.

[posix_fallocate]
           The Open Group, "Section 'posix_fallocate()' of System
           Interfaces of The Open Group Base Specifications Issue 6,
           IEEE Std 1003.1, 2004 Edition", 2004.

19.2. Informative References

[Ashdown08]
           Ashdown, L., "Chapter 15, Validating Database Files and
           Backups, of Oracle Database Backup and Recovery User's
           Guide 11g Release 1 (11.1)", August 2008.

[Baira08]  Bairavasundaram, L., Goodson, G., Schroeder, B.,
           Arpaci-Dusseau, A., and R. Arpaci-Dusseau, "An Analysis of
           Data Corruption in the Storage Stack", Proceedings of the
           6th USENIX Symposium on File and Storage Technologies
           (FAST '08), 2008.

[I-D.ietf-nfsv4-versioning]
           Noveck, D., "NFSv4 Version Management",
           draft-ietf-nfsv4-versioning-03 (work in progress),
           January 2016.

[IESG08]   IESG, "IESG Processing of RFC Errata for the IETF Stream",
           2008.

[LB96]     LaPadula, L. and D. Bell, "MITRE technical report 2547,
           volume II", Journal of Computer Security, Volume 4, Issue
           2-3, 249-263 IOS Press, Amsterdam, The Netherlands,
           January 1996.

[McDougall07]
           McDougall, R. and J. Mauro, "Section 11.4.3, Detecting
           Memory Corruption of Solaris Internals", 2007.

[RFC1108]  Kent, S., "Security Options for the Internet Protocol",
           RFC 1108, November 1991.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2401]  Kent, S. and R.
           Atkinson, "Security Architecture for the
           Internet Protocol", RFC 2401, November 1998.

[RFC4506]  Eisler, M., "XDR: External Data Representation Standard",
           RFC 4506, May 2006.

[RFC4949]  Shirey, R., "Internet Security Glossary, Version 2",
           RFC 4949, August 2007.

[RFC5663]  Black, D., Fridella, S., and J. Glasgow, "Parallel NFS
           (pNFS) Block/Volume Layout", RFC 5663, January 2010.

[RFC7204]  Haynes, T., "Requirements for Labeled NFS", RFC 7204,
           April 2014.

[RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
           Protocol (HTTP/1.1): Message Syntax and Routing",
           RFC 7230, DOI 10.17487/RFC7230, June 2014.

[RFC7530]  Haynes, T. and D. Noveck, "Network File System (NFS)
           version 4 Protocol", RFC 7530, March 2015.

[RFC959]   Postel, J. and J. Reynolds, "File Transfer Protocol",
           STD 9, RFC 959, October 1985.

[Strohm11]
           Strohm, R., "Chapter 2, Data Blocks, Extents, and
           Segments, of Oracle Database Concepts 11g Release 1
           (11.1)", January 2011.

[T10-SBC2]
           Elliott, R., Ed., "ANSI INCITS 405-2005, Information
           Technology - SCSI Block Commands - 2 (SBC-2)", November
           2004.

Appendix A. Acknowledgments

Tom Haynes would like to thank NetApp, Inc. for its funding of his
time on this project.

For "Sharing change attribute implementation characteristics with
NFSv4 clients", the original draft was by Trond Myklebust.

For the NFS Server Side Copy, the original draft was by James
Lentini, Mike Eisler, Deepak Kenchammana, Anshul Madan, and Rahul
Iyer. Tom Talpey co-authored an unpublished version of that
document. It was also reviewed by a number of individuals:
Pranoop Erasani, Tom Haynes, Arthur Lent, Trond Myklebust, Dave
Noveck, Theresa Lingutla-Raj, Manjunath Shankararao, Satyam Vaghani,
and Nico Williams. Anna Schumaker's early prototyping experience
helped us avoid some traps.
Also, both Olga Kornievskaia and Andy
Adamson brought implementation experience to the use of copy stateids
in inter-server copy. Jorge Mora was able to optimize the handling
of errors for the result of COPY.

For the NFS space reservation operations, the original draft was by
Mike Eisler, James Lentini, Manjunath Shankararao, and Rahul Iyer.

For the sparse file support, the original draft was by Dean
Hildebrand and Marc Eshel. Valuable input and advice were received
from Sorin Faibish, Bruce Fields, Benny Halevy, Trond Myklebust, and
Richard Scheffenegger.

For the Application IO Hints, the original draft was by Dean
Hildebrand, Mike Eisler, Trond Myklebust, and Sam Falkner. Some
early reviewers included Benny Halevy and Pranoop Erasani.

For Labeled NFS, the original draft was by David Quigley, James
Morris, Jarret Lu, and Tom Haynes. Peter Staubach, Trond Myklebust,
Stephen Smalley, Sorin Faibish, Nico Williams, and David Black also
contributed in the final push to get this accepted.

Christoph Hellwig was very helpful in getting the WRITE_SAME
semantics to model more of what T10 was doing for WRITE SAME (10)
[T10-SBC2]. He also led the push to get space reservations to more
closely model posix_fallocate.

Andy Adamson picked up the RPCSEC_GSSv3 work, which enabled both
Labeled NFS and Server Side Copy to offer more secure options.

Christoph Hellwig provided the update to GETDEVICELIST.

Jorge Mora provided a very detailed review and caught some important
issues with the tables.

During the review process, Talia Reyes-Ortiz helped the sessions run
smoothly. While many people contributed here and there, the core
reviewers were Andy Adamson, Pranoop Erasani, Bruce Fields, Chuck
Lever, Trond Myklebust, David Noveck, Peter Staubach, and Mike
Kupfer.
Elwyn Davies was the General Area Reviewer for this document and his
insights as to the relationship of this document and both [RFC5661]
and [RFC7530] were very much appreciated!

Appendix B. RFC Editor Notes

[RFC Editor: please remove this section prior to publishing this
document as an RFC]

[RFC Editor: prior to publishing this document as an RFC, please
replace all occurrences of I-D.ietf-nfsv4-minorversion2-dot-x with
RFCxxxx where xxxx is the RFC number of the companion XDR document]

Author's Address

   Thomas Haynes
   Primary Data, Inc.
   4300 El Camino Real Ste 100
   Los Altos, CA 94022
   USA

   Phone: +1 408 215 1519
   Email: thomas.haynes@primarydata.com