idnits 2.17.1

draft-ietf-nfsv4-minorversion2-40.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The document has examples using IPv4 documentation addresses according
     to RFC6890, but does not use any IPv6 documentation addresses.  Maybe
     there should be IPv6 examples, too?


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (January 06, 2016) is 3030 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 4005

  == Missing Reference: '32K' is mentioned on line 4005, but not defined

  == Outdated reference: A later version (-41) exists of
     draft-ietf-nfsv4-minorversion2-dot-x-40

  ** Obsolete normative reference: RFC 5661 (Obsoleted by RFC 8881)

  -- Obsolete informational reference (is this intentional?): RFC 2401
     (Obsoleted by RFC 4301)

  -- Obsolete informational reference (is this intentional?): RFC 7230
     (Obsoleted by RFC 9110, RFC 9112)


     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

NFSv4                                                          T. Haynes
Internet-Draft                                              Primary Data
Intended status: Standards Track                        January 06, 2016
Expires: July 9, 2016

                     NFS Version 4 Minor Version 2
                 draft-ietf-nfsv4-minorversion2-40.txt

Abstract

   This Internet-Draft describes NFS version 4 minor version 2 and the
   protocol extensions it makes to NFS version 4 minor version 1.
   Major extensions introduced in NFS version 4 minor version 2
   include: Server Side Copy, Application Input/Output (I/O) Advise,
   Space Reservations, Sparse Files, Application Data Blocks, and
   Labeled NFS.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 9, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Scope of This Document
     1.2.  NFSv4.2 Goals
     1.3.  Overview of NFSv4.2 Features
       1.3.1.  Server Side Clone and Copy
       1.3.2.  Application Input/Output (I/O) Advise
       1.3.3.  Sparse Files
       1.3.4.  Space Reservation
       1.3.5.  Application Data Block (ADB) Support
       1.3.6.  Labeled NFS
       1.3.7.  Layout Enhancements
     1.4.  Enhancements to Minor Versioning Model
   2.  Minor Versioning
   3.  pNFS considerations for New Operations
     3.1.  Atomicity for ALLOCATE and DEALLOCATE
     3.2.  Sharing of stateids with NFSv4.1
     3.3.  NFSv4.2 as a Storage Protocol in pNFS: the File Layout
           Type
       3.3.1.  Operations Sent to NFSv4.2 Data Servers
   4.  Server Side Copy
     4.1.  Introduction
     4.2.  Protocol Overview
       4.2.1.  Copy Operations
       4.2.2.  Requirements for Operations
     4.3.  Requirements for Inter-Server Copy
     4.4.  Implementation Considerations
       4.4.1.  Locking the Files
       4.4.2.  Client Caches
     4.5.  Intra-Server Copy
     4.6.  Inter-Server Copy
     4.7.  Server-to-Server Copy Protocol
       4.7.1.  Considerations on Selecting a Copy Protocol
       4.7.2.  Using NFSv4.x as the Copy Protocol
       4.7.3.  Using an Alternative Copy Protocol
     4.8.  netloc4 - Network Locations
     4.9.  Copy Offload Stateids
     4.10. Security Considerations
       4.10.1.  Inter-Server Copy Security
   5.  Support for Application I/O Hints
   6.  Sparse Files
     6.1.  Terminology
     6.2.  New Operations
       6.2.1.  READ_PLUS
       6.2.2.  DEALLOCATE
   7.  Space Reservation
   8.  Application Data Block Support
     8.1.  Generic Framework
       8.1.1.  Data Block Representation
     8.2.  An Example of Detecting Corruption
     8.3.  Example of READ_PLUS
     8.4.  An Example of Zeroing Space
   9.  Labeled NFS
     9.1.  Definitions
     9.2.  MAC Security Attribute
       9.2.1.  Delegations
       9.2.2.  Permission Checking
       9.2.3.  Object Creation
       9.2.4.  Existing Objects
       9.2.5.  Label Changes
     9.3.  pNFS Considerations
     9.4.  Discovery of Server Labeled NFS Support
     9.5.  MAC Security NFS Modes of Operation
       9.5.1.  Full Mode
       9.5.2.  Guest Mode
     9.6.  Security Considerations for Labeled NFS
   10. Sharing change attribute implementation characteristics with
       NFSv4 clients
   11. Error Values
     11.1.  Error Definitions
       11.1.1.  General Errors
       11.1.2.  Server to Server Copy Errors
       11.1.3.  Labeled NFS Errors
     11.2.  New Operations and Their Valid Errors
     11.3.  New Callback Operations and Their Valid Errors
   12. New File Attributes
     12.1.  New RECOMMENDED Attributes - List and Definition
            References
     12.2.  Attribute Definitions
   13. Operations: REQUIRED, RECOMMENDED, or OPTIONAL
   14. Modifications to NFSv4.1 Operations
     14.1.  Operation 42: EXCHANGE_ID - Instantiate Client ID
     14.2.  Operation 48: GETDEVICELIST - Get All Device Mappings
            for a File System
   15. NFSv4.2 Operations
     15.1.  Operation 59: ALLOCATE - Reserve Space in A Region of a
            File
     15.2.  Operation 60: COPY - Initiate a server-side copy
     15.3.  Operation 61: COPY_NOTIFY - Notify a source server of a
            future copy
     15.4.  Operation 62: DEALLOCATE - Unreserve Space in a Region
            of a File
     15.5.  Operation 63: IO_ADVISE - Application I/O access pattern
            hints
     15.6.  Operation 64: LAYOUTERROR - Provide Errors for the
            Layout
     15.7.  Operation 65: LAYOUTSTATS - Provide Statistics for the
            Layout
     15.8.  Operation 66: OFFLOAD_CANCEL - Stop an Offloaded
            Operation
     15.9.  Operation 67: OFFLOAD_STATUS - Poll for Status of
            Asynchronous Operation
     15.10. Operation 68: READ_PLUS - READ Data or Holes from a File
     15.11. Operation 69: SEEK - Find the Next Data or Hole
     15.12. Operation 70: WRITE_SAME - WRITE an ADB Multiple Times
            to a File
     15.13. Operation 71: CLONE - Clone a range of file into another
            file
   16. NFSv4.2 Callback Operations
     16.1.  Operation 15: CB_OFFLOAD - Report results of an
            asynchronous operation
   17. Security Considerations
   18. IANA Considerations
   19. References
     19.1.  Normative References
     19.2.  Informative References
   Appendix A.  Acknowledgments
   Appendix B.  RFC Editor Notes
   Author's Address

1.  Introduction

   The NFS version 4 minor version 2 (NFSv4.2) protocol is the third
   minor version of the NFS version 4 (NFSv4) protocol.  The first minor
   version, NFSv4.0, is described in [RFC7530] and the second minor
   version, NFSv4.1, is described in [RFC5661].

   As a minor version, NFSv4.2 is consistent with the overall goals for
   NFSv4, but extends the protocol so as to better meet those goals,
   based on experiences with NFSv4.1.  In addition, NFSv4.2 has adopted
   some additional goals, which motivate some of the major extensions in
   NFSv4.2.

1.1.  Scope of This Document

   This document describes the NFSv4.2 protocol.  With respect to
   NFSv4.0 and NFSv4.1, this document does not:

   o  describe the NFSv4.0 or NFSv4.1 protocols, except where needed to
      contrast with NFSv4.2

   o  modify the specification of the NFSv4.0 or NFSv4.1 protocols

   o  clarify the NFSv4.0 or NFSv4.1 protocols; that is, any
      clarifications made here apply only to NFSv4.2 and to neither of
      the prior protocols

   NFSv4.2 is a superset of NFSv4.1, with all of the new features being
   optional.  As such, NFSv4.2 maintains the same compatibility that
   NFSv4.1 had with NFSv4.0.  Any interaction of a new feature with
   NFSv4.1 semantics is described in the relevant text.

   The full External Data Representation (XDR) [RFC4506] for NFSv4.2 is
   presented in [I-D.ietf-nfsv4-minorversion2-dot-x].

1.2.  NFSv4.2 Goals

   A major goal of the design of NFSv4.2 is to take common local file
   system features and offer them remotely.  These features might

   o  already be available on the servers, e.g., sparse files

   o  be under development as a new standard, e.g., SEEK pulls in both
      SEEK_HOLE and SEEK_DATA

   o  be used by clients with the servers via some proprietary means,
      e.g., Labeled NFS

   NFSv4.2 provides means for clients to leverage these features on the
   server in cases in which that had previously not been possible within
   the confines of the NFS protocol.

1.3.  Overview of NFSv4.2 Features

1.3.1.  Server Side Clone and Copy

   A traditional file copy of a remotely accessed file, whether from one
   server to another or between locations in the same server, results in
   the data being put on the network twice - source to client and then
   client to destination.  New operations are introduced to allow
   unnecessary traffic to be eliminated:

   o  The intra-server clone feature allows the client to request a
      synchronous cloning, perhaps by copy-on-write semantics.

   o  The intra-server copy feature allows the client to request the
      server to perform the copy internally, avoiding unnecessary
      network traffic.

   o  The inter-server copy feature allows the client to authorize the
      source and destination servers to interact directly.

   As such copies can be lengthy, asynchronous support is also provided.

1.3.2.  Application Input/Output (I/O) Advise

   Applications and clients want to advise the server as to expected I/O
   behavior.  Using IO_ADVISE (see Section 15.5) to communicate future
   I/O behavior such as whether a file will be accessed sequentially or
   randomly, and whether a file will or will not be accessed in the near
   future, allows servers to optimize future I/O requests for a file by,
   for example, prefetching or evicting data.
   This operation can be used to support the posix_fadvise
   [posix_fadvise] function.  In addition, it may be helpful to
   applications such as databases and video editors.

1.3.3.  Sparse Files

   Sparse files are files that have unallocated or uninitialized data
   blocks as holes in the file.  Such holes are typically transferred as
   0s when read from the file.  READ_PLUS (see Section 15.10) allows a
   server to send back to the client metadata describing the hole and
   DEALLOCATE (see Section 15.4) allows the client to punch holes into a
   file.  In addition, SEEK (see Section 15.11) is provided to scan for
   the next hole or data from a given location.

1.3.4.  Space Reservation

   When a file is sparse, one concern applications have is ensuring that
   there will always be enough data blocks available for the file during
   future writes.  ALLOCATE (see Section 15.1) allows a client to
   request a guarantee that space will be available.  Also, DEALLOCATE
   (see Section 15.4) allows the client to punch a hole into a file,
   thus releasing a space reservation.

1.3.5.  Application Data Block (ADB) Support

   Some applications treat a file as if it were a disk and as such want
   to initialize (or format) the file image.  We introduce WRITE_SAME
   (see Section 15.12) to send this metadata to the server to allow it
   to write the block contents.

1.3.6.  Labeled NFS

   While both clients and servers can employ Mandatory Access Control
   (MAC) security models to enforce data access, there has been no
   protocol support for interoperability.  A new file object attribute,
   sec_label (see Section 12.2.4), allows the server to store MAC
   labels on files, which the client retrieves and uses to enforce data
   access (see Section 9.5.2).  The format of the sec_label accommodates
   any MAC security system.

1.3.7.  Layout Enhancements

   In the parallel NFS implementations of NFSv4.1 (see Section 12 of
   [RFC5661]), the client cannot communicate back to the metadata server
   any errors or performance characteristics with the storage devices.
   NFSv4.2 provides two new operations to do so, respectively:
   LAYOUTERROR (see Section 15.6) and LAYOUTSTATS (see Section 15.7).

1.4.  Enhancements to Minor Versioning Model

   In NFSv4.1, the only way to introduce new variants of an operation
   was to introduce a new operation.  For instance, READ would have to
   be replaced or supplemented by, say, either READ2 or READ_PLUS.  With
   the use of discriminated unions as parameters to such functions in
   NFSv4.2, it is possible to add a new arm in a subsequent minor
   version.  It is also possible to move such an operation from
   OPTIONAL/RECOMMENDED to REQUIRED.  Forcing an implementation to adopt
   each arm of a discriminated union at such a time does not meet the
   spirit of the minor versioning rules.  As such, new arms of a
   discriminated union MUST follow the same guidelines for minor
   versioning as operations in NFSv4.1 - i.e., they may not be made
   REQUIRED.  To support this, a new error code, NFS4ERR_UNION_NOTSUPP,
   allows the server to communicate to the client that the operation is
   supported, but the specific arm of the discriminated union is not.

2.  Minor Versioning

   NFSv4.2 is a minor version of NFSv4 and is built upon NFSv4.1 as
   documented in [RFC5661] and [RFC5662].

   NFSv4.2 does not modify the rules applicable to the NFSv4 versioning
   process and follows the rules set out in [RFC5661] or in
   standards-track documents updating that document (e.g., in an RFC
   based on [NFSv4-Versioning]).

   NFSv4.2 only defines extensions to NFSv4.1, each of which may be
   supported (or not) independently.
   It does not:

   o  introduce infrastructural features

   o  make existing features MANDATORY to NOT implement

   o  change the status of existing features (i.e., by changing their
      status among OPTIONAL, RECOMMENDED, REQUIRED).

   The following versioning-related considerations should be noted.

   o  When a new case is added to an existing switch, servers need to
      report non-support of that new case by returning
      NFS4ERR_UNION_NOTSUPP.

   o  As regards the potential cross-minor-version transfer of stateids,
      Parallel NFS (pNFS) (see Section 12 of [RFC5661]) implementations
      of the file mapping type may support the use of an NFSv4.2
      metadata server (see Sections 1.7.2.2 and 12.2.2 of [RFC5661])
      with NFSv4.1 data servers.  In this context, a stateid returned by
      an NFSv4.2 COMPOUND will be used in an NFSv4.1 COMPOUND directed
      to the data server (see Sections 3.2 and 3.3).

3.  pNFS considerations for New Operations

   The interactions of the new operations with non-pNFS functionality
   are straightforward and covered in the relevant sections.  However,
   the interactions of the new operations with pNFS are more
   complicated, and this section provides an overview.

3.1.  Atomicity for ALLOCATE and DEALLOCATE

   Both ALLOCATE (see Section 15.1) and DEALLOCATE (see Section 15.4)
   are sent to the metadata server, which is responsible for
   coordinating the changes onto the storage devices.  In particular,
   both operations must either fully succeed or fail; it cannot be the
   case that one storage device succeeds whilst another fails.

3.2.  Sharing of stateids with NFSv4.1

   An NFSv4.2 metadata server can hand out a layout to an NFSv4.1
   storage device.  Section 13.9.1 of [RFC5661] discusses how the client
   gets a stateid from the metadata server to present to a storage
   device.

3.3.  NFSv4.2 as a Storage Protocol in pNFS: the File Layout Type

   A file layout provided by an NFSv4.2 server may refer either to a
   storage device that only implements NFSv4.1 as specified in
   [RFC5661], or to a storage device that implements additions from
   NFSv4.2, in which case the rules in Section 3.3.1 apply.  As the File
   Layout Type does not provide a means for informing the client as to
   which minor version a particular storage device is providing, the
   client will have to negotiate this with the storage device via the
   normal Remote Procedure Call (RPC) semantics of major and minor
   version discovery.  For example, as per Section 16.2.3 of [RFC5661],
   the client could try a COMPOUND with a minorversion of 2 and, if it
   gets NFS4ERR_MINOR_VERS_MISMATCH, drop back to 1.

3.3.1.  Operations Sent to NFSv4.2 Data Servers

   In addition to the commands listed in [RFC5661], NFSv4.2 data servers
   MAY accept a COMPOUND containing the following additional operations:
   IO_ADVISE (see Section 15.5), READ_PLUS (see Section 15.10),
   WRITE_SAME (see Section 15.12), and SEEK (see Section 15.11), which
   will be treated like the subset specified as "Operations Sent to
   NFSv4.1 Data Servers" in Section 13.6 of [RFC5661].

   Additional details on the implementation of these operations in a
   pNFS context are documented in the operation specific sections.

4.  Server Side Copy

4.1.  Introduction

   The server-side copy features provide mechanisms which allow an NFS
   client to copy file data on a server or between two servers without
   the data being transmitted back and forth over the network through
   the NFS client.  Without these features, an NFS client would copy
   data from one location to another by reading the data from the source
   server over the network, and then writing the data back over the
   network to the destination server.
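
   As an illustration only (it is not part of the protocol), the
   traditional copy path described above can be modeled in a few lines
   of Python.  The function name traditional_copy and the chunk size
   are assumptions of this sketch; it relays every chunk through the
   client and counts the bytes of file data that cross the network.

```python
from typing import Tuple


def traditional_copy(src: bytes, chunk: int = 4096) -> Tuple[bytes, int]:
    """Model a client-relayed copy; return (copied data, bytes on the wire).

    Hypothetical helper: `src` stands in for the source file's contents.
    """
    dest = bytearray()
    on_wire = 0
    for off in range(0, len(src), chunk):
        data = src[off:off + chunk]
        on_wire += len(data)   # READ reply: source server -> client
        dest += data
        on_wire += len(data)   # WRITE request: client -> destination server
    return bytes(dest), on_wire
```

   In this model the wire count is always twice the file size, which is
   the traffic the server-side copy features aim to keep off the
   client's network path.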

   If the source object and destination object are on different file
   servers, the file servers will communicate with one another to
   perform the copy operation.  The server-to-server protocol by which
   this is accomplished is not defined in this document.

4.2.  Protocol Overview

   The server-side copy offload operations support both intra-server and
   inter-server file copies.  An intra-server copy is a copy in which
   the source file and destination file reside on the same server.  In
   an inter-server copy, the source file and destination file are on
   different servers.  In both cases, the copy may be performed
   synchronously or asynchronously.

   In addition, the CLONE operation provides copy-like functionality in
   the intra-server case which is both synchronous and atomic, in that
   other operations may not see the target file in any state between
   that before the clone operation and after it.

   Throughout the rest of this document, we refer to the NFS server
   containing the source file as the "source server" and the NFS server
   to which the file is transferred as the "destination server".  In the
   case of an intra-server copy, the source server and destination
   server are the same server.  Therefore, in the context of an intra-
   server copy, the terms source server and destination server refer to
   the single server performing the copy.

   The new operations are designed to copy files or regions within them.
   Other file system objects can be copied by building on these
   operations or using other techniques.  For example, if a user wishes
   to copy a directory, the client can synthesize a directory copy
   operation by first creating the destination directory and the
   individual (empty) files within it, and then copying the contents of
   the source directory's files to files in the new destination
   directory.
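
   The directory-copy synthesis described above can be sketched as
   follows.  This is a minimal illustration under stated assumptions:
   FakeServer is a hypothetical in-memory stand-in for a server, and
   its mkdir, create, and copy methods are invented for the sketch;
   they are not NFSv4.2 operations or any real client API.

```python
class FakeServer:
    """Hypothetical in-memory stand-in for a server (illustration only)."""

    def __init__(self):
        self.dirs = {}  # directory path -> {file name: file contents}

    def mkdir(self, path):
        self.dirs[path] = {}

    def create(self, path, name):
        self.dirs[path][name] = b""

    def copy(self, src_dir, dst_dir, name):
        # Stands in for a per-file server-side copy: the data moves
        # inside the server and never passes through the client.
        self.dirs[dst_dir][name] = self.dirs[src_dir][name]


def copy_directory(server, src_dir, dst_dir):
    """Synthesize a directory copy from per-file copies."""
    names = sorted(server.dirs[src_dir])
    server.mkdir(dst_dir)                    # 1. create the destination directory
    for name in names:
        server.create(dst_dir, name)         # 2. create each (empty) file
    for name in names:
        server.copy(src_dir, dst_dir, name)  # 3. copy contents file by file
    return names
```

   The sketch handles a single, flat directory; a client that needs to
   copy a tree would repeat the same three steps per subdirectory.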

   For the inter-server copy, the operations are defined to be
   compatible with the traditional copy authorization approach.  The
   client and user are authorized at the source for reading.  Then they
   are authorized at the destination for writing.

4.2.1.  Copy Operations

   CLONE:  Used by the client to request a synchronous atomic copy-like
      operation.  (Section 15.13)

   COPY_NOTIFY:  Used by the client to request the source server to
      authorize a future file copy that will be made by a given
      destination server on behalf of the given user.  (Section 15.3)

   COPY:  Used by the client to request a file copy.  (Section 15.2)

   OFFLOAD_CANCEL:  Used by the client to terminate an asynchronous file
      copy.  (Section 15.8)

   OFFLOAD_STATUS:  Used by the client to poll the status of an
      asynchronous file copy.  (Section 15.9)

   CB_OFFLOAD:  Used by the destination server to report the results of
      an asynchronous file copy to the client.  (Section 16.1)

4.2.2.  Requirements for Operations

   Three OPTIONAL features are provided relative to server-side copy.  A
   server may choose independently to implement any of them.  A server
   implementing any of these features may be REQUIRED to implement
   certain operations.  Other operations are OPTIONAL in the context of
   a particular feature (see Section 13), but may become REQUIRED
   depending on server behavior.  Clients need to use these operations
   to successfully copy a file.

   For a client to do an intra-server file copy, it needs to use either
   the COPY or the CLONE operation.  If COPY is used, the client MUST
   support the CB_OFFLOAD operation.  If COPY is used and it returns a
   stateid, then the client MAY use the OFFLOAD_CANCEL and
   OFFLOAD_STATUS operations.

   For a client to do an inter-server file copy, it needs to use
   the COPY and COPY_NOTIFY operations and MUST support the CB_OFFLOAD
   operation.
   If COPY returns a stateid, then the client MAY use the
   OFFLOAD_CANCEL and OFFLOAD_STATUS operations.

   If a server supports the intra-server copy feature, then the server
   MUST support the COPY operation.  If a server's COPY operation
   returns a stateid, then the server MUST also support these
   operations: CB_OFFLOAD, OFFLOAD_CANCEL, and OFFLOAD_STATUS.

   If a server supports the clone feature, then it MUST support the
   CLONE operation and the clone_blksize attribute on any filesystem on
   which CLONE is supported (as either source or destination file).

   If a source server supports the inter-server copy feature, then it
   MUST support the operations COPY_NOTIFY and OFFLOAD_CANCEL.  If a
   destination server supports the inter-server copy feature, then it
   MUST support the COPY operation.  If a destination server's COPY
   operation returns a stateid, then the destination server MUST also
   support these operations: CB_OFFLOAD, OFFLOAD_CANCEL, COPY_NOTIFY,
   and OFFLOAD_STATUS.

   Each operation is performed in the context of the user identified by
   the Open Network Computing (ONC) RPC credential of its containing
   COMPOUND or CB_COMPOUND request.  For example, an OFFLOAD_CANCEL
   operation issued by a given user indicates that a specified COPY
   operation initiated by the same user is to be canceled.  Therefore,
   an OFFLOAD_CANCEL MUST NOT interfere with a copy of the same file
   initiated by another user.

   An NFS server MAY allow an administrative user to monitor or cancel
   copy operations using an implementation-specific interface.

4.3.  Requirements for Inter-Server Copy

   The specification of inter-server copy is driven by several
   requirements:

   o  The specification MUST NOT mandate the server-to-server protocol.

   o  The specification MUST provide guidance for using NFSv4.x as a
      copy protocol.
      For those source and destination servers willing
      to use NFSv4.x, there are specific security considerations that
      this specification MUST address.

   o  The specification MUST NOT mandate preconfiguration between the
      source and destination server.  Requiring that the source and
      destination first have a "copying relationship" increases the
      administrative burden.  However, the specification MUST NOT
      preclude implementations that require preconfiguration.

   o  The specification MUST NOT mandate a trust relationship between
      the source and destination server.  The NFSv4 security model
      requires mutual authentication between a principal on an NFS
      client and a principal on an NFS server.  This model MUST continue
      with the introduction of COPY.

4.4.  Implementation Considerations

4.4.1.  Locking the Files

   Both the source and destination file may need to be locked to protect
   the content during the copy operations.  A client can achieve this by
   a combination of OPEN and LOCK operations.  That is, either share or
   byte-range locks might be desired.

   Note that when the client establishes a lock stateid on the source,
   the context of that stateid is for the client and not the
   destination.  As such, there might already be an outstanding stateid,
   issued to the destination as client of the source, with the same
   value as that provided for the lock stateid.  The source MUST
   interpret the lock stateid as that of the client, i.e., when the
   destination presents it in the context of an inter-server copy, it is
   on behalf of the client.

4.4.2.  Client Caches

   In a traditional copy, if the client is in the process of writing to
   the file before the copy (and perhaps with a write delegation), it
   will be straightforward to update the destination server.  With an
   inter-server copy, the source has no insight into the changes cached
   on the client.  The client SHOULD write back the data to the source.
   If it does not do so, it is possible that the destination will
   receive a corrupt copy of the file.

4.5.  Intra-Server Copy

   To copy a file on a single server, the client uses a COPY operation.
   The server may respond to the copy operation with the final results
   of the copy or it may perform the copy asynchronously and deliver the
   results using a CB_OFFLOAD operation callback.  If the copy is
   performed asynchronously, the client may poll the status of the copy
   using OFFLOAD_STATUS or cancel the copy using OFFLOAD_CANCEL.

   A synchronous intra-server copy is shown in Figure 1.  In this
   example, the NFS server chooses to perform the copy synchronously.
   The copy operation is completed, either successfully or
   unsuccessfully, before the server replies to the client's request.
   The server's reply contains the final result of the operation.

     Client                                  Server
        +                                      +
        |                                      |
        |--- OPEN ---------------------------->| Client opens
        |<------------------------------------/| the source file
        |                                      |
        |--- OPEN ---------------------------->| Client opens
        |<------------------------------------/| the destination file
        |                                      |
        |--- COPY ---------------------------->| Client requests
        |<------------------------------------/| a file copy
        |                                      |
        |--- CLOSE --------------------------->| Client closes
        |<------------------------------------/| the destination file
        |                                      |
        |--- CLOSE --------------------------->| Client closes
        |<------------------------------------/| the source file
        |                                      |
        |                                      |

                Figure 1: A synchronous intra-server copy.

   An asynchronous intra-server copy is shown in Figure 2.  In this
   example, the NFS server performs the copy asynchronously.  The
   server's reply to the copy request indicates that the copy operation
   was initiated and the final result will be delivered at a later time.
   The server's reply also contains a copy stateid.
   The client may use this copy stateid to poll for status information
   (as shown) or to cancel the copy using an OFFLOAD_CANCEL. When the
   server completes the copy, the server performs a callback to the
   client and reports the results.

     Client                                  Server
        +                                      +
        |                                      |
        |--- OPEN ---------------------------->| Client opens
        |<------------------------------------/| the source file
        |                                      |
        |--- OPEN ---------------------------->| Client opens
        |<------------------------------------/| the destination file
        |                                      |
        |--- COPY ---------------------------->| Client requests
        |<------------------------------------/| a file copy
        |                                      |
        |                                      |
        |--- OFFLOAD_STATUS ------------------>| Client may poll
        |<------------------------------------/| for status
        |                                      |
        |                  .                   | Multiple OFFLOAD_STATUS
        |                  .                   | operations may be sent.
        |                  .                   |
        |                                      |
        |<-- CB_OFFLOAD -----------------------| Server reports results
        |\------------------------------------>|
        |                                      |
        |--- CLOSE --------------------------->| Client closes
        |<------------------------------------/| the destination file
        |                                      |
        |--- CLOSE --------------------------->| Client closes
        |<------------------------------------/| the source file
        |                                      |
        |                                      |

              Figure 2: An asynchronous intra-server copy.

4.6.  Inter-Server Copy

   A copy may also be performed between two servers. The copy protocol
   is designed to accommodate a variety of network topologies. As shown
   in Figure 3, the client and servers may be connected by multiple
   networks. In particular, the servers may be connected by a
   specialized, high-speed network (network 192.0.2.0/24 in the diagram)
   that does not include the client. The protocol allows the client to
   set up the copy between the servers (over network 203.0.113.0/24 in
   the diagram) and for the servers to communicate on the high-speed
   network if they choose to do so.
                        192.0.2.0/24
           +-------------------------------------+
           |                                     |
           |                                     |
           | 192.0.2.18                          | 192.0.2.56
   +-------+------+                       +------+------+
   |    Source    |                       | Destination |
   +-------+------+                       +------+------+
           | 203.0.113.18                        | 203.0.113.56
           |                                     |
           |                                     |
           |            203.0.113.0/24           |
           +------------------+------------------+
                              |
                              |
                              | 203.0.113.243
                        +-----+-----+
                        |  Client   |
                        +-----------+

          Figure 3: An example inter-server network topology.

   For an inter-server copy, the client notifies the source server that
   a file will be copied by the destination server using a COPY_NOTIFY
   operation. The client then initiates the copy by sending the COPY
   operation to the destination server. The destination server may
   perform the copy synchronously or asynchronously.

   A synchronous inter-server copy is shown in Figure 4. In this case,
   the destination server chooses to perform the copy before responding
   to the client's COPY request.

   An asynchronous copy is shown in Figure 5. In this case, the
   destination server chooses to respond to the client's COPY request
   immediately and then perform the copy asynchronously.

     Client                Source         Destination
        +                    +                 +
        |                    |                 |
        |--- OPEN        --->|                 | Returns
        |<------------------/|                 | open state os1
        |                    |                 |
        |--- COPY_NOTIFY --->|                 |
        |<------------------/|                 |
        |                    |                 |
        |--- OPEN ---------------------------->| Returns
        |<------------------------------------/| open state os2
        |                    |                 |
        |--- COPY ---------------------------->|
        |                    |                 |
        |                    |                 |
        |                    |<----- read -----|
        |                    |\--------------->|
        |                    |                 |
        |                    |        .        | Multiple reads may
        |                    |        .        | be necessary
        |                    |        .        |
        |                    |                 |
        |                    |                 |
        |<------------------------------------/| Destination replies
        |                    |                 | to COPY
        |                    |                 |
        |--- CLOSE --------------------------->| Release os2
        |<------------------------------------/|
        |                    |                 |
        |--- CLOSE       --->|                 | Release os1
        |<------------------/|                 |

               Figure 4: A synchronous inter-server copy.

     Client                Source         Destination
        +                    +                 +
        |                    |                 |
        |--- OPEN        --->|                 | Returns
        |<------------------/|                 | open state os1
        |                    |                 |
        |--- LOCK        --->|                 | Optional, could be done
        |<------------------/|                 | with a share lock
        |                    |                 |
        |--- COPY_NOTIFY --->|                 | Need to pass in
        |<------------------/|                 | os1 or lock state
        |                    |                 |
        |                    |                 |
        |                    |                 |
        |--- OPEN ---------------------------->| Returns
        |<------------------------------------/| open state os2
        |                    |                 |
        |--- LOCK ---------------------------->| Optional ...
        |<------------------------------------/|
        |                    |                 |
        |--- COPY ---------------------------->| Need to pass in
        |<------------------------------------/| os2 or lock state
        |                    |                 |
        |                    |                 |
        |                    |<----- read -----|
        |                    |\--------------->|
        |                    |                 |
        |                    |        .        | Multiple reads may
        |                    |        .        | be necessary
        |                    |        .        |
        |                    |                 |
        |                    |                 |
        |--- OFFLOAD_STATUS ------------------>| Client may poll
        |<------------------------------------/| for status
        |                    |                 |
        |                    |        .        | Multiple OFFLOAD_STATUS
        |                    |        .        | operations may be sent
        |                    |        .        |
        |                    |                 |
        |                    |                 |
        |                    |                 |
        |<-- CB_OFFLOAD -----------------------| Destination reports
        |\------------------------------------>| results
        |                    |                 |
        |--- LOCKU --------------------------->| Only if LOCK was done
        |<------------------------------------/|
        |                    |                 |
        |--- CLOSE --------------------------->| Release os2
        |<------------------------------------/|
        |                    |                 |
        |--- LOCKU       --->|                 | Only if LOCK was done
        |<------------------/|                 |
        |                    |                 |
        |--- CLOSE       --->|                 | Release os1
        |<------------------/|                 |
        |                    |                 |

              Figure 5: An asynchronous inter-server copy.

4.7.  Server-to-Server Copy Protocol

   The choice of what protocol to use in an inter-server copy is
   ultimately the destination server's decision. However, the
   destination server has to be cognizant that it is working on behalf
   of the client.

4.7.1.  Considerations on Selecting a Copy Protocol

   The client can have requirements over both the size of transactions
   and error recovery semantics. It may want to split the copy up such
   that each chunk is synchronously transferred. It may want the copy
   protocol to copy the bytes in consecutive order such that upon an
   error, the client can restart the copy at the last known good offset.
   If the destination server cannot meet these requirements, the client
   may prefer the traditional copy mechanism such that it can meet those
   requirements.

4.7.2.  Using NFSv4.x as the Copy Protocol

   The destination server MAY use standard NFSv4.x (where x >= 1)
   operations to read the data from the source server. If NFSv4.x is
   used for the server-to-server copy protocol, the destination server
   can use the source filehandle and ca_src_stateid provided in the COPY
   request with standard NFSv4.x operations to read data from the source
   server. Note that the ca_src_stateid MUST be the cnr_stateid
   returned from the source via the COPY_NOTIFY (Section 15.3).

4.7.3.  Using an Alternative Copy Protocol

   In a homogeneous environment, the source and destination servers
   might be able to perform the file copy extremely efficiently using
   specialized protocols. For example, the source and destination
   servers might be two nodes sharing a common file system format for
   the source and destination file systems. Thus the source and
   destination are in an ideal position to efficiently render the image
   of the source file to the destination file by replicating the file
   system formats at the block level. Another possibility is that the
   source and destination might be two nodes sharing a common storage
   area network, and thus there is no need to copy any data at all, and
   instead ownership of the file and its contents might simply be
   reassigned to the destination. To allow for these possibilities, the
   destination server is allowed to use a server-to-server copy protocol
   of its choice.

   In a heterogeneous environment, using a protocol other than NFSv4.x
   (e.g., HTTP [RFC7230] or FTP [RFC959]) presents some challenges. In
   particular, the destination server is presented with the challenge of
   accessing the source file given only an NFSv4.x filehandle.

   One option for protocols that identify source files with path names
   is to use an ASCII hexadecimal representation of the source
   filehandle as the file name.

   Another option for the source server is to use URLs to direct the
   destination server to a specialized service. For example, the
   response to COPY_NOTIFY could include the URL
   ftp://s1.example.com:9999/_FH/0x12345, where 0x12345 is the ASCII
   hexadecimal representation of the source filehandle. When the
   destination server receives the source server's URL, it would use
   "_FH/0x12345" as the file name to pass to the FTP server listening on
   port 9999 of s1.example.com.
   On port 9999 there would be a special instance of the FTP service
   that understands how to convert NFS filehandles to an open file
   descriptor (in many operating systems, this would require a new
   system call, one which is the inverse of the makefh() function that
   the pre-NFSv4 MOUNT service needs).

   Authenticating and identifying the destination server to the source
   server is also a challenge. Recommendations for how to accomplish
   this are given in Section 4.10.1.3.

4.8.  netloc4 - Network Locations

   The server-side copy operations specify network locations using the
   netloc4 data type shown below:

   enum netloc_type4 {
           NL4_NAME        = 1,
           NL4_URL         = 2,
           NL4_NETADDR     = 3
   };
   union netloc4 switch (netloc_type4 nl_type) {
           case NL4_NAME:          utf8str_cis nl_name;
           case NL4_URL:           utf8str_cis nl_url;
           case NL4_NETADDR:       netaddr4    nl_addr;
   };

   If the netloc4 is of type NL4_NAME, the nl_name field MUST be
   specified as a UTF-8 string. The nl_name is expected to be resolved
   to a network address via DNS, Lightweight Directory Access Protocol
   (LDAP), Network Information Service (NIS), /etc/hosts, or some other
   means. If the netloc4 is of type NL4_URL, a server URL [RFC3986]
   appropriate for the server-to-server copy operation is specified as a
   UTF-8 string. If the netloc4 is of type NL4_NETADDR, the nl_addr
   field MUST contain a valid netaddr4 as defined in Section 3.3.9 of
   [RFC5661].

   When netloc4 values are used for an inter-server copy as shown in
   Figure 3, their values may be evaluated on the source server,
   destination server, and client. The network environment in which
   these systems operate should be configured so that the netloc4 values
   are interpreted as intended on each system.

4.9.  Copy Offload Stateids

   A server may perform a copy offload operation asynchronously. An
   asynchronous copy is tracked using a copy offload stateid.
   Copy offload stateids are included in the COPY, OFFLOAD_CANCEL,
   OFFLOAD_STATUS, and CB_OFFLOAD operations.

   A copy offload stateid will be valid until either (A) the client or
   server restarts or (B) the client returns the resource by issuing an
   OFFLOAD_CANCEL operation or the client replies to a CB_OFFLOAD
   operation.

   A copy offload stateid's seqid MUST NOT be zero. In the context of a
   copy offload operation, it is ambiguous to indicate the most recent
   copy offload operation using a stateid with seqid of zero. Therefore
   a copy offload stateid with seqid of zero MUST be considered invalid.

4.10.  Security Considerations

   The security considerations pertaining to NFSv4.1 [RFC5661] apply to
   this section. As such, the standard security mechanisms used by the
   protocol can be used to secure the server-to-server operations.

   NFSv4 clients and servers supporting the inter-server copy operations
   described in this chapter are REQUIRED to implement the mechanism
   described in Section 4.10.1.1, and to support rejecting COPY_NOTIFY
   requests that do not use RPCSEC_GSS with privacy. If the server-to-
   server copy protocol is ONC RPC based, the servers are also REQUIRED
   to implement [rpcsec_gssv3] including the RPCSEC_GSSv3 copy_to_auth,
   copy_from_auth, and copy_confirm_auth structured privileges. This
   requirement to implement is not a requirement to use; for example, a
   server may, depending on configuration, also allow COPY_NOTIFY
   requests that use only AUTH_SYS.

   If a server requires the use of RPCSEC_GSSv3 copy_to_auth,
   copy_from_auth, or copy_confirm_auth and it is not used, the server
   will reject the request with NFS4ERR_PARTNER_NO_AUTH.

4.10.1.  Inter-Server Copy Security

4.10.1.1.  Inter-Server Copy via ONC RPC with RPCSEC_GSSv3

   When the client sends a COPY_NOTIFY to the source server to expect
   the destination to attempt to copy data from the source server, it is
   expected that this copy is being done on behalf of the principal
   (called the "user principal") that sent the RPC request that encloses
   the COMPOUND procedure that contains the COPY_NOTIFY operation. The
   user principal is identified by the RPC credentials. A mechanism
   that allows the user principal to authorize the destination server to
   perform the copy, that lets the source server properly authenticate
   the destination's copy, and does not allow the destination server to
   exceed this authorization, is necessary.

   An approach that sends delegated credentials of the client's user
   principal to the destination server is not used for the following
   reason. If the client's user delegated its credentials, the
   destination would authenticate as the user principal. If the
   destination were using the NFSv4 protocol to perform the copy, then
   the source server would authenticate the destination server as the
   user principal, and the file copy would securely proceed. However,
   this approach would allow the destination server to copy other files.
   The user principal would have to trust the destination server to not
   do so. This is counter to the requirements, and therefore is not
   considered.

   Instead, a feature of the RPCSEC_GSSv3 [rpcsec_gssv3] protocol can be
   used: RPC application defined structured privilege assertion. This
   feature allows the destination server to authenticate to the source
   server as acting on behalf of the user principal, and to authorize
   the destination server to perform READs of the file to be copied from
   the source on behalf of the user principal.
   Once the copy is complete, the client can destroy the RPCSEC_GSSv3
   handles to end the authorization of both the source and destination
   servers to copy.

   We define three RPCSEC_GSSv3 structured privilege assertions that
   work in tandem to authorize the copy:

   copy_from_auth:  A user principal is authorizing a source principal
      ("nfs@<source>") to allow a destination principal
      ("nfs@<destination>") to set up the copy_confirm_auth privilege
      required to copy a file from the source to the destination on
      behalf of the user principal. This privilege is established on
      the source server before the user principal sends a COPY_NOTIFY
      operation to the source server, and the resultant RPCSEC_GSSv3
      context is used to secure the COPY_NOTIFY operation.

      struct copy_from_auth_priv {
              secret4             cfap_shared_secret;
              netloc4             cfap_destination;
              /* the NFSv4 user name that the user principal maps to */
              utf8str_mixed       cfap_username;
      };

      cfap_shared_secret is an automatically generated random number
      secret value.

   copy_to_auth:  A user principal is authorizing a destination
      principal ("nfs@<destination>") to set up a copy_confirm_auth
      privilege with a source principal ("nfs@<source>") to allow it to
      copy a file from the source to the destination on behalf of the
      user principal. This privilege is established on the destination
      server before the user principal sends a COPY operation to the
      destination server, and the resultant RPCSEC_GSSv3 context is used
      to secure the COPY operation.

      struct copy_to_auth_priv {
              /* equal to cfap_shared_secret */
              secret4              ctap_shared_secret;
              netloc4              ctap_source<>;
              /* the NFSv4 user name that the user principal maps to */
              utf8str_mixed        ctap_username;
      };

      ctap_shared_secret is the automatically generated secret value
      used to establish the copy_from_auth privilege with the source
      principal. See Section 4.10.1.1.1.
   copy_confirm_auth:  A destination principal ("nfs@<destination>") is
      confirming with the source principal ("nfs@<source>") that it is
      authorized to copy data from the source. This privilege is
      established on the destination server before the file is copied
      from the source to the destination. The resultant RPCSEC_GSSv3
      context is used to secure the READ operations from the source to
      the destination server.

      struct copy_confirm_auth_priv {
              /* equal to GSS_GetMIC() of cfap_shared_secret */
              opaque              ccap_shared_secret_mic<>;
              /* the NFSv4 user name that the user principal maps to */
              utf8str_mixed       ccap_username;
      };

4.10.1.1.1.  Establishing a Security Context

   When the user principal wants to COPY a file between two servers, if
   it has not established copy_from_auth and copy_to_auth privileges on
   the servers, it establishes them:

   o  As noted in [rpcsec_gssv3], the client uses an existing
      RPCSEC_GSSv3 context termed the "parent" handle to establish and
      protect RPCSEC_GSSv3 structured privilege assertion exchanges.
      The copy_from_auth privilege will use the context established
      between the user principal and the source server used to OPEN the
      source file as the RPCSEC_GSSv3 parent handle. The copy_to_auth
      privilege will use the context established between the user
      principal and the destination server used to OPEN the destination
      file as the RPCSEC_GSSv3 parent handle.

   o  A random number is generated to use as a secret to be shared
      between the two servers. This shared secret will be placed in the
      cfap_shared_secret and ctap_shared_secret fields of the
      appropriate privilege data types, copy_from_auth_priv and
      copy_to_auth_priv. Because of this shared secret, the
      RPCSEC_GSS3_CREATE control messages for copy_from_auth and
      copy_to_auth MUST use a Quality of Protection (QOP) of
      rpc_gss_svc_privacy.
   o  An instance of copy_from_auth_priv is filled in with the shared
      secret, the destination server, and the NFSv4 user id of the user
      principal and is placed in rpc_gss3_create_args
      assertions[0].privs.privilege. The string "copy_from_auth" is
      placed in assertions[0].privs.name. The source server unwraps the
      rpc_gss_svc_privacy RPCSEC_GSS3_CREATE payload and verifies that
      the NFSv4 user id being asserted matches the source server's
      mapping of the user principal. If it does, the privilege is
      established on the source server as: <"copy_from_auth", user id,
      destination>. The field "handle" in a successful reply is the
      RPCSEC_GSSv3 copy_from_auth "child" handle that the client will
      use on COPY_NOTIFY requests to the source server.

   o  An instance of copy_to_auth_priv is filled in with the shared
      secret, the cnr_source_server list returned by COPY_NOTIFY, and
      the NFSv4 user id of the user principal. The copy_to_auth_priv
      instance is placed in rpc_gss3_create_args
      assertions[0].privs.privilege. The string "copy_to_auth" is
      placed in assertions[0].privs.name. The destination server
      unwraps the rpc_gss_svc_privacy RPCSEC_GSS3_CREATE payload and
      verifies that the NFSv4 user id being asserted matches the
      destination server's mapping of the user principal. If it does,
      the privilege is established on the destination server as:
      <"copy_to_auth", user id, source list>. The field "handle" in a
      successful reply is the RPCSEC_GSSv3 copy_to_auth "child" handle
      that the client will use on COPY requests to the destination
      server involving the source server.

   As noted in [rpcsec_gssv3] Section 2.3.1 "Create Request", both the
   client and the source server should associate the RPCSEC_GSSv3
   "child" handle with the parent RPCSEC_GSSv3 handle used to create the
   RPCSEC_GSSv3 child handle.

4.10.1.1.2.  Starting a Secure Inter-Server Copy

   When the client sends a COPY_NOTIFY request to the source server, it
   uses the privileged "copy_from_auth" RPCSEC_GSSv3 handle.
   cna_destination_server in COPY_NOTIFY MUST be the same as
   cfap_destination specified in copy_from_auth_priv. Otherwise,
   COPY_NOTIFY will fail with NFS4ERR_ACCESS. The source server
   verifies that the privilege <"copy_from_auth", user id, destination>
   exists, and annotates it with the source filehandle, if the user
   principal has read access to the source file, and if administrative
   policies give the user principal and the NFS client read access to
   the source file (i.e., if the ACCESS operation would grant read
   access). Otherwise, COPY_NOTIFY will fail with NFS4ERR_ACCESS.

   When the client sends a COPY request to the destination server, it
   uses the privileged "copy_to_auth" RPCSEC_GSSv3 handle. The
   ca_source_server list in COPY MUST be the same as the ctap_source
   list specified in copy_to_auth_priv. Otherwise, COPY will fail with
   NFS4ERR_ACCESS. The destination server verifies that the privilege
   <"copy_to_auth", user id, source list> exists, and annotates it with
   the source and destination filehandles. If the COPY returns a
   wr_callback_id, then this is an asynchronous copy and the
   wr_callback_id must also be annotated to the copy_to_auth privilege.
   If the client has failed to establish the "copy_to_auth" privilege,
   the destination server will reject the request with
   NFS4ERR_PARTNER_NO_AUTH.

   If either the COPY_NOTIFY or the COPY operation fails, the
   associated "copy_from_auth" and "copy_to_auth" RPCSEC_GSSv3 handles
   MUST be destroyed.

4.10.1.1.3.  Securing ONC RPC Server-to-Server Copy Protocols

   After a destination server has a "copy_to_auth" privilege established
   on it, and it receives a COPY request, if it knows it will use an ONC
   RPC protocol to copy data, it will establish a "copy_confirm_auth"
   privilege on the source server prior to responding to the COPY
   operation as follows:

   o  Before establishing an RPCSEC_GSSv3 context, a parent context
      needs to exist between nfs@<destination> as the initiator
      principal, and nfs@<source> as the target principal. If NFS is to
      be used as the copy protocol, this means that the destination
      server must mount the source server using RPCSEC_GSSv3.

   o  An instance of copy_confirm_auth_priv is filled in with
      information from the established "copy_to_auth" privilege. The
      value of the field ccap_shared_secret_mic is a GSS_GetMIC() of the
      ctap_shared_secret in the copy_to_auth privilege using the parent
      handle context. The field ccap_username is the mapping of the
      user principal to an NFSv4 user name ("user"@"domain" form), and
      MUST be the same as the ctap_username in the copy_to_auth
      privilege. The copy_confirm_auth_priv instance is placed in
      rpc_gss3_create_args assertions[0].privs.privilege. The string
      "copy_confirm_auth" is placed in assertions[0].privs.name.

   o  The RPCSEC_GSS3_CREATE copy_from_auth message is sent to the
      source server with a QOP of rpc_gss_svc_privacy. The source
      server unwraps the rpc_gss_svc_privacy RPCSEC_GSS3_CREATE payload
      and verifies the ccap_shared_secret_mic by calling GSS_VerifyMIC()
      using the parent context on the cfap_shared_secret from the
      established "copy_from_auth" privilege, and verifies that the
      ccap_username equals the cfap_username.
   o  If all verification succeeds, the "copy_confirm_auth" privilege is
      established on the source server as <"copy_confirm_auth",
      shared_secret_mic, user id>. Because the shared secret has been
      verified, the resultant copy_confirm_auth RPCSEC_GSSv3 child
      handle is noted to be acting on behalf of the user principal.

   o  If the source server fails to verify the copy_from_auth privilege,
      the COPY_NOTIFY operation will be rejected with
      NFS4ERR_PARTNER_NO_AUTH.

   o  If the destination server fails to verify the copy_to_auth or
      copy_confirm_auth privilege, the COPY will be rejected with
      NFS4ERR_PARTNER_NO_AUTH, causing the client to destroy the
      associated copy_from_auth and copy_to_auth RPCSEC_GSSv3 structured
      privilege assertion handles.

   o  All subsequent ONC RPC READ requests sent from the destination to
      copy data from the source to the destination will use the
      RPCSEC_GSSv3 copy_confirm_auth child handle.

   Note that the use of the "copy_confirm_auth" privilege accomplishes
   the following:

   o  If a protocol like NFS is being used, with export policies, export
      policies can be overridden in case the destination server as-an-
      NFS-client is not authorized.

   o  Manual configuration to allow a copy relationship between the
      source and destination is not needed.

4.10.1.1.4.  Maintaining a Secure Inter-Server Copy

   If the client determines that either the copy_from_auth or the
   copy_to_auth handle becomes invalid during a copy, then the copy MUST
   be aborted by the client sending an OFFLOAD_CANCEL to both the source
   and destination servers and destroying the respective copy related
   context handles as described in Section 4.10.1.1.5.

4.10.1.1.5.  Finishing or Stopping a Secure Inter-Server Copy

   Under normal operation, the client MUST destroy the copy_from_auth
   and the copy_to_auth RPCSEC_GSSv3 handles once the COPY operation
   returns for a synchronous inter-server copy or a CB_OFFLOAD reports
   the result of an asynchronous copy.

   The copy_confirm_auth privilege is constructed from information held
   by the copy_to_auth privilege, and MUST be destroyed by the
   destination server (via an RPCSEC_GSS3_DESTROY call) when the
   copy_to_auth RPCSEC_GSSv3 handle is destroyed.

   The copy_confirm_auth RPCSEC_GSS3 handle is associated with a
   copy_from_auth RPCSEC_GSS3 handle on the source server via the shared
   secret and MUST be locally destroyed (there is no RPCSEC_GSS3_DESTROY
   as the source server is not the initiator) when the copy_from_auth
   RPCSEC_GSSv3 handle is destroyed.

   If the client sends an OFFLOAD_CANCEL to the source server to rescind
   the destination server's synchronous copy privilege, it uses the
   privileged "copy_from_auth" RPCSEC_GSSv3 handle and the
   cra_destination_server in OFFLOAD_CANCEL MUST be the same as the name
   of the destination server specified in copy_from_auth_priv. The
   source server will then delete the <"copy_from_auth", user id,
   destination> privilege and fail any subsequent copy requests sent
   under the auspices of this privilege from the destination server.
   The client MUST destroy both the "copy_from_auth" and the
   "copy_to_auth" RPCSEC_GSSv3 handles.

   If the client sends an OFFLOAD_STATUS to the destination server to
   check on the status of an asynchronous copy, it uses the privileged
   "copy_to_auth" RPCSEC_GSSv3 handle and the osa_stateid in
   OFFLOAD_STATUS MUST be the same as the wr_callback_id specified in
   the "copy_to_auth" privilege stored on the destination server.
   If the client sends an OFFLOAD_CANCEL to the destination server to
   cancel an asynchronous copy, it uses the privileged "copy_to_auth"
   RPCSEC_GSSv3 handle and the oaa_stateid in OFFLOAD_CANCEL MUST be the
   same as the wr_callback_id specified in the "copy_to_auth" privilege
   stored on the destination server. The destination server will then
   delete the <"copy_to_auth", user id, source list, nonce, nonce MIC,
   context handle, handle version> privilege and the associated
   "copy_confirm_auth" RPCSEC_GSSv3 handle. The client MUST destroy
   both the copy_to_auth and copy_from_auth RPCSEC_GSSv3 handles.

4.10.1.2.  Inter-Server Copy via ONC RPC without RPCSEC_GSS

   ONC RPC security flavors other than RPCSEC_GSS MAY be used with the
   server-side copy offload operations described in this chapter. In
   particular, host-based ONC RPC security flavors such as AUTH_NONE and
   AUTH_SYS MAY be used. If a host-based security flavor is used, a
   minimal level of protection for the server-to-server copy protocol is
   possible.

   In the absence of a strong security mechanism designed for the
   purpose, the challenge is how the source server and destination
   server identify themselves to each other, especially in the presence
   of multi-homed source and destination servers. In a multi-homed
   environment, the destination server might not contact the source
   server from the same network address specified by the client in the
   COPY_NOTIFY. The cnr_stateid returned from the COPY_NOTIFY can be
   used to uniquely identify the destination server to the source
   server. The use of cnr_stateid provides initial authentication of
   the destination server, but cannot defend against man-in-the-middle
   attacks after authentication or an eavesdropper that observes the
   opaque stateid on the wire. Other secure communication techniques
   (e.g., IPsec) are necessary to block these attacks.
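
   The role of cnr_stateid in the multi-homed case can be sketched as
   follows. This is a minimal illustration under stated assumptions
   and not protocol text: the class CopyNotifyTable, its methods, and
   the 16-byte random token standing in for a stateid are all
   hypothetical. The sketch shows a source server admitting a read
   based on the presented stateid rather than the peer's network
   address, which is also why anyone who observes the stateid on the
   wire would be admitted.

```python
import secrets


class CopyNotifyTable:
    """Hypothetical source-server table of outstanding COPY_NOTIFY grants."""

    def __init__(self):
        # Maps a stand-in stateid (bytes) to the source filehandle (bytes).
        self._grants = {}

    def copy_notify(self, filehandle: bytes) -> bytes:
        # Issue an unguessable token standing in for cnr_stateid.
        stateid = secrets.token_bytes(16)
        self._grants[stateid] = filehandle
        return stateid

    def authorize_read(self, stateid: bytes, filehandle: bytes,
                       peer_addr: str) -> bool:
        # peer_addr is deliberately ignored: a multi-homed destination may
        # connect from an address other than the one named in COPY_NOTIFY.
        return self._grants.get(stateid) == filehandle

    def revoke(self, stateid: bytes) -> None:
        # Forget the grant, e.g. once the copy has finished or been cancelled.
        self._grants.pop(stateid, None)


table = CopyNotifyTable()
sid = table.copy_notify(b"fh-source")
ok_from_other_interface = table.authorize_read(sid, b"fh-source", "192.0.2.56")
ok_with_guess = table.authorize_read(b"\x00" * 16, b"fh-source", "192.0.2.56")
table.revoke(sid)
ok_after_revoke = table.authorize_read(sid, b"fh-source", "192.0.2.56")
```

   Because possession of the token is the only check in this sketch,
   it offers no protection once the token has been observed, which
   matches the limitation described above.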
   Servers SHOULD reject COPY_NOTIFY requests that do not use RPCSEC_GSS
   with privacy, thus ensuring the cnr_stateid in the COPY_NOTIFY reply
   is encrypted. For the same reason, clients SHOULD send COPY requests
   to the destination using RPCSEC_GSS with privacy.

4.10.1.3.  Inter-Server Copy without ONC RPC

   The same techniques as Section 4.10.1.2, using unique URLs for each
   destination server, can be used for other protocols (e.g., HTTP
   [RFC7230] and FTP [RFC959]) as well.

5.  Support for Application I/O Hints

   Applications can issue client I/O hints via posix_fadvise()
   [posix_fadvise] to the NFS client. While this can help the NFS
   client optimize I/O and caching for a file, it does not allow the NFS
   server and its exported file system to do likewise. We add an
   IO_ADVISE procedure (Section 15.5) to communicate the client file
   access patterns to the NFS server. The NFS server upon receiving an
   IO_ADVISE operation MAY choose to alter its I/O and caching behavior,
   but is under no obligation to do so.

   Application specific NFS clients such as those used by hypervisors
   and databases can also leverage application hints to communicate
   their specialized requirements.

6.  Sparse Files

   A sparse file is a common way of representing a large file without
   having to utilize all of the disk space for it. Consequently, a
   sparse file uses less physical space than its size indicates. This
   means the file contains 'holes', byte ranges within the file that
   contain no data. Most modern file systems support sparse files,
   including most UNIX file systems and NTFS, but notably not Apple's
   HFS+. Common examples of sparse files include Virtual Machine (VM)
   OS/disk images, database files, log files, and even checkpoint
   recovery files most commonly used by the HPC community.
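
   The idea of a hole as a byte range that contains no data can be
   illustrated with a short sketch. The function name extents and the
   4096-byte block size are assumptions chosen for illustration; real
   file systems record holes in metadata rather than by scanning file
   contents, and the tuples below are not a wire format defined by
   this document.

```python
def extents(data: bytes, block: int = 4096):
    """Split a buffer into ("data", offset, length) and ("hole", offset, length)
    runs, treating each all-zero block as part of a hole."""
    runs = []
    for off in range(0, len(data), block):
        chunk = data[off:off + block]
        kind = "data" if any(chunk) else "hole"
        if runs and runs[-1][0] == kind:
            # Extend the previous run of the same kind.
            prev_kind, start, length = runs[-1]
            runs[-1] = (prev_kind, start, length + len(chunk))
        else:
            runs.append((kind, off, len(chunk)))
    return runs


# A 16 KiB image holding 4 KiB of real data in its second block.
image = bytes(4096) + b"\x01" * 4096 + bytes(8192)
layout = extents(image)
stored = sum(length for kind, _, length in layout if kind == "data")
```

   In this example the logical size is 16384 bytes while only 4096
   bytes would need to be stored, which is the gap between file size
   and physical space that the paragraph above describes.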
   In addition, many modern file systems support the concept of
   'unwritten' or 'uninitialized' blocks, which have uninitialized space
   allocated to them on disk, but will return zeros until data is
   written to them. Such functionality is already present in the data
   model of the pNFS Block/Volume Layout (see [RFC5663]). Uninitialized
   blocks can be thought of as holes inside a space reservation window.

   If an application reads a hole in a sparse file, the file system must
   return all zeros to the application. For local data access there is
   little penalty, but with NFS these zeroes must be transferred back to
   the client. If an application uses the NFS client to read data into
   memory, this wastes time and bandwidth as the application waits for
   the zeroes to be transferred.

   A sparse file is typically created by initializing the file to be all
   zeros. Nothing is written to the data in the file; instead, the hole
   is recorded in the metadata for the file. So an 8G disk image might
   be represented initially by a few hundred bits in the metadata (on
   UNIX file systems, the inode) and nothing on the disk. If the VM
   then writes 100M to a file in the middle of the image, there would
   now be two holes represented in the metadata and 100M in the data.

   No new operation is needed to allow the creation of a sparsely
   populated file. When a file is created and a write occurs past the
   current size of the file, the non-allocated region will either be a
   hole or filled with zeros. The choice of behavior is dictated by the
   underlying file system and is transparent to the application. What
   is needed are the abilities to read sparse files and to punch holes
   to reinitialize the contents of a file.

   Two new operations DEALLOCATE (Section 15.4) and READ_PLUS
   (Section 15.10) are introduced.
   DEALLOCATE allows for hole punching, where an application might want
   to reset the allocation and reservation status of a range of the
   file. READ_PLUS supports all the features of READ but includes an
   extension to support sparse files. READ_PLUS is guaranteed to
   perform no worse than READ, and can dramatically improve performance
   with sparse files. READ_PLUS does not depend on pNFS protocol
   features, but can be used by pNFS to support sparse files.

6.1. Terminology

   Regular file: An object of file type NF4REG or NF4NAMEDATTR.

   Sparse file: A Regular file that contains one or more holes.

   Hole: A byte range within a Sparse file that contains all zeros. A
      hole might or might not have space allocated or reserved to it.

6.2. New Operations

6.2.1. READ_PLUS

   READ_PLUS is a new variant of the NFSv4.1 READ operation [RFC5661].
   Besides being able to support all of the data semantics of the READ
   operation, it can also be used by the client and server to
   efficiently transfer holes. Note that as the client has no a priori
   knowledge of whether a hole is present or not, if the client supports
   READ_PLUS and so does the server, then it should always use the
   READ_PLUS operation in preference to the READ operation.

   READ_PLUS extends the response with a new arm representing holes to
   avoid returning data for portions of the file which are initialized
   to zero and may or may not contain a backing store. Returning data
   blocks of uninitialized data wastes computational and network
   resources, thus reducing performance.

   When a client sends a READ operation, it is not prepared to accept a
   READ_PLUS-style response providing a compact encoding of the scope of
   holes. If a READ occurs on a sparse file, then the server must
   expand such data to be raw bytes.
   If a READ occurs in the middle of a hole, the server can only send
   back bytes starting from that offset. By contrast, if a READ_PLUS
   occurs in the middle of a hole, the server can send back a range
   which starts before the offset and extends past the range.

6.2.2. DEALLOCATE

   DEALLOCATE can be used to hole punch, which allows the client to
   avoid the transfer of a repetitive pattern of zeros across the
   network.

7. Space Reservation

   Applications want to be able to reserve space for a file, report the
   amount of actual disk space a file occupies, and free up the backing
   space of a file when it is not required.

   One example is the posix_fallocate operation ([posix_fallocate])
   which allows applications to ask for space reservations from the
   operating system, usually to provide a better file layout and reduce
   overhead for random or slow-growing file-appending workloads.

   Another example is space reservation for virtual disks in a
   hypervisor. In virtualized environments, virtual disk files are
   often stored on NFS-mounted volumes. When a hypervisor creates a
   virtual disk file, it often tries to preallocate the space for the
   file so that there are no future allocation-related errors during the
   operation of the virtual machine. Such errors prevent a virtual
   machine from continuing execution and result in downtime.

   Currently, in order to achieve such a guarantee, applications zero
   the entire file. The initial zeroing allocates the backing blocks
   and all subsequent writes are overwrites of already allocated blocks.
   This approach is not only inefficient in terms of the amount of I/O
   done, it is also not guaranteed to work on file systems that are
   log-structured or deduplicated. An efficient way of guaranteeing
   space reservation would be beneficial to such applications.
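
   The contrast between the two approaches can be sketched locally as
   follows. This is an illustration only: the function names are
   hypothetical, the code runs against a local file rather than an NFS
   mount, and how a given NFSv4.2 client maps a local reservation call
   onto the protocol is implementation-specific and not shown. It
   assumes a platform where Python exposes os.posix_fallocate.

```python
import os
import tempfile


def reserve_by_zeroing(fd, length, chunk=1 << 16):
    """Traditional approach: write zeros over the whole range.

    Every byte of the range is transferred as data.
    """
    zeros = bytes(chunk)
    os.lseek(fd, 0, os.SEEK_SET)
    remaining = length
    while remaining > 0:
        written = os.write(fd, zeros[:min(chunk, remaining)])
        remaining -= written


def reserve_by_fallocate(fd, length):
    """Ask the operating system to reserve the range; no data is sent."""
    os.posix_fallocate(fd, 0, length)


def demo(length=1 << 20):
    """Reserve `length` bytes both ways and return the resulting sizes."""
    sizes = {}
    with tempfile.TemporaryDirectory() as d:
        for name, reserve in (("zeroed", reserve_by_zeroing),
                              ("fallocated", reserve_by_fallocate)):
            fd = os.open(os.path.join(d, name),
                         os.O_RDWR | os.O_CREAT, 0o600)
            try:
                reserve(fd, length)
                sizes[name] = os.fstat(fd).st_size
            finally:
                os.close(fd)
    return sizes
```

   Both calls leave a file of the requested size, but only the first
   one has to push the full range through the I/O path.
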
   The new ALLOCATE operation (see Section 15.1) allows a client to
   request a guarantee that space will be available. The ALLOCATE
   operation guarantees that any future writes to the region it was
   successfully called for will not fail with NFS4ERR_NOSPC.

   Another useful feature is the ability to report the number of blocks
   that would be freed when a file is deleted. Currently, NFS reports
   two size attributes:

   size  The logical file size of the file.

   space_used  The size in bytes that the file occupies on disk.

   While these attributes are sufficient for space accounting in
   traditional file systems, they prove to be inadequate in modern file
   systems that support block sharing. In such file systems, multiple
   inodes (the metadata portion of the file system object) can point to
   a single block with a block reference count to guard against
   premature freeing. Having a way to tell the number of blocks that
   would be freed if the file were deleted would be useful to
   applications that wish to migrate files when a volume is low on
   space.

   Since virtual disks represent a hard drive in a virtual machine, a
   virtual disk can be viewed as a file system within a file. Since not
   all blocks within a file system are in use, there is an opportunity
   to reclaim blocks that are no longer in use. A call to deallocate
   blocks could result in better space efficiency. Less space MAY be
   consumed for backups after block deallocation.

   The following operations and attributes can be used to resolve these
   issues:

   space_freed  This attribute reports the space that would be freed
      when a file is deleted, taking block sharing into consideration.

   DEALLOCATE  This operation deallocates the blocks backing a region of
      the file.
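
   The accounting question raised by block sharing can be made concrete
   with a small model. The sketch below is illustrative only: it
   represents each file as a set of block identifiers (an assumption
   made for the example, not how any particular file system stores its
   metadata) and counts, per file, all referenced blocks and the blocks
   referenced by that file alone. The function name space_report and
   the dictionary keys are hypothetical.

```python
def space_report(files):
    """Return (per-file counts, number of distinct blocks in use).

    `files` maps a file name to the set of block ids it references.
    "all_blocks" counts every block a file points to (shared blocks are
    counted once per file); "unique_blocks" counts only blocks that no
    other file references, i.e. what deleting the file would free.
    """
    refcount = {}
    for blocks in files.values():
        for b in blocks:
            refcount[b] = refcount.get(b, 0) + 1

    report = {}
    for name, blocks in files.items():
        report[name] = {
            "all_blocks": len(blocks),
            "unique_blocks": sum(1 for b in blocks if refcount[b] == 1),
        }
    return report, len(refcount)


# Two files of 10 blocks each, 6 of which are shared between them.
shared = set(range(6))
files = {
    "A": shared | {100, 101, 102, 103},
    "B": shared | {200, 201, 202, 203},
}
report, distinct = space_report(files)
```

   With these inputs, the two files together occupy 14 distinct blocks,
   each file references 10 blocks, and deleting either one alone would
   free 4. Summing "all_blocks" gives 20 (over-reporting) and summing
   "unique_blocks" gives 8 (under-reporting).
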
   If space_used of a file is interpreted to mean the size in bytes of
   all disk blocks pointed to by the inode of the file, then shared
   blocks get double counted, over-reporting the space utilization.
   This also has the adverse effect that the deletion of a file with
   shared blocks frees up less than space_used bytes.

   On the other hand, if space_used is interpreted to mean the size in
   bytes of those disk blocks unique to the inode of the file, then
   shared blocks are not counted in any file, resulting in
   under-reporting of the space utilization.

   For example, two files A and B have 10 blocks each. Let 6 of these
   blocks be shared between them. Thus, the combined space utilized by
   the two files is 14 * BLOCK_SIZE bytes. In the former case, the
   combined space utilization of the two files would be reported as 20 *
   BLOCK_SIZE. However, deleting either would only result in 4 *
   BLOCK_SIZE being freed. Conversely, the latter interpretation would
   report that the space utilization is only 8 * BLOCK_SIZE.

   Adding another size attribute, space_freed (see Section 12.2.2), is
   helpful in solving this problem. space_freed is the number of blocks
   that are allocated to the given file that would be freed on its
   deletion. In the example, both A and B would report space_freed as 4
   * BLOCK_SIZE and space_used as 10 * BLOCK_SIZE. If A is deleted, B
   will report space_freed as 10 * BLOCK_SIZE as the deletion of B would
   result in the deallocation of all 10 blocks.

   The addition of these attributes does not solve the problem of space
   being over-reported. However, over-reporting is better than
   under-reporting.

8. Application Data Block Support

   At the OS level, files are contained on disk blocks. Applications
   are also free to impose structure on the data contained in a file and
   we can define an Application Data Block (ADB) to be such a structure.
   From the application's viewpoint, it only wants to handle ADBs and
   not raw bytes (see [Strohm11]). An ADB is typically composed of two
   sections: header and data. The header describes the characteristics
   of the block and can provide a means to detect corruption in the data
   payload. The data section is typically initialized to all zeros.

   The format of the header is application specific, but there are two
   main components typically encountered:

   1. An Application Data Block Number (ADBN) which allows the
      application to determine which data block is being referenced.
      This is useful when the client is not storing the blocks in
      contiguous memory, i.e., a logical block number.

   2. Fields to describe the state of the ADB and a means to detect
      block corruption. For both pieces of data, a useful property
      would be that the allowed values are specially selected so that
      if passed across the network, corruption due to translation
      between big and little endian architectures is detectable. For
      example, 0xF0DEDEF0 has the same 32-bit-wide pattern in both
      architectures, making it inappropriate.

   Applications already impose structures on files [Strohm11] and detect
   corruption in data blocks [Ashdown08]. What they are not able to do
   is efficiently transfer and store ADBs. To initialize a file with
   ADBs, the client must send each full ADB to the server and each must
   be stored on the server.

   In this section, we define a framework for transferring the ADB from
   client to server and present one approach to detecting corruption in
   a given ADB implementation.

8.1. Generic Framework

   We want the representation of the ADB to be flexible enough to
   support many different applications. The most basic approach is no
   imposition of a block at all, which means we are working with the raw
   bytes.
   Such an approach would be useful for storing holes, punching
   holes, etc. In more complex deployments, a server might be
   supporting multiple applications, each with its own definition of
   the ADB. One might store the ADBN at the start of the block and then
   have a guard pattern to detect corruption [McDougall07]. The next
   might store the ADBN at an offset of 100 bytes within the block and
   have no guard pattern at all, i.e., existing applications might
   already have well defined formats for their data blocks.

   The guard pattern can be used to represent the state of the block, to
   protect against corruption, or both. Again, it needs to be able to
   be placed anywhere within the ADB.

   We need to be able to represent the starting offset of the block and
   the size of the block. Note that nothing prevents the application
   from defining different sized blocks in a file.

8.1.1. Data Block Representation

   struct app_data_block4 {
           offset4         adb_offset;
           length4         adb_block_size;
           length4         adb_block_count;
           length4         adb_reloff_blocknum;
           count4          adb_block_num;
           length4         adb_reloff_pattern;
           opaque          adb_pattern<>;
   };

   The app_data_block4 structure captures the abstraction presented for
   the ADB. The additional fields present are to allow the transmission
   of adb_block_count ADBs at one time. We also use adb_block_num to
   convey the ADBN of the first block in the sequence. Each ADB will
   contain the same adb_pattern string.

   As both adb_block_num and adb_pattern are optional, if either
   adb_reloff_pattern or adb_reloff_blocknum is set to NFS4_UINT64_MAX,
   then the corresponding field is not set in any of the ADBs.

8.2. An Example of Detecting Corruption

   In this section, we define an ADB format in which corruption can be
   detected. Note that this is just one possible format and means to
   detect corruption.
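
   Returning briefly to the app_data_block4 structure of Section 8.1.1,
   the following sketch shows one way a receiver might expand a single
   descriptor into the bytes of adb_block_count blocks. It is a model
   for illustration, not an implementation of any server: the Python
   class and the expand function are hypothetical, the block number is
   assumed to be stored as an 8-byte big-endian integer, the block
   number is assumed to increase by one per block, and no bounds
   checking is done on the relative offsets.

```python
from dataclasses import dataclass

NFS4_UINT64_MAX = (1 << 64) - 1   # marks a relative offset as "not set"
BLOCKNUM_WIDTH = 8                # assumption: 8-byte big-endian counter


@dataclass
class AppDataBlock:
    """Python stand-in for the fields of app_data_block4."""
    adb_offset: int
    adb_block_size: int
    adb_block_count: int
    adb_reloff_blocknum: int
    adb_block_num: int
    adb_reloff_pattern: int
    adb_pattern: bytes


def expand(adb):
    """Return the bytes of all blocks described by one descriptor."""
    out = bytearray(adb.adb_block_size * adb.adb_block_count)
    for i in range(adb.adb_block_count):
        base = i * adb.adb_block_size
        if adb.adb_reloff_blocknum != NFS4_UINT64_MAX:
            start = base + adb.adb_reloff_blocknum
            number = (adb.adb_block_num + i).to_bytes(BLOCKNUM_WIDTH, "big")
            out[start:start + BLOCKNUM_WIDTH] = number
        if adb.adb_reloff_pattern != NFS4_UINT64_MAX:
            start = base + adb.adb_reloff_pattern
            out[start:start + len(adb.adb_pattern)] = adb.adb_pattern
    return bytes(out)


# 100 blocks of 4096 bytes: counter at offset 0, pattern at offset 8.
image = expand(AppDataBlock(0, 4096, 100, 0, 0, 8,
                            bytes.fromhex("feedface")))
```

   The point of the representation is that the descriptor is a few
   dozen bytes on the wire while the expanded image here is 400k; every
   block carries the same pattern and differs only in its block number.
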
   Consider a very basic implementation of an operating system's disk
   blocks. A block is either data or it is an indirect block which
   allows for files to be larger than one block. It is desired to be
   able to initialize a block. Lastly, to quickly unlink a file, a
   block can be marked invalid. The contents remain intact - which
   would enable this OS application to undelete a file.

   The application defines 4k-sized data blocks, with an 8-byte block
   counter occurring at offset 0 in the block, and with the guard
   pattern occurring at offset 8 inside the block. Furthermore, the
   guard pattern can take one of four states:

   0xfeedface - This is the FREE state and indicates that the ADB
      format has been applied.

   0xcafedead - This is the DATA state and indicates that real data
      has been written to this block.

   0xe4e5c001 - This is the INDIRECT state and indicates that the
      block contains block counter numbers that are chained off of this
      block.

   0xba1ed4a3 - This is the INVALID state and indicates that the block
      contains data whose contents are garbage.

   Finally, it also defines an 8-byte checksum [Baira08] starting at
   byte 16 which applies to the remaining contents of the block. If the
   state is FREE, then that checksum is trivially zero. As such, the
   application has no need to transfer the checksum implicitly inside
   the ADB - it need not make the transfer layer aware of the fact that
   there is a checksum (see [Ashdown08] for an example of checksums used
   to detect corruption in application data blocks).

   Corruption in each ADB can thus be detected:

   o  If the guard pattern is anything other than one of the allowed
      values, including all zeros.

   o  If the guard pattern is FREE and any other byte in the remainder
      of the ADB is anything other than zero.
   o  If the guard pattern is anything other than FREE, then if the
      stored checksum does not match the computed checksum.

   o  If the guard pattern is INDIRECT and one of the stored indirect
      block numbers has a value greater than the number of ADBs in the
      file.

   o  If the guard pattern is INDIRECT and one of the stored indirect
      block numbers is a duplicate of another stored indirect block
      number.

   As can be seen, the application can detect errors based on the
   combination of the guard pattern state and the checksum. But also,
   the application can detect corruption based on the state and the
   contents of the ADB. This last point is important in validating the
   minimum amount of data we incorporated into our generic framework.
   I.e., the guard pattern is sufficient in allowing applications to
   design their own corruption detection.

   Finally, it is important to note that none of these corruption checks
   occur in the transport layer. The server and client components are
   totally unaware of the file format and might report everything as
   being transferred correctly even in the case the application detects
   corruption.

8.3. Example of READ_PLUS

   The hypothetical application presented in Section 8.2 can be used to
   illustrate how READ_PLUS would return an array of results. A file is
   created and initialized with 100 4k ADBs in the FREE state with the
   WRITE_SAME operation (see Section 15.12):

      WRITE_SAME {0, 4k, 100, 0, 0, 8, 0xfeedface}

   Further, assume the application writes a single ADB at 16k, changing
   the guard pattern to 0xcafedead; we would then have in memory:

       0k -> (4k - 1)   : 00 00 00 00 ... fe ed fa ce 00 00 ... 00
       4k -> (8k - 1)   : 00 00 00 01 ... fe ed fa ce 00 00 ... 00
       8k -> (12k - 1)  : 00 00 00 02 ... fe ed fa ce 00 00 ... 00
      12k -> (16k - 1)  : 00 00 00 03 ... fe ed fa ce 00 00 ... 00
      16k -> (20k - 1)  : 00 00 00 04 ... ca fe de ad 00 00 ... 00
      20k -> (24k - 1)  : 00 00 00 05 ... fe ed fa ce 00 00 ... 00
      24k -> (28k - 1)  : 00 00 00 06 ... fe ed fa ce 00 00 ... 00
         ...
     396k -> (400k - 1) : 00 00 00 63 ... fe ed fa ce 00 00 ... 00

   And when the client did a READ_PLUS of 64k at the start of the file,
   it could get back a result of data:

       0k -> (4k - 1)   : 00 00 00 00 ... fe ed fa ce 00 00 ... 00
       4k -> (8k - 1)   : 00 00 00 01 ... fe ed fa ce 00 00 ... 00
       8k -> (12k - 1)  : 00 00 00 02 ... fe ed fa ce 00 00 ... 00
      12k -> (16k - 1)  : 00 00 00 03 ... fe ed fa ce 00 00 ... 00
      16k -> (20k - 1)  : 00 00 00 04 ... ca fe de ad 00 00 ... 00
      20k -> (24k - 1)  : 00 00 00 05 ... fe ed fa ce 00 00 ... 00
      24k -> (28k - 1)  : 00 00 00 06 ... fe ed fa ce 00 00 ... 00
         ...
      60k -> (64k - 1)  : 00 00 00 0f ... fe ed fa ce 00 00 ... 00

8.4. An Example of Zeroing Space

   A simpler use case for WRITE_SAME is an application that wants to
   efficiently zero out a file, but does not want to modify space
   reservations. This can easily be achieved by a call to WRITE_SAME
   without ADB block numbers or a pattern, e.g.:

      WRITE_SAME {0, 1k, 10000, 0, 0, 0, 0}

9. Labeled NFS

   Access control models such as Unix permissions or Access Control
   Lists are commonly referred to as Discretionary Access Control (DAC)
   models. These systems base their access decisions on user identity
   and resource ownership. In contrast, Mandatory Access Control (MAC)
   models base their access control decisions on the label on the
   subject (usually a process) and the object it wishes to access
   [RFC4949]. These labels may contain user identity information but
   usually contain additional information. In DAC systems, users are
   free to specify the access rules for resources that they own.
   MAC models base their security decisions on a system-wide policy
   established by an administrator or organization which the users do
   not have the ability to override. In this section, we add a MAC
   model to NFSv4.2.

   First, we provide a method for transporting and storing security
   label data on NFSv4 file objects. Security labels have several
   semantics that are met by NFSv4 recommended attributes such as the
   ability to set the label value upon object creation. Access control
   on these attributes is done through a combination of two mechanisms.
   As with other recommended attributes on file objects, the usual DAC
   checks, Access Control Lists (ACLs) and permission bits, will be
   performed to ensure that proper file ownership is enforced. In
   addition, a MAC system MAY be employed on the client, server, or both
   to enforce additional policy on what subjects may modify security
   label information.

   Second, we describe a method for the client to determine if an NFSv4
   file object security label has changed. A client which needs to know
   if a label on a file or set of files is going to change SHOULD
   request a delegation on each labeled file. In order to change such a
   security label, the server will have to recall delegations on any
   file affected by the label change, thereby informing clients of the
   label change.

   An additional useful feature would be modification to the RPC layer
   used by NFSv4 to allow RPC calls to assert client process subject
   security labels and enable full mode enforcement as described in
   Section 9.5.1. Such modifications are outside the scope of this
   document (see [rpcsec_gssv3]).

9.1. Definitions

   Label Format Specifier (LFS): is an identifier used by the client to
      establish the syntactic format of the security label and the
      semantic meaning of its components.
      These specifiers exist in a registry associated with documents
      describing the format and semantics of the label.

   Label Format Registry: is the IANA registry (see [RFC7569])
      containing all registered LFSes along with references to the
      documents that describe the syntactic format and semantics of the
      security label.

   Policy Identifier (PI): is an optional part of the definition of a
      Label Format Specifier which allows for clients and servers to
      identify specific security policies.

   Object: is a passive resource within the system that we wish to be
      protected. Objects can be entities such as files, directories,
      pipes, sockets, and many other system resources relevant to the
      protection of the system state.

   Subject: is an active entity, usually a process, which is requesting
      access to an object.

   MAC-Aware: is a server which can transmit and store object labels.

   MAC-Functional: is a client or server which is Labeled NFS enabled.
      Such a system can interpret labels and apply policies based on the
      security system.

   Multi-Level Security (MLS): is a traditional model where objects are
      given a sensitivity level (Unclassified, Secret, Top Secret, etc.)
      and a category set (see [BL73], [RFC1108], and [RFC2401]).

9.2. MAC Security Attribute

   MAC models base access decisions on security attributes bound to
   subjects (usually processes) and objects (for NFS, file objects).
   This information can be a user identity for an identity-based MAC
   model, sensitivity levels for Multi-Level Security, or a type for
   Type Enforcement. These models base their decisions on different
   criteria but the semantics of the security attribute remain the same.
   The semantics required by the security attributes are listed below:

   o  MUST provide flexibility with respect to the MAC model.
   o  MUST provide the ability to atomically set security information
      upon object creation.

   o  MUST provide the ability to enforce access control decisions both
      on the client and the server.

   o  MUST NOT expose an object to either the client or server name
      space before its security information has been bound to it.

   NFSv4 implements the security attribute as a recommended attribute.
   These attributes have a fixed format and semantics, which conflicts
   with the flexible nature of the security attribute. To resolve this,
   the security attribute consists of two components. The first
   component is an LFS as defined in [RFC7569] to allow for
   interoperability between MAC mechanisms. The second component is an
   opaque field which is the actual security attribute data. To allow
   for various MAC models, NFSv4 should be used solely as a transport
   mechanism for the security attribute. It is the responsibility of
   the endpoints to consume the security attribute and make access
   decisions based on their respective models. In addition, creation of
   objects through OPEN and CREATE allows for the security attribute to
   be specified upon creation. By providing an atomic create and set
   operation for the security attribute it is possible to enforce the
   second and fourth requirements. The recommended attribute
   FATTR4_SEC_LABEL (see Section 12.2.4) will be used to satisfy this
   requirement.

9.2.1. Delegations

   In the event that a security attribute is changed on the server while
   a client holds a delegation on the file, both the server and the
   client MUST follow the NFSv4.1 protocol (see Chapter 10 of [RFC5661])
   with respect to attribute changes. The client SHOULD flush all
   changes back to the server and relinquish the delegation.

9.2.2. Permission Checking

   It is not feasible to enumerate all possible MAC models and even
   levels of protection within a subset of these models. This means
   that the NFSv4 clients and servers cannot be expected to directly
   make access control decisions based on the security attribute.
   Instead NFSv4 should defer permission checking on this attribute to
   the host system. These checks are performed in addition to existing
   DAC and ACL checks outlined in the NFSv4 protocol. Section 9.5 gives
   a specific example of how the security attribute is handled under a
   particular MAC model.

9.2.3. Object Creation

   When creating files in NFSv4 the OPEN and CREATE operations are used.
   One of the parameters to these operations is an fattr4 structure
   containing the attributes the file is to be created with. This
   allows NFSv4 to atomically set the security attribute of files upon
   creation. When a client is MAC-Functional it must always provide the
   initial security attribute upon file creation. In the event that the
   server is MAC-Functional as well, it should determine by policy
   whether it will accept the attribute from the client or instead make
   the determination itself. If the client is not MAC-Functional, then
   the MAC-Functional server must decide on a default label. A more
   in-depth explanation can be found in Section 9.5.

9.2.4. Existing Objects

   Note that under the MAC model, all objects must have labels.
   Therefore, if an existing server is upgraded to include Labeled NFS
   support, then it is the responsibility of the security system to
   define the behavior for existing objects.

9.2.5. Label Changes

   Consider a guest mode system (Section 9.5.2) in which the clients
   enforce MAC checks and the server has only a DAC security system
   which stores the labels along with the file data.
   In this type of system, a user with the appropriate DAC credentials
   on a client with poorly configured or disabled MAC labeling
   enforcement is allowed access to the file label (and data) on the
   server and can change the label.

   Clients which need to know if a label on a file or set of files has
   changed SHOULD request a delegation on each labeled file so that a
   label change by another client will be known via the process
   described in Section 9.2.1, which must be followed: the delegation
   will be recalled, which effectively notifies the client of the
   change.

   Note that the MAC security policies on a client can be such that the
   client does not have access to the file unless it has a delegation.

9.3. pNFS Considerations

   The new FATTR4_SEC_LABEL attribute is metadata information and as
   such the storage device is not aware of the value contained on the
   metadata server. Fortunately, the NFSv4.1 protocol [RFC5661] already
   has provisions for doing access level checks from the storage device
   to the metadata server. In order for the storage device to validate
   the subject label presented by the client, it SHOULD utilize this
   mechanism.

9.4. Discovery of Server Labeled NFS Support

   The server can easily determine that a client supports Labeled NFS
   when it queries for the FATTR4_SEC_LABEL label for an object. The
   client might need to discover which LFS the server supports.

   The following compound MUST NOT be denied by any MAC label check:

      PUTROOTFH, GETATTR {FATTR4_SEC_LABEL}

   Note that the server might have imposed a security flavor on the root
   that precludes such access. I.e., if the server requires Kerberized
   access and the client presents a compound with AUTH_SYS, then the
   server is allowed to return NFS4ERR_WRONGSEC in this case.
   But if the client presents a correct security flavor, then the
   server MUST return the FATTR4_SEC_LABEL attribute with the supported
   LFS filled in.

9.5. MAC Security NFS Modes of Operation

   A system using Labeled NFS may operate in two modes. The first mode
   provides the most protection and is called "full mode". In this mode
   both the client and server implement a MAC model allowing each end to
   make an access control decision. The remaining mode is called the
   "guest mode" and in this mode one end of the connection is not
   implementing a MAC model and thus offers less protection than full
   mode.

9.5.1. Full Mode

   Full mode environments consist of MAC-Functional NFSv4 servers and
   clients and may be composed of mixed MAC models and policies. The
   system requires that both the client and server have an opportunity
   to perform an access control check based on all relevant information
   within the network. The file object security attribute is provided
   using the mechanism described in Section 9.2.

   Fully MAC-Functional NFSv4 servers are not possible in the absence of
   RPCSEC_GSSv3 [rpcsec_gssv3] support for client process subject label
   assertion. However, servers may make decisions based on the RPC
   credential information available.

9.5.1.1. Initial Labeling and Translation

   The ability to create a file is an action that a MAC model may wish
   to mediate. The client is given the responsibility to determine the
   initial security attribute to be placed on a file. This allows the
   client to make a decision as to the acceptable security attributes to
   create a file with before sending the request to the server. Once
   the server receives the creation request from the client it may
   choose to evaluate if the security attribute is acceptable.

   Security attributes on the client and server may vary based on MAC
   model and policy.
   To handle this, the security attribute field has an LFS component.
   This component is a mechanism for the host to identify the format and
   meaning of the opaque portion of the security attribute. A full mode
   environment may contain hosts operating in several different LFSes.
   In this case a mechanism for translating the opaque portion of the
   security attribute is needed. The actual translation function will
   vary based on MAC model and policy and is out of the scope of this
   document. If a translation is unavailable for a given LFS then the
   request MUST be denied. Another recourse is to allow the host to
   provide a fallback mapping for unknown security attributes.

9.5.1.2. Policy Enforcement

   In full mode, access control decisions are made by both the clients
   and servers. When a client makes a request it takes the security
   attribute from the requesting process and makes an access control
   decision based on that attribute and the security attribute of the
   object it is trying to access. If the client denies that access, an
   RPC call to the server is never made. If, however, the access is
   allowed, the client will make a call to the NFS server.

   When the server receives the request from the client it uses any
   credential information conveyed in the RPC request and the attributes
   of the object the client is trying to access to make an access
   control decision. If the server's policy allows this access it will
   fulfill the client's request; otherwise it will return
   NFS4ERR_ACCESS.

   Future protocol extensions may also allow the server to factor into
   the decision a security label extracted from the RPC request.

   Implementations MAY validate security attributes supplied over the
   network to ensure that they are within a set of attributes permitted
   from a specific peer, and if not, reject them.
   Note that a system may permit a different set of attributes to be
   accepted from each peer.

9.5.1.3. Limited Server

   A Limited Server mode (see Section 4.2 of [RFC7204]) consists of a
   server which is label aware, but does not enforce policies. Such a
   server will store and retrieve all object labels presented by
   clients, utilize the methods described in Section 9.2.5 to allow the
   clients to detect changing labels, but may not factor the label into
   access decisions. Instead, it will expect the clients to enforce all
   such access locally.

9.5.2. Guest Mode

   Guest mode implies that either the client or the server does not
   handle labels. If the client is not Labeled NFS aware, then it will
   not offer subject labels to the server. The server is the only
   entity enforcing policy, and may selectively provide standard NFS
   services to clients based on their authentication credentials and/or
   associated network attributes (e.g., IP address, network interface).
   The level of trust and access extended to a client in this mode is
   configuration-specific. If the server is not Labeled NFS aware, then
   it will not return object labels to the client. Clients in this
   environment may consist of groups implementing different MAC model
   policies. The system requires that all clients in the environment be
   responsible for access control checks.

9.6. Security Considerations for Labeled NFS

   This entire chapter deals with security issues.

   Depending on the level of protection the MAC system offers there may
   be a requirement to tightly bind the security attribute to the data.

   When only one of the client or server enforces labels, it is
   important to realize that the other side is not enforcing MAC
   protections.
   Alternate methods might be in use to handle the lack of MAC support,
   and care should be taken to identify and mitigate threats from
   possible tampering outside of these methods.

   An example of this is that a server that modifies READDIR or LOOKUP
   results based on the client's subject label might want to always
   construct the same subject label for a client which does not present
   one.  This will prevent a non-Labeled NFS client from mixing entries
   in the directory cache.

10.  Sharing change attribute implementation characteristics with NFSv4
     clients

   Although both the NFSv4 [RFC7530] and NFSv4.1 [RFC5661] protocols
   define the change attribute as being mandatory to implement, there is
   little in the way of guidance as to its construction.  The only
   mandated constraint is that the value must change whenever the file
   data or metadata change.

   While this allows for a wide range of implementations, it also leaves
   the client with no way to determine which is the most recent value
   for the change attribute in a case where several RPC calls have been
   issued in parallel.  In other words, if two COMPOUNDs, both
   containing WRITE and GETATTR requests for the same file, have been
   issued in parallel, how does the client determine which of the two
   change attribute values returned in the replies to the GETATTR
   requests corresponds to the most recent state of the file?  In some
   cases, the only recourse may be to send another COMPOUND containing a
   third GETATTR that is fully serialized with the first two.

   NFSv4.2 avoids this kind of inefficiency by allowing the server to
   share details about how the change attribute is expected to evolve,
   so that the client may immediately determine which, out of the
   several change attribute values returned by the server, is the most
   recent.
   change_attr_type is defined as a new recommended attribute
   (see Section 12.2.3) and is per file system.

11.  Error Values

   NFS error numbers are assigned to failed operations within a Compound
   (COMPOUND or CB_COMPOUND) request.  A Compound request contains a
   number of NFS operations that have their results encoded in sequence
   in a Compound reply.  The results of successful operations will
   consist of an NFS4_OK status followed by the encoded results of the
   operation.  If an NFS operation fails, an error status will be
   entered in the reply and the Compound request will be terminated.

11.1.  Error Definitions

                     Protocol Error Definitions

   +-------------------------+--------+------------------+
   | Error                   | Number | Description      |
   +-------------------------+--------+------------------+
   | NFS4ERR_BADLABEL        | 10093  | Section 11.1.3.1 |
   | NFS4ERR_OFFLOAD_DENIED  | 10091  | Section 11.1.2.1 |
   | NFS4ERR_OFFLOAD_NO_REQS | 10094  | Section 11.1.2.2 |
   | NFS4ERR_PARTNER_NO_AUTH | 10089  | Section 11.1.2.3 |
   | NFS4ERR_PARTNER_NOTSUPP | 10088  | Section 11.1.2.4 |
   | NFS4ERR_UNION_NOTSUPP   | 10090  | Section 11.1.1.1 |
   | NFS4ERR_WRONG_LFS       | 10092  | Section 11.1.3.2 |
   +-------------------------+--------+------------------+

                               Table 1

11.1.1.  General Errors

   This section deals with errors that are applicable to a broad set of
   different purposes.

11.1.1.1.  NFS4ERR_UNION_NOTSUPP (Error Code 10090)

   One of the arguments to the operation is a discriminated union and,
   while the server supports the given operation, it does not support
   the selected arm of the discriminated union.

11.1.2.  Server to Server Copy Errors

   These errors deal with the interaction between servers in
   server-to-server copies.

11.1.2.1.  NFS4ERR_OFFLOAD_DENIED (Error Code 10091)

   The copy offload operation is supported by both the source and the
   destination, but the destination is not allowing it for this file.
   If the client sees this error, it should fall back to the normal copy
   semantics.

11.1.2.2.  NFS4ERR_OFFLOAD_NO_REQS (Error Code 10094)

   The copy offload operation is supported by both the source and the
   destination, but the destination cannot meet the client requirements
   for either consecutive byte copy or synchronous copy.  If the client
   sees this error, it should either relax the requirements (if any) or
   fall back to the normal copy semantics.

11.1.2.3.  NFS4ERR_PARTNER_NO_AUTH (Error Code 10089)

   The source server does not authorize a server-to-server copy offload
   operation.  This may be due to the client's failure to send the
   COPY_NOTIFY operation to the source server, the source server
   receiving a server-to-server copy offload request after the copy
   lease time expired, or some other permission problem.

   The destination server does not authorize a server-to-server copy
   offload operation.  This may be due to an inter-server COPY request
   where the destination server requires RPCSEC_GSSv3 and it is not
   used, or some other permission problem.

11.1.2.4.  NFS4ERR_PARTNER_NOTSUPP (Error Code 10088)

   The remote server does not support the server-to-server copy offload
   protocol.

11.1.3.  Labeled NFS Errors

   These errors are used in Labeled NFS.

11.1.3.1.  NFS4ERR_BADLABEL (Error Code 10093)

   The label specified is invalid in some manner.

11.1.3.2.  NFS4ERR_WRONG_LFS (Error Code 10092)

   The LFS specified in the subject label is not compatible with the LFS
   in the object label.

11.2.  New Operations and Their Valid Errors

   This section contains a table that gives the valid error returns for
   each new NFSv4.2 protocol operation.
   The error code NFS4_OK (indicating no error) is not listed but
   should be understood to be returnable by all new operations.  The
   error values for all other operations are defined in Section 15.2 of
   [RFC5661].

          Valid Error Returns for Each New Protocol Operation

   +----------------+--------------------------------------------------+
   | Operation      | Errors                                           |
   +----------------+--------------------------------------------------+
   | ALLOCATE       | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,           |
   |                | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID,             |
   |                | NFS4ERR_DEADSESSION, NFS4ERR_DELAY,              |
   |                | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT,            |
   |                | NFS4ERR_EXPIRED, NFS4ERR_FBIG,                   |
   |                | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
   |                | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_MOVED,        |
   |                | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC,             |
   |                | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID,            |
   |                | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION,     |
   |                | NFS4ERR_REP_TOO_BIG,                             |
   |                | NFS4ERR_REP_TOO_BIG_TO_CACHE,                    |
   |                | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
   |                | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT,               |
   |                | NFS4ERR_STALE, NFS4ERR_SYMLINK,                  |
   |                | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE         |
   +----------------+--------------------------------------------------+
   | CLONE          | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,           |
   |                | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID,             |
   |                | NFS4ERR_DEADSESSION, NFS4ERR_DELAY,              |
   |                | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT,            |
   |                | NFS4ERR_EXPIRED, NFS4ERR_FBIG,                   |
   |                | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
   |                | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_MOVED,        |
   |                | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP,           |
   |                | NFS4ERR_NOSPC, NFS4ERR_OLD_STATEID,              |
   |                | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION,     |
   |                | NFS4ERR_REP_TOO_BIG,                             |
   |                | NFS4ERR_REP_TOO_BIG_TO_CACHE,                    |
   |                | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
   |                | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT,               |
   |                | NFS4ERR_STALE, NFS4ERR_SYMLINK,                  |
   |                | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE,        |
   |                | NFS4ERR_XDEV                                     |
   +----------------+--------------------------------------------------+
   | COPY           | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,           |
   |                | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID,             |
   |                | NFS4ERR_DEADSESSION, NFS4ERR_DELAY,              |
   |                | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT,            |
   |                | NFS4ERR_EXPIRED, NFS4ERR_FBIG,                   |
   |                | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
   |                | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_LOCKED,       |
   |                | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,             |
   |                | NFS4ERR_NOSPC, NFS4ERR_OFFLOAD_DENIED,           |
   |                | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE,           |
   |                | NFS4ERR_OP_NOT_IN_SESSION,                       |
   |                | NFS4ERR_PARTNER_NO_AUTH,                         |
   |                | NFS4ERR_PARTNER_NOTSUPP, NFS4ERR_PNFS_IO_HOLE,   |
   |                | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG,     |
   |                | NFS4ERR_REP_TOO_BIG_TO_CACHE,                    |
   |                | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
   |                | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT,               |
   |                | NFS4ERR_STALE, NFS4ERR_SYMLINK,                  |
   |                | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE         |
   +----------------+--------------------------------------------------+
   | COPY_NOTIFY    | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,           |
   |                | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID,             |
   |                | NFS4ERR_DEADSESSION, NFS4ERR_DELAY,              |
   |                | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED,          |
   |                | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
   |                | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED,       |
   |                | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,             |
   |                | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE,           |
   |                | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_PNFS_IO_HOLE, |
   |                | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG,     |
   |                | NFS4ERR_REP_TOO_BIG_TO_CACHE,                    |
   |                | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
   |                | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,              |
   |                | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS,           |
   |                | NFS4ERR_WRONG_TYPE                               |
   +----------------+--------------------------------------------------+
   | DEALLOCATE     | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,           |
   |                | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID,             |
   |                | NFS4ERR_DEADSESSION, NFS4ERR_DELAY,              |
   |                | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED,          |
   |                | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE,  |
   |                | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR,        |
   |                | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,             |
   |                | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID,            |
   |                | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION,     |
   |                | NFS4ERR_REP_TOO_BIG,                             |
   |                | NFS4ERR_REP_TOO_BIG_TO_CACHE,                    |
   |                | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
   |                | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT,               |
   |                | NFS4ERR_STALE, NFS4ERR_SYMLINK,                  |
   |                | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE         |
   +----------------+--------------------------------------------------+
   | GETDEVICELIST  | NFS4ERR_NOTSUPP                                  |
   +----------------+--------------------------------------------------+
   | IO_ADVISE      | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,           |
   |                | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID,             |
   |                | NFS4ERR_DEADSESSION, NFS4ERR_DELAY,              |
   |                | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED,          |
   |                | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE,  |
   |                | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR,        |
   |                | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,             |
   |                | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID,            |
   |                | NFS4ERR_OP_NOT_IN_SESSION,                       |
   |                | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SERVERFAULT, |
   |                | NFS4ERR_STALE, NFS4ERR_SYMLINK,                  |
   |                | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE         |
   +----------------+--------------------------------------------------+
   | LAYOUTERROR    | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR,           |
   |                | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION,        |
   |                | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED,            |
   |                | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED,              |
   |                | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_ISDIR,     |
   |                | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,             |
   |                | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE,               |
   |                | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION,  |
   |                | NFS4ERR_REP_TOO_BIG,                             |
   |                | NFS4ERR_REP_TOO_BIG_TO_CACHE,                    |
   |                | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
   |                | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,              |
   |                | NFS4ERR_TOO_MANY_OPS,                            |
   |                | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED,  |
   |                | NFS4ERR_WRONG_TYPE                               |
   +----------------+--------------------------------------------------+
   | LAYOUTSTATS    | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR,           |
   |                | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION,        |
   |                | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED,            |
   |                | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED,              |
   |                | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_ISDIR,     |
   |                | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,             |
   |                | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE,               |
   |                | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION,  |
   |                | NFS4ERR_REP_TOO_BIG,                             |
   |                | NFS4ERR_REP_TOO_BIG_TO_CACHE,                    |
   |                | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
   |                | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,              |
   |                | NFS4ERR_TOO_MANY_OPS,                            |
   |                | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED,  |
   |                | NFS4ERR_WRONG_TYPE                               |
   +----------------+--------------------------------------------------+
   | OFFLOAD_CANCEL | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR,           |
   |                | NFS4ERR_BAD_STATEID, NFS4ERR_COMPLETE_ALREADY,   |
   |                | NFS4ERR_DEADSESSION, NFS4ERR_EXPIRED,            |
   |                | NFS4ERR_DELAY, NFS4ERR_GRACE, NFS4ERR_NOTSUPP,   |
   |                | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION,  |
   |                | NFS4ERR_SERVERFAULT, NFS4ERR_TOO_MANY_OPS        |
   +----------------+--------------------------------------------------+
   | OFFLOAD_STATUS | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR,           |
   |                | NFS4ERR_BAD_STATEID, NFS4ERR_COMPLETE_ALREADY,   |
   |                | NFS4ERR_DEADSESSION, NFS4ERR_EXPIRED,            |
   |                | NFS4ERR_DELAY, NFS4ERR_GRACE, NFS4ERR_NOTSUPP,   |
   |                | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION,  |
   |                | NFS4ERR_SERVERFAULT, NFS4ERR_TOO_MANY_OPS        |
   +----------------+--------------------------------------------------+
   | READ_PLUS      | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,           |
   |                | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID,             |
   |                | NFS4ERR_DEADSESSION, NFS4ERR_DELAY,              |
   |                | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED,          |
   |                | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
   |                | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED,       |
   |                | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,             |
   |                | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID,            |
   |                | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION,     |
   |                | NFS4ERR_PARTNER_NO_AUTH, NFS4ERR_PNFS_IO_HOLE,   |
   |                | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG,     |
   |                | NFS4ERR_REP_TOO_BIG_TO_CACHE,                    |
   |                | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
   |                | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,              |
   |                | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS,           |
   |                | NFS4ERR_WRONG_TYPE                               |
   +----------------+--------------------------------------------------+
   | SEEK           | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,           |
   |                | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID,             |
   |                | NFS4ERR_DEADSESSION, NFS4ERR_DELAY,              |
   |                | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED,          |
   |                | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
   |                | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED,       |
   |                | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,             |
   |                | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID,            |
   |                | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION,     |
   |                | NFS4ERR_PNFS_IO_HOLE, NFS4ERR_PNFS_NO_LAYOUT,    |
   |                | NFS4ERR_REP_TOO_BIG,                             |
   |                | NFS4ERR_REP_TOO_BIG_TO_CACHE,                    |
   |                | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
   |                | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,              |
   |                | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS,           |
   |                | NFS4ERR_UNION_NOTSUPP, NFS4ERR_WRONG_TYPE        |
   +----------------+--------------------------------------------------+
   | WRITE_SAME     | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,           |
   |                | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID,             |
   |                | NFS4ERR_DEADSESSION, NFS4ERR_DELAY,              |
   |                | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT,            |
   |                | NFS4ERR_EXPIRED, NFS4ERR_FBIG,                   |
   |                | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
   |                | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_LOCKED,       |
   |                | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,             |
   |                | NFS4ERR_NOSPC, NFS4ERR_NOTSUPP,                  |
   |                | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE,           |
   |                | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_PNFS_IO_HOLE, |
   |                | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG,     |
   |                | NFS4ERR_REP_TOO_BIG_TO_CACHE,                    |
   |                | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
   |                | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT,               |
   |                | NFS4ERR_STALE, NFS4ERR_SYMLINK,                  |
   |                | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE         |
   +----------------+--------------------------------------------------+

                                Table 2

11.3.  New Callback Operations and Their Valid Errors

   This section contains a table that gives the valid error returns for
   each new NFSv4.2 callback operation.  The error code NFS4_OK
   (indicating no error) is not listed but should be understood to be
   returnable by all new callback operations.  The error values for all
   other callback operations are defined in Section 15.3 of [RFC5661].

      Valid Error Returns for Each New Protocol Callback Operation

   +------------+------------------------------------------------------+
   | Callback   | Errors                                               |
   | Operation  |                                                      |
   +------------+------------------------------------------------------+
   | CB_OFFLOAD | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR,                   |
   |            | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY,                  |
   |            | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_REP_TOO_BIG,      |
   |            | NFS4ERR_REP_TOO_BIG_TO_CACHE, NFS4ERR_REQ_TOO_BIG,   |
   |            | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SERVERFAULT,     |
   |            | NFS4ERR_TOO_MANY_OPS                                 |
   +------------+------------------------------------------------------+

                                Table 3

12.  New File Attributes

12.1.  New RECOMMENDED Attributes - List and Definition References

   The list of new RECOMMENDED attributes appears in Table 4.  The
   meanings of the columns of the table are:

   Name:  The name of the attribute.

   Id:  The number assigned to the attribute.  In the event of conflicts
      between the assigned number and
      [I-D.ietf-nfsv4-minorversion2-dot-x], the latter is likely
      authoritative, but the conflict should be resolved with Errata to
      this document and/or [I-D.ietf-nfsv4-minorversion2-dot-x].  See
      [IESG08] for the Errata process.

   Data Type:  The XDR data type of the attribute.

   Acc:  Access allowed to the attribute.

      R  means read-only (GETATTR may retrieve, SETATTR may not set).

      W  means write-only (SETATTR may set, GETATTR may not retrieve).

      R W  means read/write (GETATTR may retrieve, SETATTR may set).

   Defined in:  The section of this specification that describes the
      attribute.

   +------------------+----+-------------------+-----+----------------+
   | Name             | Id | Data Type         | Acc | Defined in     |
   +------------------+----+-------------------+-----+----------------+
   | clone_blksize    | 77 | uint32_t          | R   | Section 12.2.1 |
   | space_freed      | 78 | length4           | R   | Section 12.2.2 |
   | change_attr_type | 79 | change_attr_type4 | R   | Section 12.2.3 |
   | sec_label        | 80 | sec_label4        | R W | Section 12.2.4 |
   +------------------+----+-------------------+-----+----------------+

                                Table 4

12.2.  Attribute Definitions

12.2.1.  Attribute 77: clone_blksize

   The clone_blksize attribute indicates the granularity of a CLONE
   operation.

12.2.2.  Attribute 78: space_freed

   space_freed gives the number of bytes freed if the file is deleted.
   This attribute is read only and is of type length4.  It is a per file
   attribute.

12.2.3.  Attribute 79: change_attr_type

   enum change_attr_type4 {
       NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR         = 0,
       NFS4_CHANGE_TYPE_IS_VERSION_COUNTER        = 1,
       NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2,
       NFS4_CHANGE_TYPE_IS_TIME_METADATA          = 3,
       NFS4_CHANGE_TYPE_IS_UNDEFINED              = 4
   };

   change_attr_type is a per file system attribute which enables the
   NFSv4.2 server to provide additional information about how it expects
   the change attribute value to evolve after the file data or metadata
   has changed.  While Section 5.4 of [RFC5661] discusses per file
   system attributes, it is expected that the value of change_attr_type
   does not depend on the value of "homogeneous" and changes only in the
   event of a migration.

   NFS4_CHANGE_TYPE_IS_UNDEFINED:  The change attribute does not take
      values that fit into any of these categories.

   NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR:  The change attribute value MUST
      monotonically increase for every atomic change to the file
      attributes, data, or directory contents.

   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER:  The change attribute value MUST
      be incremented by one unit for every atomic change to the file
      attributes, data, or directory contents.  This property is
      preserved when writing to pNFS data servers.

   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS:  The change attribute
      value MUST be incremented by one unit for every atomic change to
      the file attributes, data, or directory contents.  In the case
      where the client is writing to pNFS data servers, the number of
      increments is not guaranteed to exactly match the number of
      writes.

   NFS4_CHANGE_TYPE_IS_TIME_METADATA:  The change attribute is
      implemented as suggested in [RFC7530] in terms of the
      time_metadata attribute.
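
   As an illustration only, the sketch below shows one way a client
   might use the categories defined above.  The enum values are copied
   from the XDR in this section; the class name, the helper names
   most_recent() and expected_after(), and the choice to treat only
   MONOTONIC_INCR, VERSION_COUNTER, and TIME_METADATA as ordered are
   assumptions of this sketch rather than part of the protocol.
   Wraparound of the 64-bit change value is ignored.

```python
from enum import IntEnum
from typing import Optional


class ChangeAttrType(IntEnum):
    """Values of change_attr_type4, copied from the XDR above."""
    MONOTONIC_INCR = 0
    VERSION_COUNTER = 1
    VERSION_COUNTER_NOPNFS = 2
    TIME_METADATA = 3
    UNDEFINED = 4


# Assumption of this sketch: only these types are treated as ordered.
ORDERED_TYPES = frozenset({
    ChangeAttrType.MONOTONIC_INCR,
    ChangeAttrType.VERSION_COUNTER,
    ChangeAttrType.TIME_METADATA,
})


def most_recent(kind: ChangeAttrType, a: int, b: int) -> Optional[int]:
    """Return the newer of two change values, or None if the
    reported type gives no ordering guarantee."""
    if kind in ORDERED_TYPES:
        return max(a, b)
    return None


def expected_after(kind: ChangeAttrType, current: int,
                   changes: int) -> Optional[int]:
    """Predict the change value after `changes` atomic changes.
    Only a strict version counter makes this predictable."""
    if kind == ChangeAttrType.VERSION_COUNTER:
        return current + changes
    return None
```

   A client holding two GETATTR replies from parallel COMPOUNDs could
   call most_recent() and fall back to a serialized GETATTR when the
   result is None.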

   If any of NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR,
   NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, or
   NFS4_CHANGE_TYPE_IS_TIME_METADATA is set, then the client knows at
   the very least that the change attribute is monotonically increasing,
   which is sufficient to resolve the question of which value is the
   most recent.

   If the client sees the value NFS4_CHANGE_TYPE_IS_TIME_METADATA, then
   by inspecting the value of the 'time_delta' attribute it additionally
   has the option of detecting rogue server implementations that use
   time_metadata in violation of the specification.

   If the client sees NFS4_CHANGE_TYPE_IS_VERSION_COUNTER, it has the
   ability to predict what the resulting change attribute value should
   be after a COMPOUND containing a SETATTR, WRITE, or CREATE.  This
   again allows it to detect changes made in parallel by another client.
   The value NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS permits the
   same, but only if the client is not doing pNFS WRITEs.

   Finally, if the server does not support change_attr_type or if
   NFS4_CHANGE_TYPE_IS_UNDEFINED is set, then the server SHOULD make an
   effort to implement the change attribute in terms of the
   time_metadata attribute.

12.2.4.  Attribute 80: sec_label

   typedef uint32_t  policy4;

   struct labelformat_spec4 {
       policy4 lfs_lfs;
       policy4 lfs_pi;
   };

   struct sec_label4 {
       labelformat_spec4       slai_lfs;
       opaque                  slai_data<>;
   };

   The FATTR4_SEC_LABEL contains an array of two components with the
   first component being an LFS.  It serves to provide the receiving end
   with the information necessary to translate the security attribute
   into a form that is usable by the endpoint.  Label Formats assigned
   an LFS may optionally include a Policy Identifier field to allow for
   complex policy deployments.  The LFS and Label Format Registry are
   described in detail in [RFC7569].
   The translation used to interpret the security attribute is not
   specified as part of the protocol, as it may depend on various
   factors.  The second component is an opaque section which contains
   the data of the attribute.  This component is dependent on the MAC
   model to interpret and enforce.

   In particular, it is the responsibility of the LFS specification to
   define a maximum size for the opaque section, slai_data<>.  When
   creating or modifying a label for an object, the client needs to be
   guaranteed that the server will accept a label that is sized
   correctly.  Because both client and server are part of a specific
   MAC model, the client will be aware of the size.

13.  Operations: REQUIRED, RECOMMENDED, or OPTIONAL

   The following tables summarize the operations of the NFSv4.2 protocol
   and the corresponding designation of REQUIRED, RECOMMENDED, and
   OPTIONAL to implement or MUST NOT implement.  The designation of MUST
   NOT implement is reserved for those operations that were defined in
   either NFSv4.0 or NFSv4.1 and MUST NOT be implemented in NFSv4.2.

   For the most part, the REQUIRED, RECOMMENDED, or OPTIONAL designation
   for operations sent by the client is for the server implementation.
   The client is generally required to implement the operations needed
   for the operating environment that it serves.  For example, a
   read-only NFSv4.2 client would have no need to implement the WRITE
   operation and is not required to do so.

   The REQUIRED or OPTIONAL designation for callback operations sent by
   the server is for both the client and server.  Generally, the client
   has the option of creating the backchannel and sending the operations
   on the fore channel that will be a catalyst for the server sending
   callback operations.
   A partial exception is CB_RECALL_SLOT; the only way the client can
   avoid supporting this operation is by not creating a backchannel.

   Since this is a summary of the operations and their designation,
   there are subtleties that are not presented here.  Therefore, if
   there is a question of the requirements of implementation, the
   operation descriptions themselves must be consulted along with other
   relevant explanatory text within either this specification or that of
   NFSv4.1 [RFC5661].

   The abbreviations used in the second and third columns of the table
   are defined as follows.

   REQ:  REQUIRED to implement

   REC:  RECOMMENDED to implement

   OPT:  OPTIONAL to implement

   MNI:  MUST NOT implement

   For the NFSv4.2 features that are OPTIONAL, the operations that
   support those features are OPTIONAL, and the server MUST return
   NFS4ERR_NOTSUPP in response to the client's use of those operations,
   when those operations are not implemented by the server.  If an
   OPTIONAL feature is supported, it is possible that a set of
   operations related to the feature becomes REQUIRED to implement.  The
   third column of the table designates the feature(s) and if the
   operation is REQUIRED or OPTIONAL in the presence of support for the
   feature.
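
   The rule in the preceding paragraph can be sketched as follows.
   This is a minimal illustration, not a normative algorithm: the
   DESIGNATIONS excerpt copies a few rows of Table 5, the numeric value
   of NFS4ERR_NOTSUPP is the one assigned in [RFC5661], and the function
   names are hypothetical.

```python
NFS4ERR_NOTSUPP = 10004  # value assigned in RFC 5661

# Excerpt of Table 5: operation -> (base designation,
# {feature: designation when that feature is supported}).
DESIGNATIONS = {
    "ACCESS": ("REQ", {}),
    "COPY_NOTIFY": ("OPT", {"COPYer": "REQ"}),
    "LAYOUTGET": ("OPT", {"pNFS": "REQ"}),
    "LAYOUTSTATS": ("OPT", {"pNFS": "OPT"}),
    "GETDEVICELIST": ("MNI", {}),
}


def effective_designation(op, features):
    """An OPTIONAL operation becomes REQUIRED when a supported
    feature lists it as REQ in the third column."""
    base, per_feature = DESIGNATIONS[op]
    if base == "OPT" and any(per_feature.get(f) == "REQ" for f in features):
        return "REQ"
    return base


def missing_required(features, implemented):
    """Operations a server must still implement for its feature set."""
    return sorted(
        op for op in DESIGNATIONS
        if effective_designation(op, features) == "REQ"
        and op not in implemented
    )


def status_for(op, implemented):
    """Return NFS4ERR_NOTSUPP for an unimplemented operation,
    or None when normal processing should continue."""
    return None if op in implemented else NFS4ERR_NOTSUPP
```

   For example, a server that advertises pNFS but implements only ACCESS
   from this excerpt would see LAYOUTGET reported by missing_required().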

   The OPTIONAL features identified and their abbreviations are as
   follows:

   pNFS:  Parallel NFS

   FDELG:  File Delegations

   DDELG:  Directory Delegations

   COPYra:  Intra-server Server Side Copy

   COPYer:  Inter-server Server Side Copy

   ADB:  Application Data Blocks

                               Operations

   +----------------------+--------------------+-----------------------+
   | Operation            | REQ, REC, OPT, or  | Feature (REQ, REC, or |
   |                      | MNI                | OPT)                  |
   +----------------------+--------------------+-----------------------+
   | ALLOCATE             | OPT                |                       |
   | ACCESS               | REQ                |                       |
   | BACKCHANNEL_CTL      | REQ                |                       |
   | BIND_CONN_TO_SESSION | REQ                |                       |
   | CLONE                | OPT                |                       |
   | CLOSE                | REQ                |                       |
   | COMMIT               | REQ                |                       |
   | COPY                 | OPT                | COPYer (REQ), COPYra  |
   |                      |                    | (REQ)                 |
   | COPY_NOTIFY          | OPT                | COPYer (REQ)          |
   | DEALLOCATE           | OPT                |                       |
   | CREATE               | REQ                |                       |
   | CREATE_SESSION       | REQ                |                       |
   | DELEGPURGE           | OPT                | FDELG (REQ)           |
   | DELEGRETURN          | OPT                | FDELG, DDELG, pNFS    |
   |                      |                    | (REQ)                 |
   | DESTROY_CLIENTID     | REQ                |                       |
   | DESTROY_SESSION      | REQ                |                       |
   | EXCHANGE_ID          | REQ                |                       |
   | FREE_STATEID         | REQ                |                       |
   | GETATTR              | REQ                |                       |
   | GETDEVICEINFO        | OPT                | pNFS (REQ)            |
   | GETDEVICELIST        | MNI                | pNFS (MNI)            |
   | GETFH                | REQ                |                       |
   | GET_DIR_DELEGATION   | OPT                | DDELG (REQ)           |
   | ILLEGAL              | REQ                |                       |
   | IO_ADVISE            | OPT                |                       |
   | LAYOUTCOMMIT         | OPT                | pNFS (REQ)            |
   | LAYOUTGET            | OPT                | pNFS (REQ)            |
   | LAYOUTRETURN         | OPT                | pNFS (REQ)            |
   | LAYOUTERROR          | OPT                | pNFS (OPT)            |
   | LAYOUTSTATS          | OPT                | pNFS (OPT)            |
   | LINK                 | OPT                |                       |
   | LOCK                 | REQ                |                       |
   | LOCKT                | REQ                |                       |
   | LOCKU                | REQ                |                       |
   | LOOKUP               | REQ                |                       |
   | LOOKUPP              | REQ                |                       |
   | NVERIFY              | REQ                |                       |
   | OFFLOAD_CANCEL       | OPT                | COPYer (OPT), COPYra  |
   |                      |                    | (OPT)                 |
   | OFFLOAD_STATUS       | OPT                | COPYer (OPT), COPYra  |
   |                      |                    | (OPT)                 |
   | OPEN                 | REQ                |                       |
   | OPENATTR             | OPT                |                       |
   | OPEN_CONFIRM         | MNI                |                       |
   | OPEN_DOWNGRADE       | REQ                |                       |
   | PUTFH                | REQ                |                       |
   | PUTPUBFH             | REQ                |                       |
   | PUTROOTFH            | REQ                |                       |
   | READ                 | REQ                |                       |
   | READDIR              | REQ                |                       |
   | READLINK             | OPT                |                       |
   | READ_PLUS            | OPT                |                       |
   | RECLAIM_COMPLETE     | REQ                |                       |
   | RELEASE_LOCKOWNER    | MNI                |                       |
   | REMOVE               | REQ                |                       |
   | RENAME               | REQ                |                       |
   | RENEW                | MNI                |                       |
   | RESTOREFH            | REQ                |                       |
   | SAVEFH               | REQ                |                       |
   | SECINFO              | REQ                |                       |
   | SECINFO_NO_NAME      | REC                | pNFS file layout      |
   |                      |                    | (REQ)                 |
   | SEEK                 | OPT                |                       |
   | SEQUENCE             | REQ                |                       |
   | SETATTR              | REQ                |                       |
   | SETCLIENTID          | MNI                |                       |
   | SETCLIENTID_CONFIRM  | MNI                |                       |
   | SET_SSV              | REQ                |                       |
   | TEST_STATEID         | REQ                |                       |
   | VERIFY               | REQ                |                       |
   | WANT_DELEGATION      | OPT                | FDELG (OPT)           |
   | WRITE                | REQ                |                       |
   | WRITE_SAME           | OPT                | ADB (REQ)             |
   +----------------------+--------------------+-----------------------+

                                Table 5

                          Callback Operations

   +-------------------------+------------------+----------------------+
   | Operation               | REQ, REC, OPT,   | Feature (REQ, REC,   |
   |                         | or MNI           | or OPT)              |
   +-------------------------+------------------+----------------------+
   | CB_GETATTR              | OPT              | FDELG (REQ)          |
   | CB_ILLEGAL              | REQ              |                      |
   | CB_LAYOUTRECALL         | OPT              | pNFS (REQ)           |
   | CB_NOTIFY               | OPT              | DDELG (REQ)          |
   | CB_NOTIFY_DEVICEID      | OPT              | pNFS (OPT)           |
   | CB_NOTIFY_LOCK          | OPT              |                      |
   | CB_OFFLOAD              | OPT              | COPYer (REQ), COPYra |
   |                         |                  | (REQ)                |
   | CB_PUSH_DELEG           | OPT              | FDELG (OPT)          |
   | CB_RECALL               | OPT              | FDELG, DDELG, pNFS   |
   |                         |                  | (REQ)                |
   | CB_RECALL_ANY           | OPT              | FDELG, DDELG, pNFS   |
   |                         |                  | (REQ)                |
   | CB_RECALL_SLOT          | REQ              |                      |
   | CB_RECALLABLE_OBJ_AVAIL | OPT              | DDELG, pNFS (REQ)    |
   | CB_SEQUENCE             | OPT              | FDELG, DDELG, pNFS   |
   |                         |                  | (REQ)                |
   | CB_WANTS_CANCELLED      | OPT              | FDELG, DDELG, pNFS   |
   |                         |                  | (REQ)                |
   +-------------------------+------------------+----------------------+

                                Table 6

14.  Modifications to NFSv4.1 Operations

14.1.  Operation 42: EXCHANGE_ID - Instantiate Client ID

14.1.1.  ARGUMENT

   /* new */
   const EXCHGID4_FLAG_SUPP_FENCE_OPS      = 0x00000004;

14.1.2.  RESULT

   Unchanged

14.1.3.  MOTIVATION

   Enterprise applications require guarantees that an operation has
   either aborted or completed.  NFSv4.1 provides this guarantee as long
   as the session is alive: simply send a SEQUENCE operation on the same
   slot with a new sequence number, and the successful return of
   SEQUENCE indicates the previous operation has completed.  However, if
   the session is lost, there is no way to know when any in-progress
   operations have aborted or completed.  In hindsight, the NFSv4.1
   specification should have mandated that DESTROY_SESSION either abort
   or complete all outstanding operations.

14.1.4.  DESCRIPTION

   A client SHOULD request the EXCHGID4_FLAG_SUPP_FENCE_OPS capability
   when it sends an EXCHANGE_ID operation.  The server SHOULD set this
   capability in the EXCHANGE_ID reply whether the client requests it or
   not.  It is the server's return that determines whether this
   capability is in effect.  When it is in effect, the following will
   occur:

   o  The server will not reply to any DESTROY_SESSION invoked with the
      client ID until all operations in progress are completed or
      aborted.

   o  The server will not reply to subsequent EXCHANGE_ID invoked on the
      same client owner with a new verifier until all operations in
      progress on the client ID's session are completed or aborted.

   o  In implementations where the NFS server is deployed as a cluster
      that supports client ID trunking and the
      EXCHGID4_FLAG_SUPP_FENCE_OPS capability is enabled, a session ID
      created on one node of the storage cluster MUST be destroyable
      via DESTROY_SESSION.
      In addition, DESTROY_CLIENTID and an EXCHANGE_ID with a new
      verifier affect all sessions regardless of which node the
      sessions were created on.

14.2.  Operation 48: GETDEVICELIST - Get All Device Mappings for a File
       System

14.2.1.  ARGUMENT

   struct GETDEVICELIST4args {
           /* CURRENT_FH: object belonging to the file system */
           layouttype4     gdla_layout_type;

           /* number of deviceIDs to return */
           count4          gdla_maxdevices;

           nfs_cookie4     gdla_cookie;
           verifier4       gdla_cookieverf;
   };

14.2.2.  RESULT

   struct GETDEVICELIST4resok {
           nfs_cookie4     gdlr_cookie;
           verifier4       gdlr_cookieverf;
           deviceid4       gdlr_deviceid_list<>;
           bool            gdlr_eof;
   };

   union GETDEVICELIST4res switch (nfsstat4 gdlr_status) {
   case NFS4_OK:
           GETDEVICELIST4resok     gdlr_resok4;
   default:
           void;
   };

14.2.3.  MOTIVATION

   The GETDEVICELIST operation was introduced in [RFC5661] specifically
   to request a list of devices at file system mount time from block
   layout type servers. However, the use of the GETDEVICELIST operation
   introduces a race condition versus notification about changes to pNFS
   device IDs as provided by CB_NOTIFY_DEVICEID. Implementation
   experience with block layout servers has shown there is no need for
   GETDEVICELIST. Clients have to be able to request new devices using
   GETDEVICEINFO at any time in response either to a new deviceid in
   LAYOUTGET results or to the CB_NOTIFY_DEVICEID callback operation.

14.2.4.  DESCRIPTION

   Clients and servers MUST NOT implement the GETDEVICELIST operation.

15.  NFSv4.2 Operations

15.1.  Operation 59: ALLOCATE - Reserve Space in a Region of a File

15.1.1.  ARGUMENT

   struct ALLOCATE4args {
           /* CURRENT_FH: file */
           stateid4        aa_stateid;
           offset4         aa_offset;
           length4         aa_length;
   };

15.1.2.  RESULT

   struct ALLOCATE4res {
           nfsstat4        ar_status;
   };

15.1.3.  DESCRIPTION

   Whenever a client wishes to reserve space for a region in a file, it
   calls the ALLOCATE operation with the current filehandle set to the
   filehandle of the file in question, and the start offset and length
   in bytes of the region set in aa_offset and aa_length, respectively.

   CURRENT_FH must be a regular file. If CURRENT_FH is not a regular
   file, the operation MUST fail and return NFS4ERR_WRONG_TYPE.

   The aa_stateid MUST refer to a stateid that is valid for a WRITE
   operation and follows the rules for stateids in Sections 8.2.5 and
   18.32.3 of [RFC5661].

   The server will ensure that backing blocks are reserved to the region
   specified by aa_offset and aa_length, and that no future writes into
   this region will return NFS4ERR_NOSPC. If the region lies partially
   or fully outside the current file size, the file size will be set to
   aa_offset + aa_length implicitly. If the server cannot guarantee
   this, it must return NFS4ERR_NOSPC.

   The ALLOCATE operation can also be used to extend the size of a file
   if the region specified by aa_offset and aa_length extends beyond the
   current file size. In that case, any data outside of the previous
   file size will return zeroes when read before data is written to it.

   It is not required that the server allocate the space to the file
   before returning success. The allocation can be deferred; however,
   it must be guaranteed that it will not fail for lack of space. The
   deferral does not result in an asynchronous reply.

   The ALLOCATE operation will result in the space_used and space_freed
   attributes being increased by the number of bytes reserved unless
   they were previously reserved or written and not shared.

15.2.  Operation 60: COPY - Initiate a server-side copy

15.2.1.  ARGUMENT

   struct COPY4args {
           /* SAVED_FH: source file */
           /* CURRENT_FH: destination file */
           stateid4        ca_src_stateid;
           stateid4        ca_dst_stateid;
           offset4         ca_src_offset;
           offset4         ca_dst_offset;
           length4         ca_count;
           bool            ca_consecutive;
           bool            ca_synchronous;
           netloc4         ca_source_server<>;
   };

15.2.2.  RESULT

   struct write_response4 {
           stateid4        wr_callback_id<1>;
           length4         wr_count;
           stable_how4     wr_committed;
           verifier4       wr_writeverf;
   };

   struct copy_requirements4 {
           bool            cr_consecutive;
           bool            cr_synchronous;
   };

   struct COPY4resok {
           write_response4         cr_response;
           copy_requirements4      cr_requirements;
   };

   union COPY4res switch (nfsstat4 cr_status) {
   case NFS4_OK:
           COPY4resok              cr_resok4;
   case NFS4ERR_OFFLOAD_NO_REQS:
           copy_requirements4      cr_requirements;
   default:
           void;
   };

15.2.3.  DESCRIPTION

   The COPY operation is used for both intra-server and inter-server
   copies. In both cases, the COPY is always sent from the client to
   the destination server of the file copy. The COPY operation requests
   that a range in the file specified by SAVED_FH be copied to a range
   in the file specified by CURRENT_FH.

   Both SAVED_FH and CURRENT_FH must be regular files. If either
   SAVED_FH or CURRENT_FH is not a regular file, the operation MUST fail
   and return NFS4ERR_WRONG_TYPE.

   SAVED_FH and CURRENT_FH must be different files. If SAVED_FH and
   CURRENT_FH refer to the same file, the operation MUST fail with
   NFS4ERR_INVAL.

   If the request is for an inter-server-to-server copy, the source-fh
   is a filehandle from the source server and the compound procedure is
   being executed on the destination server. In this case, the
   source-fh is a foreign filehandle on the server receiving the COPY
   request.
   If either PUTFH or SAVEFH checked the validity of the filehandle, the
   operation would likely fail and return NFS4ERR_STALE.

   If a server supports the inter-server-to-server COPY feature, a PUTFH
   followed by a SAVEFH MUST NOT return NFS4ERR_STALE for either
   operation. These restrictions do not pose substantial difficulties
   for servers. CURRENT_FH and SAVED_FH may be validated in the context
   of the operation referencing them and an NFS4ERR_STALE error returned
   for an invalid filehandle at that point.

   The ca_dst_stateid MUST refer to a stateid that is valid for a WRITE
   operation and follows the rules for stateids in Sections 8.2.5 and
   18.32.3 of [RFC5661]. For an inter-server copy, the ca_src_stateid
   MUST be the cnr_stateid returned from the earlier COPY_NOTIFY
   operation, while for an intra-server copy ca_src_stateid MUST refer
   to a stateid that is valid for a READ operation and follows the
   rules for stateids in Sections 8.2.5 and 18.22.3 of [RFC5661]. If
   either stateid is invalid, then the operation MUST fail.

   The ca_src_offset is the offset within the source file from which the
   data will be read, the ca_dst_offset is the offset within the
   destination file to which the data will be written, and the ca_count
   is the number of bytes that will be copied. An offset of 0 (zero)
   specifies the start of the file. A count of 0 (zero) requests that
   all bytes from ca_src_offset through EOF be copied to the
   destination. If concurrent modifications to the source file overlap
   with the source file region being copied, the data copied may include
   all, some, or none of the modifications. The client can use standard
   NFS operations (e.g., OPEN with OPEN4_SHARE_DENY_WRITE or mandatory
   byte range locks) to protect against concurrent modifications if the
   client is concerned about this.
   If the source file's end of file is being modified in parallel with a
   copy that specifies a count of 0 (zero) bytes, the amount of data
   copied is implementation dependent (clients may guard against this
   case by specifying a non-zero count value or preventing modification
   of the source file as mentioned above).

   If the source offset or the source offset plus count is greater than
   the size of the source file, the operation MUST fail with
   NFS4ERR_INVAL. The destination offset or destination offset plus
   count may be greater than the size of the destination file. This
   allows the client to issue parallel copies to implement operations
   such as:

      % cat file1 file2 file3 file4 > dest

   If the ca_source_server list is specified, then this is an
   inter-server copy operation and the source file is on a remote
   server. The client is expected to have previously issued a
   successful COPY_NOTIFY request to the remote source server. The
   ca_source_server list MUST be the same as the COPY_NOTIFY response's
   cnr_source_server list. If the client includes the entries from the
   COPY_NOTIFY response's cnr_source_server list in the
   ca_source_server list, the source server can indicate a specific copy
   protocol for the destination server to use by returning a URL, which
   specifies both a protocol service and server name. Server-to-server
   copy protocol considerations are described in Section 4.7 and
   Section 4.10.1.

   If ca_consecutive is set, then the client has specified that the copy
   protocol selected MUST copy bytes in consecutive order, starting at
   ca_src_offset and continuing for ca_count bytes. If the destination
   server cannot meet this requirement, then it MUST return an error of
   NFS4ERR_OFFLOAD_NO_REQS and set cr_consecutive to false.
   Likewise, if ca_synchronous is set, then the client has required that
   the copy protocol selected MUST perform a synchronous copy. If the
   destination server cannot meet this requirement, then it MUST return
   an error of NFS4ERR_OFFLOAD_NO_REQS and set cr_synchronous to be
   false.

   If both are set by the client, then the destination SHOULD try to
   determine if it can respond to both requirements at the same time.
   If it cannot make that determination, it must set to true the one it
   can and set to false the other. The client, upon getting an
   NFS4ERR_OFFLOAD_NO_REQS error, has to examine both cr_consecutive and
   cr_synchronous against the respective values of ca_consecutive and
   ca_synchronous to determine the possible requirement not met. It
   MUST be prepared for the destination server not being able to
   determine both requirements at the same time.

   Upon receiving the NFS4ERR_OFFLOAD_NO_REQS error, the client has to
   determine if it wants to either re-request the copy with a relaxed
   set of requirements or if it wants to revert to manually copying the
   data. If it decides to manually copy the data and this is a remote
   copy, then the client is responsible for informing the source that
   the earlier COPY_NOTIFY is no longer valid by sending it an
   OFFLOAD_CANCEL.

   If the operation does not result in an immediate failure, the server
   will return NFS4_OK.

   If the wr_callback_id is returned, this indicates that an
   asynchronous COPY operation was initiated and a CB_OFFLOAD callback
   will deliver the final results of the operation. The wr_callback_id
   stateid is termed a copy stateid in this context. The server is
   given the option of returning the results in a callback because the
   data may require a relatively long period of time to copy.
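
   As an informal illustration of the client-side handling described
   above, the following Python sketch classifies a COPY reply and
   derives a relaxed set of requirements after
   NFS4ERR_OFFLOAD_NO_REQS. It is not part of the protocol: the names
   CopyRequirements, copy_outcome, unmet_requirements, and relax are
   hypothetical, and status codes are modeled as strings rather than
   nfsstat4 values.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CopyRequirements:
    """Stand-in for ca_consecutive/ca_synchronous or copy_requirements4."""
    consecutive: bool
    synchronous: bool


def copy_outcome(status: str, wr_callback_id: Sequence[object]) -> str:
    """Classify a COPY reply (hypothetical helper, statuses as strings)."""
    if status == "NFS4ERR_OFFLOAD_NO_REQS":
        return "requirements-not-met"
    if status != "NFS4_OK":
        return "failed"
    # wr_callback_id<1> is an optional stateid: present means asynchronous.
    return "async" if len(wr_callback_id) > 0 else "sync"


def unmet_requirements(requested: CopyRequirements,
                       returned: CopyRequirements) -> List[str]:
    """List the requested flags that the server reported as false."""
    unmet = []
    if requested.consecutive and not returned.consecutive:
        unmet.append("consecutive")
    if requested.synchronous and not returned.synchronous:
        unmet.append("synchronous")
    return unmet


def relax(requested: CopyRequirements,
          returned: CopyRequirements) -> CopyRequirements:
    """Drop every requested flag the server said it cannot honor."""
    return CopyRequirements(
        consecutive=requested.consecutive and returned.consecutive,
        synchronous=requested.synchronous and returned.synchronous,
    )
```

   Because the destination server might not evaluate both requirements
   at once, a retry with the relaxed flags can itself fail with
   NFS4ERR_OFFLOAD_NO_REQS, so a client following this sketch would
   either repeat the step or fall back to copying the data itself.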

   If no wr_callback_id is returned, the operation completed
   synchronously and no callback will be issued by the server. The
   completion status of the operation is indicated by cr_status.

   If the copy completes successfully, either synchronously or
   asynchronously, the data copied from the source file to the
   destination file MUST appear identical to the NFS client. However,
   the NFS server's on-disk representation of the data in the source
   file and destination file MAY differ. For example, the NFS server
   might encrypt, compress, deduplicate, or otherwise represent the
   on-disk data in the source and destination file differently.

   If a failure does occur for a synchronous copy, wr_count will be set
   to the number of bytes copied to the destination file before the
   error occurred. If cr_consecutive is true, then the bytes were
   copied in order. If the failure occurred for an asynchronous copy,
   then the client will have gotten the notification of the consecutive
   copy order when it got the copy stateid. It will be able to
   determine the bytes copied from the coa_bytes_copied in the
   CB_OFFLOAD argument.

   In either case, if cr_consecutive was not true, there is no assurance
   as to exactly which bytes in the range were copied. The client MUST
   assume that there exists a mixture of the original contents of the
   range and the new bytes. If the COPY wrote past the end of the file
   on the destination, then the last byte written to will determine the
   new file size. The contents of any block not written to and past the
   original size of the file will be as if a normal WRITE extended the
   file.

15.3.  Operation 61: COPY_NOTIFY - Notify a source server of a future
       copy

15.3.1.  ARGUMENT

   struct COPY_NOTIFY4args {
           /* CURRENT_FH: source file */
           stateid4        cna_src_stateid;
           netloc4         cna_destination_server;
   };

15.3.2.  RESULT

   struct COPY_NOTIFY4resok {
           nfstime4        cnr_lease_time;
           stateid4        cnr_stateid;
           netloc4         cnr_source_server<>;
   };

   union COPY_NOTIFY4res switch (nfsstat4 cnr_status) {
   case NFS4_OK:
           COPY_NOTIFY4resok       resok4;
   default:
           void;
   };

15.3.3.  DESCRIPTION

   This operation is used for an inter-server copy. A client sends this
   operation in a COMPOUND request to the source server to authorize a
   destination server identified by cna_destination_server to read the
   file specified by CURRENT_FH on behalf of the given user.

   The cna_src_stateid MUST refer to either open or locking states
   provided earlier by the server. If it is invalid, then the operation
   MUST fail.

   The cna_destination_server MUST be specified using the netloc4
   network location format. The server is not required to resolve the
   cna_destination_server address before completing this operation.

   If this operation succeeds, the source server will allow the
   cna_destination_server to copy the specified file on behalf of the
   given user as long as both of the following conditions are met:

   o  The destination server begins reading the source file before the
      cnr_lease_time expires. If the cnr_lease_time expires while the
      destination server is still reading the source file, the
      destination server is allowed to finish reading the file. If the
      cnr_lease_time expires before the destination server uses READ or
      READ_PLUS to begin the transfer, the source server can use
      NFS4ERR_PARTNER_NO_AUTH to inform the destination server that the
      cnr_lease_time has expired.

   o  The client has not issued an OFFLOAD_CANCEL for the same
      combination of user, filehandle, and destination server.

   The cnr_lease_time is chosen by the source server. A cnr_lease_time
   of 0 (zero) indicates an infinite lease.
   To avoid the need for synchronized clocks, copy lease times are
   granted by the server as a time delta. To renew the copy lease time,
   the client should resend the same copy notification request to the
   source server.

   The cnr_stateid is a copy stateid which uniquely describes the state
   needed on the source server to track the proposed copy. As defined
   in Section 8.2 of [RFC5661], a stateid is tied to the current
   filehandle, and if the same stateid is presented by two different
   clients, it may refer to different state. As the source does not
   know which netloc4 network location the destination might use to
   establish the copy operation, it can use the cnr_stateid to identify
   that the destination is operating on behalf of the client. Thus the
   source server MUST construct copy stateids such that they are
   distinct from all other stateids handed out to clients. These copy
   stateids MUST denote the same set of locks as each of the earlier
   delegation, locking, and open states for the client on the given file
   (see Section 4.4.1).

   A successful response will also contain a list of netloc4 network
   location formats called cnr_source_server, on which the source is
   willing to accept connections from the destination. These might not
   be reachable from the client and might be located on networks to
   which the client has no connection.

   For a copy only involving one server (the source and destination are
   on the same server), this operation is unnecessary.

15.4.  Operation 62: DEALLOCATE - Unreserve Space in a Region of a File

15.4.1.  ARGUMENT

   struct DEALLOCATE4args {
           /* CURRENT_FH: file */
           stateid4        da_stateid;
           offset4         da_offset;
           length4         da_length;
   };

15.4.2.  RESULT

   struct DEALLOCATE4res {
           nfsstat4        dr_status;
   };

15.4.3.  DESCRIPTION

   Whenever a client wishes to unreserve space for a region in a file,
   it calls the DEALLOCATE operation with the current filehandle set to
   the filehandle of the file in question, and the start offset and
   length in bytes of the region set in da_offset and da_length,
   respectively. If no space was allocated or reserved for all or parts
   of the region, the DEALLOCATE operation will have no effect for the
   region that already is in unreserved state. All further reads from
   the region passed to DEALLOCATE MUST return zeros until overwritten.

   CURRENT_FH must be a regular file. If CURRENT_FH is not a regular
   file, the operation MUST fail and return NFS4ERR_WRONG_TYPE.

   The da_stateid MUST refer to a stateid that is valid for a WRITE
   operation and follows the rules for stateids in Sections 8.2.5 and
   18.32.3 of [RFC5661].

   Situations may arise where da_offset and/or da_offset + da_length
   will not be aligned to a boundary for which the server does
   allocations or deallocations. For most file systems, this is the
   block size of the file system. In such a case, the server can
   deallocate as many bytes as it can in the region. The blocks that
   cannot be deallocated MUST be zeroed.

   DEALLOCATE will result in the space_used attribute being decreased by
   the number of bytes that were deallocated. The space_freed attribute
   may or may not decrease, depending on the support and whether the
   blocks backing the specified range were shared or not. The size
   attribute will remain unchanged.

15.5.  Operation 63: IO_ADVISE - Application I/O access pattern hints

15.5.1.  ARGUMENT

   enum IO_ADVISE_type4 {
           IO_ADVISE4_NORMAL                       = 0,
           IO_ADVISE4_SEQUENTIAL                   = 1,
           IO_ADVISE4_SEQUENTIAL_BACKWARDS         = 2,
           IO_ADVISE4_RANDOM                       = 3,
           IO_ADVISE4_WILLNEED                     = 4,
           IO_ADVISE4_WILLNEED_OPPORTUNISTIC       = 5,
           IO_ADVISE4_DONTNEED                     = 6,
           IO_ADVISE4_NOREUSE                      = 7,
           IO_ADVISE4_READ                         = 8,
           IO_ADVISE4_WRITE                        = 9,
           IO_ADVISE4_INIT_PROXIMITY               = 10
   };

   struct IO_ADVISE4args {
           /* CURRENT_FH: file */
           stateid4        iaa_stateid;
           offset4         iaa_offset;
           length4         iaa_count;
           bitmap4         iaa_hints;
   };

15.5.2.  RESULT

   struct IO_ADVISE4resok {
           bitmap4 ior_hints;
   };

   union IO_ADVISE4res switch (nfsstat4 ior_status) {
   case NFS4_OK:
           IO_ADVISE4resok resok4;
   default:
           void;
   };

15.5.3.  DESCRIPTION

   The IO_ADVISE operation sends an I/O access pattern hint to the
   server for the owner of the stateid for a given byte range specified
   by iaa_offset and iaa_count. The byte range specified by iaa_offset
   and iaa_count need not currently exist in the file, but the iaa_hints
   will apply to the byte range when it does exist. If iaa_count is 0,
   all data following iaa_offset is specified. The server MAY ignore
   the advice.

   The following are the allowed hints for a stateid holder:

   IO_ADVISE4_NORMAL  There is no advice to give; this is the default
      behavior.

   IO_ADVISE4_SEQUENTIAL  Expects to access the specified data
      sequentially from lower offsets to higher offsets.

   IO_ADVISE4_SEQUENTIAL_BACKWARDS  Expects to access the specified data
      sequentially from higher offsets to lower offsets.

   IO_ADVISE4_RANDOM  Expects to access the specified data in a random
      order.

   IO_ADVISE4_WILLNEED  Expects to access the specified data in the near
      future.

   IO_ADVISE4_WILLNEED_OPPORTUNISTIC  Expects to possibly access the
      data in the near future.
      This is a speculative hint, and therefore the server should
      prefetch data or indirect blocks only if it can be done at a
      marginal cost.

   IO_ADVISE4_DONTNEED  Expects that it will not access the specified
      data in the near future.

   IO_ADVISE4_NOREUSE  Expects to access the specified data once and
      then not reuse it thereafter.

   IO_ADVISE4_READ  Expects to read the specified data in the near
      future.

   IO_ADVISE4_WRITE  Expects to write the specified data in the near
      future.

   IO_ADVISE4_INIT_PROXIMITY  Informs the server that the data in the
      byte range remains important to the client.

   Since IO_ADVISE is a hint, a server SHOULD NOT return an error and
   invalidate an entire COMPOUND request if one of the sent hints in
   iaa_hints is not supported by the server. Also, the server MUST NOT
   return an error if the client sends contradictory hints to the
   server, e.g., IO_ADVISE4_SEQUENTIAL and IO_ADVISE4_RANDOM in a single
   IO_ADVISE operation. In these cases, the server MUST return success
   and an ior_hints value that indicates the hint it intends to
   implement. This may mean simply returning IO_ADVISE4_NORMAL.

   The ior_hints returned by the server is primarily for debugging
   purposes since the server is under no obligation to carry out the
   hints that it describes in the ior_hints result. In addition, while
   the server may have intended to implement the hints returned in
   ior_hints, as time progresses, the server may need to change its
   handling of a given file due to several reasons including, but not
   limited to, memory pressure, additional IO_ADVISE hints sent by other
   clients, and heuristically detected file access patterns.

   The server MAY return different advice than what the client
   requested.
   If it does, then this might be due to one of several conditions,
   including, but not limited to, another client advising of a
   different I/O access pattern; a different I/O access pattern from
   another client that the server has heuristically detected; or the
   server not being able to support the requested I/O access pattern,
   perhaps due to a temporary resource limitation.

   Each issuance of the IO_ADVISE operation overrides all previous
   issuances of IO_ADVISE for a given byte range. This effectively
   follows a strategy of last hint wins for a given stateid and byte
   range.

   Clients should assume that hints included in an IO_ADVISE operation
   will be forgotten once the file is closed.

15.5.4.  IMPLEMENTATION

   The NFS client may choose to issue an IO_ADVISE operation to the
   server in several different instances.

   The most obvious is in direct response to an application's execution
   of posix_fadvise(). In this case, IO_ADVISE4_WRITE and
   IO_ADVISE4_READ may be set based upon the type of file access
   specified when the file was opened.

15.5.5.  IO_ADVISE4_INIT_PROXIMITY

   The IO_ADVISE4_INIT_PROXIMITY hint is non-POSIX in origin and can be
   used to convey that the client has recently accessed the byte range
   in its own cache. I.e., it has not accessed it on the server, but it
   has locally. When the server reaches resource exhaustion, knowing
   which data is more important allows the server to make better choices
   about which data to, for example, purge from a cache or move to
   secondary storage. It also informs the server which delegations are
   more important, since if delegations are working correctly, once
   delegated to a client and the client has read the content for that
   byte range, a server might never receive another read request for
   that byte range.
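
   As a rough sketch of how the hints in this section might be carried
   in iaa_hints and answered in ior_hints, the following Python fragment
   packs hint values, including IO_ADVISE4_INIT_PROXIMITY, into bitmap4
   words and filters them against what a server supports. It assumes
   that each IO_ADVISE_type4 value is used as a bit position in the
   bitmap4 (bit n is stored in word n / 32, as for attribute bitmaps in
   [RFC5661]); the helper names are hypothetical and only a few hint
   constants are shown.

```python
from typing import Iterable, List, Set

# Values taken from the IO_ADVISE_type4 enum in the ARGUMENT section.
IO_ADVISE4_NORMAL = 0
IO_ADVISE4_SEQUENTIAL = 1
IO_ADVISE4_RANDOM = 3
IO_ADVISE4_INIT_PROXIMITY = 10


def hints_to_bitmap4(hints: Iterable[int]) -> List[int]:
    """Pack hint numbers into 32-bit words (assumed bit-per-hint layout)."""
    words: List[int] = []
    for hint in hints:
        word, bit = divmod(hint, 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def bitmap4_to_hints(words: Iterable[int]) -> List[int]:
    """Unpack 32-bit words back into a sorted list of hint numbers."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if (word >> bit) & 1
    ]


def server_reply_hints(iaa_hints: List[int], supported: Set[int]) -> List[int]:
    """Build ior_hints: keep supported hints, else fall back to NORMAL.

    Unsupported hints are dropped silently rather than failing the
    request, matching the SHOULD NOT return an error guidance.
    """
    kept = [h for h in bitmap4_to_hints(iaa_hints) if h in supported]
    if not kept:
        kept = [IO_ADVISE4_NORMAL]
    return hints_to_bitmap4(kept)
```

   A server following this sketch never returns an error for an
   unsupported hint; how it resolves contradictory hints that it does
   support is left to the implementation.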

   The IO_ADVISE4_INIT_PROXIMITY hint can also be used in a pNFS setting
   to let the client inform the metadata server as to the I/O statistics
   between the client and the storage devices. The metadata server is
   then free to use this information about client I/O to optimize the
   data storage location.

   This hint is also useful in the case of NFS clients which are network
   booting from a server. If the first client to be booted sends this
   hint, then it keeps the cache warm for the remaining clients.

15.5.6.  pNFS File Layout Data Type Considerations

   The IO_ADVISE considerations for pNFS are very similar to the COMMIT
   considerations for pNFS (see Section 13.7 of [RFC5661]). That is, as
   with COMMIT, some NFS server implementations prefer IO_ADVISE be done
   on the storage device, and some prefer it be done on the metadata
   server.

   For the file's layout type, NFSv4.2 includes an additional hint
   NFL42_UFLG_IO_ADVISE_THRU_MDS which is valid only on metadata servers
   running NFSv4.2 or higher. Any file's layout obtained from an
   NFSv4.1 metadata server MUST NOT have NFL42_UFLG_IO_ADVISE_THRU_MDS
   set. Any file's layout obtained with an NFSv4.2 metadata server MAY
   have NFL42_UFLG_IO_ADVISE_THRU_MDS set. However, if the layout
   utilizes NFSv4.1 storage devices, the IO_ADVISE operation cannot be
   sent to them.

   If NFL42_UFLG_IO_ADVISE_THRU_MDS is set, the client MUST send the
   IO_ADVISE operation to the metadata server in order for it to be
   honored by the storage device. Once the metadata server receives the
   IO_ADVISE operation, it will communicate the advice to each storage
   device.

   If NFL42_UFLG_IO_ADVISE_THRU_MDS is not set, then the client SHOULD
   send an IO_ADVISE operation to the appropriate storage device for the
   specified byte range.
   While the client MAY always send IO_ADVISE to the metadata server, if
   the server has not set NFL42_UFLG_IO_ADVISE_THRU_MDS, the client
   should expect that such an IO_ADVISE is futile. Note that a client
   SHOULD use the same set of arguments on each IO_ADVISE sent to a
   storage device for the same open file reference.

   The server is not required to support different advice for different
   storage devices with the same open file reference.

15.5.6.1.  Dense and Sparse Packing Considerations

   The IO_ADVISE operation MUST use the iaa_offset and byte range as
   dictated by the presence or absence of NFL4_UFLG_DENSE (see
   Section 13.4.4 of [RFC5661]).

   E.g., if NFL4_UFLG_DENSE is present, and a READ or WRITE to the
   storage device for iaa_offset 0 really means iaa_offset 10000 in the
   logical file, then an IO_ADVISE for iaa_offset 0 means iaa_offset
   10000.

   E.g., if NFL4_UFLG_DENSE is absent, and a READ or WRITE to the
   storage device for iaa_offset 0 really means iaa_offset 0 in the
   logical file, then an IO_ADVISE for iaa_offset 0 means iaa_offset 0
   in the logical file.

   E.g., if NFL4_UFLG_DENSE is present, the stripe unit is 1000 bytes,
   the stripe count is 10, and the dense storage device file is serving
   iaa_offset 0, then a READ or WRITE to the storage device for
   iaa_offsets 0, 1000, 2000, and 3000 really means iaa_offsets 10000,
   20000, 30000, and 40000 in the logical file (implying a stripe count
   of 10 and a stripe unit of 1000). An IO_ADVISE sent to the same
   storage device with an iaa_offset of 500 and an iaa_count of 3000
   then applies to these byte ranges of the dense storage device file:

   -  500 to  999
   - 1000 to 1999
   - 2000 to 2999
   - 3000 to 3499

   I.e., the contiguous range 500 to 3499 as specified in IO_ADVISE.
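
   The split of the dense storage device file range listed above can be
   reproduced with a short Python sketch. This is only an illustration
   of cutting a byte range at stripe unit boundaries; the function name
   split_by_stripe_unit is hypothetical, and no mapping to logical file
   offsets is attempted here.

```python
from typing import List, Tuple


def split_by_stripe_unit(offset: int, count: int,
                         stripe_unit: int) -> List[Tuple[int, int]]:
    """Cut [offset, offset + count) at stripe unit boundaries.

    Returns inclusive (first_byte, last_byte) pairs, one per stripe unit
    touched by the range.
    """
    if stripe_unit <= 0:
        raise ValueError("stripe_unit must be positive")
    ranges = []
    pos = offset
    end = offset + count
    while pos < end:
        # First byte of the next stripe unit after the one holding pos.
        next_boundary = (pos // stripe_unit + 1) * stripe_unit
        stop = min(next_boundary, end)
        ranges.append((pos, stop - 1))
        pos = stop
    return ranges


# The example from the text: iaa_offset 500, iaa_count 3000, and a
# stripe unit of 1000 bytes.
for first, last in split_by_stripe_unit(500, 3000, 1000):
    print(f"- {first} to {last}")
```

   Run as written, the loop prints the same four ranges as the list
   above.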

   The IO_ADVISE also applies to these byte ranges of the logical file:

   - 10500 to 10999 (500 bytes)
   - 20000 to 20999 (1000 bytes)
   - 30000 to 30999 (1000 bytes)
   - 40000 to 40499 (500 bytes)
   (total            3000 bytes)

   E.g., if NFL4_UFLG_DENSE is absent, the stripe unit is 250 bytes, the
   stripe count is 4, and the sparse storage device file is serving
   iaa_offset 0, then a READ or WRITE to the storage device for
   iaa_offsets 0, 1000, 2000, and 3000 really means iaa_offsets 0, 1000,
   2000, and 3000 in the logical file, keeping in mind that on the
   storage device file, byte ranges 250 to 999, 1250 to 1999, 2250 to
   2999, and 3250 to 3999 are not accessible. An IO_ADVISE sent to the
   same storage device with an iaa_offset of 500 and an iaa_count of
   3000 then applies to these byte ranges of the logical file and the
   sparse storage device file:

   -  500 to  999 (500 bytes)  - no effect
   - 1000 to 1249 (250 bytes)  - effective
   - 1250 to 1999 (750 bytes)  - no effect
   - 2000 to 2249 (250 bytes)  - effective
   - 2250 to 2999 (750 bytes)  - no effect
   - 3000 to 3249 (250 bytes)  - effective
   - 3250 to 3499 (250 bytes)  - no effect
   (subtotal      2250 bytes)  - no effect
   (subtotal       750 bytes)  - effective
   (grand total   3000 bytes)  - no effect + effective

   If neither of the flags NFL42_UFLG_IO_ADVISE_THRU_MDS and
   NFL4_UFLG_DENSE is set in the layout, then any IO_ADVISE request
   sent to the data server with a byte range that overlaps a stripe unit
   that the data server does not serve MUST NOT result in the status
   NFS4ERR_PNFS_IO_HOLE. Instead, the response SHOULD be successful
   and, if the server applies IO_ADVISE hints on any stripe units that
   overlap with the specified range, those hints SHOULD be indicated in
   the response.

15.6.  Operation 64: LAYOUTERROR - Provide Errors for the Layout

15.6.1.  ARGUMENT

   struct device_error4 {
           deviceid4       de_deviceid;
           nfsstat4        de_status;
           nfs_opnum4      de_opnum;
   };

   struct LAYOUTERROR4args {
           /* CURRENT_FH: file */
           offset4                 lea_offset;
           length4                 lea_length;
           stateid4                lea_stateid;
           device_error4           lea_errors<>;
   };

15.6.2.  RESULT

   struct LAYOUTERROR4res {
           nfsstat4        ler_status;
   };

15.6.3.  DESCRIPTION

   The client can use LAYOUTERROR to inform the metadata server about
   errors in its interaction with the layout (see Section 12 of
   [RFC5661]) represented by the current filehandle, client ID (derived
   from the session ID in the preceding SEQUENCE operation), byte-range
   (lea_offset + lea_length), and lea_stateid.

   Each individual device_error4 describes a single error associated
   with a storage device, which is identified via de_deviceid. If the
   Layout Type (see Section 12.2.7 of [RFC5661]) supports NFSv4
   operations, then the operation which returned the error is identified
   via de_opnum. If the Layout Type does not support NFSv4 operations,
   then it MAY choose either to map the operation onto one of the
   allowed operations which can be sent to a storage device with the
   File Layout Type (see Section 3.3) or to signal no support for
   operations by marking de_opnum with the ILLEGAL operation. Finally,
   the NFS error value (nfsstat4) encountered is provided via de_status
   and may consist of the following error codes:

   NFS4ERR_NXIO:  The client was unable to establish any communication
      with the storage device.

   NFS4ERR_*:  The client was able to establish communication with the
      storage device and is returning one of the allowed error codes for
      the operation denoted by de_opnum.

   Note that while the metadata server may return an error associated
   with the layout stateid or the open file, it MUST NOT return an error
   in the processing of the errors.
   If LAYOUTERROR is in a compound before LAYOUTRETURN, it MUST NOT
   introduce an error other than what LAYOUTRETURN would already
   encounter.

15.6.4.  IMPLEMENTATION

   There are two broad classes of errors, transient and persistent.  The
   client SHOULD strive to only use this new mechanism to report
   persistent errors.  It MUST be able to deal with transient issues by
   itself.  Also, while the client might consider an issue to be
   persistent, it MUST be prepared for the metadata server to consider
   such issues to be transient.  A prime example of this is if the
   metadata server fences off a client from either a stateid or a
   filehandle.  The client will get an error from the storage device and
   might relay either NFS4ERR_ACCESS or NFS4ERR_BAD_STATEID back to the
   metadata server, with the belief that this is a hard error.  If the
   metadata server is informed by the client that there is an error, it
   can safely ignore that.  For the metadata server, the mission is
   accomplished in that the client has returned a layout that the
   metadata server had most likely recalled.

   The client might also need to inform the metadata server that it
   cannot reach one or more of the storage devices.  While the metadata
   server can detect the connectivity of both of these paths:

   o  metadata server to storage device

   o  metadata server to client

   it cannot determine if the client and storage device path is working.
   As with the case of the storage device passing errors to the client,
   the client must be prepared for the metadata server to consider such
   outages as being transitory.

   Clients are expected to tolerate transient storage device errors, and
   hence clients SHOULD NOT use the LAYOUTERROR error handling for
   device access problems that may be transient.
   The methods by which a client decides whether a device access problem
   is transient versus persistent are implementation-specific, but may
   include retrying I/Os to a data server under appropriate conditions.

   When an I/O fails to a storage device, the client SHOULD retry the
   failed I/O via the metadata server.  In this situation, before
   retrying the I/O, the client SHOULD return the layout, or the
   affected portion thereof, and SHOULD indicate which storage device or
   devices were problematic.  The client needs to do this when the
   storage device is being unresponsive in order to fence off any failed
   write attempts, and ensure that they do not end up overwriting any
   later data being written through the metadata server.  If the client
   does not do this, the metadata server MAY issue a layout recall
   callback in order to perform the retried I/O.

   The client needs to be cognizant that since this error handling is
   optional in the metadata server, the metadata server may silently
   ignore this functionality.  Also, as the metadata server may consider
   some issues the client reports to be expected, the client might find
   it difficult to detect a metadata server which has not implemented
   error handling via LAYOUTERROR.

   If a metadata server is aware that a storage device is proving
   problematic to a client, the metadata server SHOULD NOT include that
   storage device in any pNFS layouts sent to that client.  If the
   metadata server is aware that a storage device is affecting many
   clients, then the metadata server SHOULD NOT include that storage
   device in any pNFS layouts sent out.  If a client asks for a new
   layout for the file from the metadata server, it MUST be prepared for
   the metadata server to return that storage device in the layout.
   The metadata server might not have any choice in using the storage
   device, i.e., there might only be one possible layout for the system.
   Also, in the case of existing files, the metadata server might have
   no choice in which storage devices to hand out to clients.

   The metadata server is not required to indefinitely retain per-client
   storage device error information.  A metadata server is also not
   required to automatically reinstate use of a previously problematic
   storage device; administrative intervention may be required instead.

15.7.  Operation 65: LAYOUTSTATS - Provide Statistics for the Layout

15.7.1.  ARGUMENT

   struct layoutupdate4 {
           layouttype4             lou_type;
           opaque                  lou_body<>;
   };

   struct io_info4 {
           uint64_t        ii_count;
           uint64_t        ii_bytes;
   };

   struct LAYOUTSTATS4args {
           /* CURRENT_FH: file */
           offset4                 lsa_offset;
           length4                 lsa_length;
           stateid4                lsa_stateid;
           io_info4                lsa_read;
           io_info4                lsa_write;
           deviceid4               lsa_deviceid;
           layoutupdate4           lsa_layoutupdate;
   };

15.7.2.  RESULT

   struct LAYOUTSTATS4res {
           nfsstat4        lsr_status;
   };

15.7.3.  DESCRIPTION

   The client can use LAYOUTSTATS to inform the metadata server about
   its interaction with the layout (see Section 12 of [RFC5661])
   represented by the current filehandle, client ID (derived from the
   session ID in the preceding SEQUENCE operation), byte-range
   (lsa_offset and lsa_length), and lsa_stateid.  lsa_read and lsa_write
   allow for non-Layout Type specific statistics to be reported.
   lsa_deviceid allows the client to specify to which storage device the
   statistics apply.  The remaining information the client is presenting
   is specific to the Layout Type and presented in the lsa_layoutupdate
   field.  Each Layout Type MUST define the contents of lsa_layoutupdate
   in its respective specification.
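   As an illustration only, the non-Layout Type specific counters could
   be kept on the client roughly as sketched below.  The class names,
   the record() and to_args() helpers, and the reading of ii_count as
   "number of I/O operations" are assumptions made for this sketch; they
   are not defined by this document.  The Layout Type specific
   lsa_layoutupdate body is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class IoInfo:
    """Mirrors io_info4: running totals for one direction of I/O."""
    ii_count: int = 0   # assumed here to be the number of I/O operations
    ii_bytes: int = 0   # total bytes transferred

    def record(self, nbytes: int) -> None:
        self.ii_count += 1
        self.ii_bytes += nbytes


@dataclass
class DeviceStats:
    """Hypothetical per-(file, storage device) accumulator on the client."""
    lsa_deviceid: bytes
    lsa_read: IoInfo = field(default_factory=IoInfo)
    lsa_write: IoInfo = field(default_factory=IoInfo)

    def to_args(self, lsa_offset: int, lsa_length: int) -> dict:
        """Snapshot the generic part of a LAYOUTSTATS4args-like record."""
        return {
            "lsa_offset": lsa_offset,
            "lsa_length": lsa_length,
            "lsa_deviceid": self.lsa_deviceid,
            "lsa_read": (self.lsa_read.ii_count, self.lsa_read.ii_bytes),
            "lsa_write": (self.lsa_write.ii_count, self.lsa_write.ii_bytes),
        }


stats = DeviceStats(lsa_deviceid=b"\x01" * 16)
stats.lsa_read.record(4096)
stats.lsa_read.record(8192)
stats.lsa_write.record(512)
args = stats.to_args(lsa_offset=0, lsa_length=12288)
```

   Because the counters are never reset in this sketch, each snapshot
   carries totals since the accumulator was created, which is one way a
   client might keep its reports cumulative.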

   LAYOUTSTATS can be combined with IO_ADVISE (see Section 15.5) to
   augment the decision making process of how the metadata server
   handles a file.  I.e., IO_ADVISE lets the server know that a byte
   range has a certain characteristic, but not necessarily the intensity
   of that characteristic.

   The statistics are cumulative, i.e., multiple LAYOUTSTATS updates can
   be in flight at the same time.  The metadata server can examine the
   packet's timestamp to order the different calls.  The first
   LAYOUTSTATS sent by the client SHOULD be from the opening of the
   file.  The choice of how often to update the metadata server is made
   by the client.

   Note that while the metadata server may return an error associated
   with the layout stateid or the open file, it MUST NOT return an error
   in the processing of the statistics.

15.8.  Operation 66: OFFLOAD_CANCEL - Stop an Offloaded Operation

15.8.1.  ARGUMENT

   struct OFFLOAD_CANCEL4args {
           /* CURRENT_FH: file to cancel */
           stateid4        oca_stateid;
   };

15.8.2.  RESULT

   struct OFFLOAD_CANCEL4res {
           nfsstat4        ocr_status;
   };

15.8.3.  DESCRIPTION

   OFFLOAD_CANCEL is used by the client to terminate an asynchronous
   operation, which is identified both by CURRENT_FH and the
   oca_stateid.  That is, there can be multiple offloaded operations
   acting on the file, and the stateid identifies to the server exactly
   which one is to be stopped.  Currently there are only two operations
   which can decide to be asynchronous: COPY and WRITE_SAME.

   In the context of server-to-server copy, the client can send
   OFFLOAD_CANCEL to either the source or destination server, albeit
   with a different stateid.  The client uses OFFLOAD_CANCEL to inform
   the destination to stop the active transfer and uses the stateid it
   got back from the COPY operation.
   The client uses OFFLOAD_CANCEL and the stateid it used in the
   COPY_NOTIFY to inform the source to not allow any more copying from
   the destination.

   OFFLOAD_CANCEL is also useful in situations in which the source
   server granted a very long or infinite lease on the destination
   server's ability to read the source file and all copy operations on
   the source file have been completed.

15.9.  Operation 67: OFFLOAD_STATUS - Poll for Status of Asynchronous
       Operation

15.9.1.  ARGUMENT

   struct OFFLOAD_STATUS4args {
           /* CURRENT_FH: destination file */
           stateid4        osa_stateid;
   };

15.9.2.  RESULT

   struct OFFLOAD_STATUS4resok {
           length4         osr_count;
           nfsstat4        osr_complete<1>;
   };

   union OFFLOAD_STATUS4res switch (nfsstat4 osr_status) {
   case NFS4_OK:
           OFFLOAD_STATUS4resok            osr_resok4;
   default:
           void;
   };

15.9.3.  DESCRIPTION

   OFFLOAD_STATUS can be used by the client to query the progress of an
   asynchronous operation, which is identified both by CURRENT_FH and
   the osa_stateid.  If this operation is successful, the number of
   bytes processed is returned to the client in the osr_count field.

   If the optional osr_complete field is present, the asynchronous
   operation has completed.  In this case the status value indicates the
   result of the asynchronous operation.  In all cases, the server will
   also deliver the final results of the asynchronous operation in a
   CB_OFFLOAD operation.

   The failure of this operation does not indicate the result of the
   asynchronous operation in any way.

15.10.  Operation 68: READ_PLUS - READ Data or Holes from a File

15.10.1.  ARGUMENT

   struct READ_PLUS4args {
           /* CURRENT_FH: file */
           stateid4        rpa_stateid;
           offset4         rpa_offset;
           count4          rpa_count;
   };

15.10.2.
RESULT

   enum data_content4 {
           NFS4_CONTENT_DATA = 0,
           NFS4_CONTENT_HOLE = 1
   };

   struct data_info4 {
           offset4         di_offset;
           length4         di_length;
   };

   struct data4 {
           offset4         d_offset;
           opaque          d_data<>;
   };

   union read_plus_content switch (data_content4 rpc_content) {
   case NFS4_CONTENT_DATA:
           data4           rpc_data;
   case NFS4_CONTENT_HOLE:
           data_info4      rpc_hole;
   default:
           void;
   };

   /*
    * Allow a return of an array of contents.
    */
   struct read_plus_res4 {
           bool                    rpr_eof;
           read_plus_content       rpr_contents<>;
   };

   union READ_PLUS4res switch (nfsstat4 rp_status) {
   case NFS4_OK:
           read_plus_res4  rp_resok4;
   default:
           void;
   };

15.10.3.  DESCRIPTION

   The READ_PLUS operation is based upon the NFSv4.1 READ operation (see
   Section 18.22 of [RFC5661]) and similarly reads data from the regular
   file identified by the current filehandle.

   The client provides a rpa_offset of where the READ_PLUS is to start
   and a rpa_count of how many bytes are to be read.  A rpa_offset of
   zero means to read data starting at the beginning of the file.  If
   rpa_offset is greater than or equal to the size of the file, the
   status NFS4_OK is returned with di_length (the data length) set to
   zero and eof set to TRUE.

   The READ_PLUS result is comprised of an array of rpr_contents, each
   of which describes a data_content4 type of data.  For NFSv4.2, the
   allowed values are data and hole.  A server MUST support both the
   data type and the hole if it uses READ_PLUS.  If it does not want to
   support a hole, it MUST use READ.  The array contents MUST be
   contiguous in the file.

   Holes SHOULD be returned in their entirety - clients must be prepared
   to get more information than they requested.  Both the start and the
   end of the hole may exceed what was requested.
   If data to be returned is comprised entirely of zeros, then the
   server SHOULD return that data as a hole instead.

   The server may elect to return adjacent elements of the same type.
   For example, if the server has a range of data comprised entirely of
   zeros and then a hole, it might want to return two adjacent holes to
   the client.

   If the client specifies a rpa_count value of zero, the READ_PLUS
   succeeds and returns zero bytes of data.  In all situations, the
   server may choose to return fewer bytes than specified by the client.
   The client needs to check for this condition and handle the condition
   appropriately.

   If the client specifies an rpa_offset and rpa_count value that is
   entirely contained within a hole of the file, then the di_offset and
   di_length returned MAY be for the entire hole.  If the owner has a
   locked byte range covering rpa_offset and rpa_count entirely, the
   di_offset and di_length MUST NOT be extended outside the locked byte
   range.  This result is considered valid until the file is changed
   (detected via the change attribute).  The server MUST provide the
   same semantics for the hole as if the client read the region and
   received zeroes; the lifetime of the implied hole's contents MUST be
   exactly the same as that of any other read data.

   If the client specifies an rpa_offset and rpa_count value that begins
   in a non-hole of the file but extends into a hole, the server should
   return an array comprised of both data and a hole.  The client MUST
   be prepared for the server to return a short read describing just the
   data.  The client will then issue another READ_PLUS for the remaining
   bytes, to which the server will respond with information about the
   hole in the file.
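   The client-side handling described above (holes that extend past the
   requested range, contiguous segments, and short reads) can be
   sketched as follows.  This is a minimal illustration under stated
   assumptions, not a definitive implementation: the Data and Hole
   classes stand in for data4 and data_info4, and assemble() is a
   hypothetical helper name that does not appear in the protocol.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Data:                 # stands in for data4
    d_offset: int
    d_data: bytes


@dataclass
class Hole:                 # stands in for data_info4
    di_offset: int
    di_length: int


def assemble(offset: int, count: int,
             contents: List[Union[Data, Hole]]) -> bytes:
    """Build the bytes for [offset, offset + count) from rpr_contents.

    Holes are expanded to zeros and clipped to the requested range.
    The result may be shorter than count (a short read).
    """
    buf = bytearray()
    pos, end = offset, offset + count
    for seg in contents:
        if isinstance(seg, Data):
            start, stop = seg.d_offset, seg.d_offset + len(seg.d_data)
        else:
            start, stop = seg.di_offset, seg.di_offset + seg.di_length
        if stop <= pos:
            continue        # segment lies entirely before what we need
        if start > pos:
            break           # gap: contents are expected to be contiguous
        hi = min(stop, end)
        if isinstance(seg, Data):
            buf += seg.d_data[pos - start:hi - start]
        else:
            buf += bytes(hi - pos)      # a hole reads as zeros
        pos = hi
        if pos >= end:
            break
    return bytes(buf)


# A hole reported in its entirety (starting before the request)
# followed by data that runs past the end of the request.
full = assemble(4, 8, [Hole(0, 6), Data(6, b"abcdefgh")])
# A short read: only 3 of the 10 requested bytes were described.
short = assemble(0, 10, [Data(0, b"abc")])
```

   After a short result such as the second call, the client would issue
   another READ_PLUS starting at offset + len(result) for the remaining
   bytes.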

   Except when special stateids are used, the stateid value for a
   READ_PLUS request represents a value returned from a previous byte-
   range lock or share reservation request or the stateid associated
   with a delegation.  The stateid identifies the associated owners if
   any and is used by the server to verify that the associated locks are
   still valid (e.g., have not been revoked).

   If the read ended at the end-of-file (formally, in a correctly formed
   READ_PLUS operation, if rpa_offset + rpa_count is equal to the size
   of the file), or the READ_PLUS operation extends beyond the size of
   the file (if rpa_offset + rpa_count is greater than the size of the
   file), eof is returned as TRUE; otherwise, it is FALSE.  A successful
   READ_PLUS of an empty file will always return eof as TRUE.

   If the current filehandle is not an ordinary file, an error will be
   returned to the client.  In the case that the current filehandle
   represents an object of type NF4DIR, NFS4ERR_ISDIR is returned.  If
   the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is
   returned.  In all other cases, NFS4ERR_WRONG_TYPE is returned.

   For a READ_PLUS with a stateid value of all bits equal to zero, the
   server MAY allow the READ_PLUS to be serviced subject to mandatory
   byte-range locks or the current share deny modes for the file.  For a
   READ_PLUS with a stateid value of all bits equal to one, the server
   MAY allow READ_PLUS operations to bypass locking checks at the
   server.

   On success, the current filehandle retains its value.

15.10.3.1.  Note on Client Support of Arms of the Union

   It was decided not to add a means for the client to inform the server
   as to which arms of READ_PLUS it would support.
   In a later minor version, it may become necessary to introduce a new
   operation which would allow the client to inform the server as to
   whether it supported the new arms of the union of data types
   available in READ_PLUS.

15.10.4.  IMPLEMENTATION

   In general, the IMPLEMENTATION notes for READ in Section 18.22.4 of
   [RFC5661] also apply to READ_PLUS.

15.10.4.1.  Additional pNFS Implementation Information

   With pNFS, the semantics of using READ_PLUS remains the same.  Any
   data server MAY return a hole result for a READ_PLUS request that it
   receives.  When a data server chooses to return such a result, it has
   the option of returning information for the data stored on that data
   server (as defined by the data layout), but it MUST NOT return
   results for a byte range that includes data managed by another data
   server.

   If mandatory locking is enforced, then the data server must also
   ensure that it returns only information that is within the owner's
   locked byte range.

15.10.5.  READ_PLUS with Sparse Files Example

   The following table describes a sparse file.  For each byte range,
   the file contains either non-zero data or a hole.  In addition, the
   server in this example will only create a hole if it is greater than
   32K.

                        +-------------+----------+
                        | Byte-Range  | Contents |
                        +-------------+----------+
                        | 0-15999     | Hole     |
                        | 16K-31999   | Non-Zero |
                        | 32K-255999  | Hole     |
                        | 256K-287999 | Non-Zero |
                        | 288K-353999 | Hole     |
                        | 354K-417999 | Non-Zero |
                        +-------------+----------+

                                 Table 7

   Under the given circumstances, if a client were to read from the file
   with a max read size of 64K, the following will be the results for
   the given READ_PLUS calls.  This assumes the client has already
   opened the file, acquired a valid stateid ('s' in the example), and
   just needs to issue READ_PLUS requests.

   1.
       READ_PLUS(s, 0, 64K) --> NFS4_OK, eof = false.  Since the first
       hole is less than the server's minimum hole size, the first 32K
       of the file is returned as data and the remaining 32K is returned
       as a hole which actually extends to 256K.

   2.  READ_PLUS(s, 32K, 64K) --> NFS4_OK, eof = false.  The requested
       range was all zeros, and the current hole begins at offset 32K
       and is 224K in length.  Note that the client should not have
       followed up the previous READ_PLUS request with this one as the
       hole information from the previous call extended past what the
       client was requesting.

   3.  READ_PLUS(s, 256K, 64K) --> NFS4_OK, eof = false.  Returns an
       array of the 32K data and the hole which extends to 354K.

   4.  READ_PLUS(s, 354K, 64K) --> NFS4_OK, eof = true.  Returns the
       final 64K of data and informs the client there is no more data in
       the file.

15.11.  Operation 69: SEEK - Find the Next Data or Hole

15.11.1.  ARGUMENT

   enum data_content4 {
           NFS4_CONTENT_DATA = 0,
           NFS4_CONTENT_HOLE = 1
   };

   struct SEEK4args {
           /* CURRENT_FH: file */
           stateid4        sa_stateid;
           offset4         sa_offset;
           data_content4   sa_what;
   };

15.11.2.  RESULT

   struct seek_res4 {
           bool            sr_eof;
           offset4         sr_offset;
   };

   union SEEK4res switch (nfsstat4 sa_status) {
   case NFS4_OK:
           seek_res4       resok4;
   default:
           void;
   };

15.11.3.  DESCRIPTION

   SEEK is an operation that allows a client to determine the location
   of the next data_content4 in a file.  It allows an implementation of
   the emerging extension to lseek(2) to allow clients to determine the
   next hole whilst in data or the next data whilst in a hole.

   From the given sa_offset, find the next data_content4 of type sa_what
   in the file.  If the server cannot find a corresponding sa_what,
   then the status will still be NFS4_OK, but sr_eof will be TRUE.
   If the server can find the sa_what, then the sr_offset is the start
   of that content.  If the sa_offset is beyond the end of the file,
   then SEEK MUST return NFS4ERR_NXIO.

   All files MUST have a virtual hole at the end of the file.  I.e., if
   a filesystem does not support sparse files, then a compound with
   {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;}
   where 'X' was the size of the file.

   SEEK must follow the same rules for stateids as READ_PLUS
   (Section 15.10.3).

15.12.  Operation 70: WRITE_SAME - WRITE an ADB Multiple Times to a File

15.12.1.  ARGUMENT

   enum stable_how4 {
           UNSTABLE4       = 0,
           DATA_SYNC4      = 1,
           FILE_SYNC4      = 2
   };

   struct app_data_block4 {
           offset4         adb_offset;
           length4         adb_block_size;
           length4         adb_block_count;
           length4         adb_reloff_blocknum;
           count4          adb_block_num;
           length4         adb_reloff_pattern;
           opaque          adb_pattern<>;
   };

   struct WRITE_SAME4args {
           /* CURRENT_FH: file */
           stateid4        wsa_stateid;
           stable_how4     wsa_stable;
           app_data_block4 wsa_adb;
   };

15.12.2.  RESULT

   struct write_response4 {
           stateid4        wr_callback_id<1>;
           length4         wr_count;
           stable_how4     wr_committed;
           verifier4       wr_writeverf;
   };

   union WRITE_SAME4res switch (nfsstat4 wsr_status) {
   case NFS4_OK:
           write_response4         resok4;
   default:
           void;
   };

15.12.3.  DESCRIPTION

   The WRITE_SAME operation writes an application data block to the
   regular file identified by the current filehandle (see WRITE SAME
   (10) in [T10-SBC2]).  The target file is specified by the current
   filehandle.  The data to be written is specified by an
   app_data_block4 structure (Section 8.1.1).  The client specifies with
   the wsa_stable parameter the method of how the data is to be
   processed by the server.
   It is treated like the stable parameter in the NFSv4.1 WRITE
   operation (see Section 18.2 of [RFC5661]).

   A successful WRITE_SAME will construct a reply for wr_count,
   wr_committed, and wr_writeverf as per the NFSv4.1 WRITE operation
   results.  If wr_callback_id is set, it indicates an asynchronous
   reply (see Section 15.12.3.1).

   WRITE_SAME has to support all of the errors which are returned by
   WRITE plus NFS4ERR_NOTSUPP, i.e., it is an OPTIONAL operation.  If
   the client supports WRITE_SAME, it MUST support CB_OFFLOAD.

   If the server supports ADBs, then it MUST support the WRITE_SAME
   operation.  The server has no concept of the structure imposed by the
   application.  It is only when the application writes to a section of
   the file that order is imposed.  In order to detect corruption even
   before the application utilizes the file, the application will want
   to initialize a range of ADBs using WRITE_SAME.

   When the client invokes the WRITE_SAME operation, it wants to record
   the block structure described by the app_data_block4 on to the file.

   When the server receives the WRITE_SAME operation, it MUST populate
   adb_block_count ADBs in the file starting at adb_offset.  The block
   size will be given by adb_block_size.  The ADBN (if provided) will
   start at adb_reloff_blocknum and each block will be monotonically
   numbered starting from adb_block_num in the first block.  The pattern
   (if provided) will be at adb_reloff_pattern of each block and will be
   provided in adb_pattern.

   The server SHOULD return an asynchronous result if it can determine
   the operation will be long running (see Section 15.12.3.1).  Once
   either the WRITE_SAME finishes synchronously or the server uses
   CB_OFFLOAD to inform the client of the asynchronous completion of the
   WRITE_SAME, the server MUST return the ADBs to clients as data.

15.12.3.1.
Asynchronous Transactions

   ADB initialization may lead to the server determining to service the
   operation asynchronously.  If it decides to do so, it sets the
   stateid in wr_callback_id to be that of the wsa_stateid.  If it does
   not set the wr_callback_id, then the result is synchronous.

   When the client determines that the reply will be given
   asynchronously, it should not assume anything about the contents of
   what it wrote until it is informed by the server that the operation
   is complete.  It can use OFFLOAD_STATUS (Section 15.9) to monitor the
   operation and OFFLOAD_CANCEL (Section 15.8) to cancel the operation.
   An example of an asynchronous WRITE_SAME is shown in Figure 6.  Note
   that as with the COPY operation, WRITE_SAME must provide a stateid
   for tracking the asynchronous operation.

     Client                                  Server
        +                                      +
        |                                      |
        |--- OPEN ---------------------------->| Client opens
        |<------------------------------------/| the file
        |                                      |
        |--- WRITE_SAME ---------------------->| Client initializes
        |<------------------------------------/| an ADB
        |                                      |
        |                                      |
        |--- OFFLOAD_STATUS ------------------>| Client may poll
        |<------------------------------------/| for status
        |                                      |
        |                  .                   | Multiple OFFLOAD_STATUS
        |                  .                   | operations may be sent.
        |                  .                   |
        |                                      |
        |<-- CB_OFFLOAD -----------------------| Server reports results
        |\------------------------------------>|
        |                                      |
        |--- CLOSE --------------------------->| Client closes
        |<------------------------------------/| the file
        |                                      |
        |                                      |

                  Figure 6: An asynchronous WRITE_SAME.

   When CB_OFFLOAD informs the client of the successful WRITE_SAME, the
   write_response4 embedded in the operation will provide the necessary
   information that a synchronous WRITE_SAME would have provided.
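   Whether the reply is synchronous or asynchronous, the file contents
   that result from a WRITE_SAME follow the block layout given in
   Section 15.12.3.  The sketch below instantiates that layout in
   memory.  It is illustrative only and rests on assumptions that this
   section does not state: the ADBN is encoded as a 4-byte big-endian
   integer, Python's None stands for "not provided", and bytes outside
   the ADBN and pattern are zero.  The function name instantiate_adbs is
   hypothetical.

```python
import struct
from typing import Optional, Tuple


def instantiate_adbs(adb_offset: int, adb_block_size: int,
                     adb_block_count: int,
                     adb_reloff_blocknum: Optional[int],
                     adb_block_num: int,
                     adb_reloff_pattern: Optional[int],
                     adb_pattern: bytes) -> Tuple[int, bytes]:
    """Return (file offset, bytes) for adb_block_count consecutive ADBs."""
    if (adb_reloff_pattern is not None
            and adb_reloff_pattern + len(adb_pattern) > adb_block_size):
        raise ValueError("pattern does not fit inside one block")
    if (adb_reloff_blocknum is not None
            and adb_reloff_blocknum + 4 > adb_block_size):
        raise ValueError("block number does not fit inside one block")

    out = bytearray(adb_block_size * adb_block_count)
    for i in range(adb_block_count):
        base = i * adb_block_size
        if adb_reloff_blocknum is not None:
            # Blocks are numbered monotonically from adb_block_num.
            # The 4-byte big-endian encoding is an assumption.
            struct.pack_into(">I", out, base + adb_reloff_blocknum,
                             adb_block_num + i)
        if adb_reloff_pattern is not None:
            start = base + adb_reloff_pattern
            out[start:start + len(adb_pattern)] = adb_pattern
    return adb_offset, bytes(out)


file_off, blocks = instantiate_adbs(
    adb_offset=4096, adb_block_size=16, adb_block_count=2,
    adb_reloff_blocknum=0, adb_block_num=7,
    adb_reloff_pattern=8, adb_pattern=b"\xde\xad")
```

   A client could compare such a locally computed image against what it
   later reads back, for example after a partially complete WRITE_SAME.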

   Regardless of whether the operation is asynchronous or synchronous,
   it MUST still support the COMMIT operation semantics as outlined in
   Section 18.3 of [RFC5661].  I.e., COMMIT works on one or more WRITE
   operations and the WRITE_SAME operation can appear as several WRITE
   operations to the server.  The client can use locking operations to
   control the behavior on the server with respect to long running
   asynchronous write operations.

15.12.3.2.  Error Handling of a Partially Complete WRITE_SAME

   WRITE_SAME will clone adb_block_count copies of the given ADB in
   consecutive order in the file starting at adb_offset.  An error can
   occur after writing the Nth ADB to the file.  WRITE_SAME MUST appear
   to populate the range of the file as if the client used WRITE to
   transfer the instantiated ADBs.  I.e., the contents of the range will
   be easy for the client to determine in case of a partially complete
   WRITE_SAME.

15.13.  Operation 71: CLONE - Clone a Range of a File into Another File

15.13.1.  ARGUMENT

   struct CLONE4args {
           /* SAVED_FH: source file */
           /* CURRENT_FH: destination file */
           stateid4        cl_src_stateid;
           stateid4        cl_dst_stateid;
           offset4         cl_src_offset;
           offset4         cl_dst_offset;
           length4         cl_count;
   };

15.13.2.  RESULT

   struct CLONE4res {
           nfsstat4        cl_status;
   };

15.13.3.  DESCRIPTION

   The CLONE operation is used to clone file content from a source file
   specified by the SAVED_FH value into a destination file specified by
   CURRENT_FH without actually copying the data, e.g., by using a copy-
   on-write mechanism.

   Both SAVED_FH and CURRENT_FH must be regular files.  If either
   SAVED_FH or CURRENT_FH is not a regular file, the operation MUST fail
   and return NFS4ERR_WRONG_TYPE.

   The cl_dst_stateid MUST refer to a stateid that is valid for a WRITE
   operation and follows the rules for stateids in Sections 8.2.5 and
   18.32.3 of [RFC5661].  The cl_src_stateid MUST refer to a stateid
   that is valid for a READ operation and follows the rules for
   stateids in Sections 8.2.5 and 18.22.3 of [RFC5661].  If either
   stateid is invalid, then the operation MUST fail.

   The cl_src_offset is the starting offset within the source file from
   which the data to be cloned will be obtained and the cl_dst_offset is
   the starting offset of the target region into which the cloned data
   will be placed.  An offset of 0 (zero) indicates the start of the
   respective file.  The number of bytes to be cloned is obtained from
   cl_count, except that a cl_count of 0 (zero) indicates that the
   number of bytes to be cloned is the count of bytes between
   cl_src_offset and the EOF of the source file.  Both cl_src_offset and
   cl_dst_offset must be aligned to the clone block size (see
   Section 12.2.1).  The number of bytes to be cloned must be a multiple
   of the clone block size, except in the case in which cl_src_offset
   plus the number of bytes to be cloned is equal to the source file
   size.

   If the source offset or the source offset plus count is greater than
   the size of the source file, the operation MUST fail with
   NFS4ERR_INVAL.  The destination offset or destination offset plus
   count may be greater than the size of the destination file.

   If SAVED_FH and CURRENT_FH refer to the same file and the source and
   target ranges overlap, the operation MUST fail with NFS4ERR_INVAL.

   If the target area of the clone operation ends beyond the end of the
   destination file, the offset at the end of the target area will
   determine the new size of the destination file.  The contents of any
   block not part of the target area will be the same as if the file
   size were extended by a WRITE.
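   The argument checks described above can be sketched as a single
   validation routine.  This is a hedged illustration, not the
   definitive server logic: check_clone is a hypothetical name, status
   codes are shown as strings, and returning NFS4ERR_INVAL for
   misaligned offsets or a non-multiple length is an assumption, since
   the text above only says that such values "must" be aligned.

```python
from typing import Optional, Tuple

OK = "NFS4_OK"
INVAL = "NFS4ERR_INVAL"


def check_clone(src_size: int, dst_size: int,
                cl_src_offset: int, cl_dst_offset: int, cl_count: int,
                clone_blksize: int,
                same_file: bool) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (status, bytes to clone, new destination size)."""
    if cl_src_offset > src_size:
        return INVAL, None, None
    # A cl_count of zero means "from cl_src_offset to EOF of the source".
    nbytes = cl_count if cl_count != 0 else src_size - cl_src_offset
    if cl_src_offset + nbytes > src_size:
        return INVAL, None, None

    # Alignment rules; the error code used here is an assumption.
    if cl_src_offset % clone_blksize or cl_dst_offset % clone_blksize:
        return INVAL, None, None
    if nbytes % clone_blksize and cl_src_offset + nbytes != src_size:
        return INVAL, None, None

    # Overlapping source and target ranges within one file are rejected.
    if (same_file
            and cl_src_offset < cl_dst_offset + nbytes
            and cl_dst_offset < cl_src_offset + nbytes):
        return INVAL, None, None

    # The target area may extend the destination file.
    new_dst_size = max(dst_size, cl_dst_offset + nbytes)
    return OK, nbytes, new_dst_size


to_eof = check_clone(10000, 0, 4096, 0, 0, 4096, same_file=False)
overlap = check_clone(8192, 8192, 0, 4096, 8192, 4096, same_file=True)
past_eof = check_clone(8192, 0, 4096, 0, 8192, 4096, same_file=False)
```

   In the first call, the 5904 bytes up to the source EOF are accepted
   even though that length is not a multiple of the 4096-byte clone
   block size, because the range ends exactly at the source file size.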

   If the area to be cloned is not a multiple of the clone block size
   and the size of the destination file is past the end of the target
   area, the area between the end of the target area and the next
   multiple of the clone block size will be zeroed.

   The CLONE operation is atomic in that other operations may not see
   any intermediate states between the state of the two files before the
   operation and that after the operation.  READs of the destination
   file will never see some blocks of the target area cloned without all
   of them being cloned.  WRITEs of the source area will either have no
   effect on the data of the target file or be fully reflected in the
   target area of the destination file.

   The completion status of the operation is indicated by cl_status.

16.  NFSv4.2 Callback Operations

16.1.  Operation 15: CB_OFFLOAD - Report Results of an Asynchronous
       Operation

16.1.1.  ARGUMENT

   struct write_response4 {
           stateid4        wr_callback_id<1>;
           length4         wr_count;
           stable_how4     wr_committed;
           verifier4       wr_writeverf;
   };

   union offload_info4 switch (nfsstat4 coa_status) {
   case NFS4_OK:
           write_response4 coa_resok4;
   default:
           length4         coa_bytes_copied;
   };

   struct CB_OFFLOAD4args {
           nfs_fh4         coa_fh;
           stateid4        coa_stateid;
           offload_info4   coa_offload_info;
   };

16.1.2.  RESULT

   struct CB_OFFLOAD4res {
           nfsstat4        cor_status;
   };

16.1.3.  DESCRIPTION

   CB_OFFLOAD is used to report to the client the results of an
   asynchronous operation, e.g., Server Side Copy or WRITE_SAME.  The
   coa_fh and coa_stateid identify the transaction and the coa_status
   indicates success or failure.  The coa_resok4.wr_callback_id MUST NOT
   be set.  If the transaction failed, then the coa_bytes_copied
   contains the number of bytes copied before the failure occurred.
   The coa_bytes_copied value indicates the number of bytes copied but
   not which specific bytes have been copied.

   If the client supports any of the following operations:

   COPY:  for both intra-server and inter-server asynchronous copies

   WRITE_SAME:  for ADB initialization

   then the client is REQUIRED to support the CB_OFFLOAD operation.

   There is a potential race between the reply to the original
   transaction on the forechannel and the CB_OFFLOAD callback on the
   backchannel.  Sections 2.10.6.3 and 20.9.3 of [RFC5661] describe how
   to handle this type of issue.

   Upon success, the coa_resok4.wr_count presents for each operation:

   COPY:  the total number of bytes copied

   WRITE_SAME:  the same information that a synchronous WRITE_SAME would
      provide

17.  Security Considerations

   NFSv4.2 has all of the security concerns present in NFSv4.1 (see
   Section 21 of [RFC5661]) and those present in the Server Side Copy
   (see Section 4.10) and in Labeled NFS (see Section 9.6).

18.  IANA Considerations

   The IANA Considerations for Labeled NFS are addressed in [RFC7569].

19.  References

19.1.  Normative References

   [I-D.ietf-nfsv4-minorversion2-dot-x]
              Haynes, T., "NFSv4 Minor Version 2 Protocol External Data
              Representation Standard (XDR) Description", draft-ietf-
              nfsv4-minorversion2-dot-x-40 (work in progress), January
              2016.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66, RFC
              3986, January 2005.

   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
              System (NFS) Version 4 Minor Version 1 Protocol", RFC
              5661, January 2010.

   [RFC5662]  Shepler, S., Eisler, M., and D. Noveck, "Network File
              System (NFS) Version 4 Minor Version 1 External Data
              Representation Standard (XDR) Description", RFC 5662,
              January 2010.

   [RFC7569]  Quigley, D., Lu, J., and T.
Haynes, "Registry 4438 Specification for Mandatory Access Control (MAC) Security 4439 Label Formats", RFC 7569, July 2015. 4441 [posix_fadvise] 4442 The Open Group, "Section 'posix_fadvise()' of System 4443 Interfaces of The Open Group Base Specifications Issue 6, 4444 IEEE Std 1003.1, 2004 Edition", 2004. 4446 [posix_fallocate] 4447 The Open Group, "Section 'posix_fallocate()' of System 4448 Interfaces of The Open Group Base Specifications Issue 6, 4449 IEEE Std 1003.1, 2004 Edition", 2004. 4451 [rpcsec_gssv3] 4452 Adamson, W. and N. Williams, "Remote Procedure Call (RPC) 4453 Security Version 3", December 2014. 4455 19.2. Informative References 4457 [Ashdown08] 4458 Ashdown, L., "Chapter 15, Validating Database Files and 4459 Backups, of Oracle Database Backup and Recovery User's 4460 Guide 11g Release 1 (11.1)", August 2008. 4462 [BL73] Bell, D. and L. LaPadula, "Secure Computer Systems: 4463 Mathematical Foundations and Model", Technical Report 4464 M74-244, The MITRE Corporation, Bedford, MA, May 1973. 4466 [Baira08] Bairavasundaram, L., Goodson, G., Schroeder, B., Arpaci- 4467 Dusseau, A., and R. Arpaci-Dusseau, "An Analysis of Data 4468 Corruption in the Storage Stack", Proceedings of the 6th 4469 USENIX Symposium on File and Storage Technologies (FAST 4470 '08) , 2008. 4472 [IESG08] ISEG, "IESG Processing of RFC Errata for the IETF Stream", 4473 2008. 4475 [McDougall07] 4476 McDougall, R. and J. Mauro, "Section 11.4.3, Detecting 4477 Memory Corruption of Solaris Internals", 2007. 4479 [NFSv4-Versioning] 4480 Haynes, T. and D. Noveck, "NFSv4 Version Management", 4481 November 2014. 4483 [RFC1108] Kent, S., "Security Options for the Internet Protocol", 4484 RFC 1108, November 1991. 4486 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 4487 Requirement Levels", March 1997. 4489 [RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the 4490 Internet Protocol", RFC 2401, November 1998. 
   [RFC4506]  Eisler, M., "XDR: External Data Representation Standard",
              RFC 4506, May 2006.

   [RFC4949]  Shirey, R., "Internet Security Glossary, Version 2", RFC
              4949, August 2007.

   [RFC5663]  Black, D., Fridella, S., and J. Glasgow, "Parallel NFS
              (pNFS) Block/Volume Layout", RFC 5663, January 2010.

   [RFC7204]  Haynes, T., "Requirements for Labeled NFS", RFC 7204,
              April 2014.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing", RFC
              7230, DOI 10.17487/RFC7230, June 2014.

   [RFC7530]  Haynes, T. and D. Noveck, "Network File System (NFS)
              version 4 Protocol", RFC 7530, March 2015.

   [RFC959]   Postel, J. and J. Reynolds, "File Transfer Protocol", STD
              9, RFC 959, October 1985.

   [Strohm11]
              Strohm, R., "Chapter 2, Data Blocks, Extents, and
              Segments, of Oracle Database Concepts 11g Release 1
              (11.1)", January 2011.

   [T10-SBC2]
              Elliott, R., Ed., "ANSI INCITS 405-2005, Information
              Technology - SCSI Block Commands - 2 (SBC-2)", November
              2004.

Appendix A.  Acknowledgments

   Tom Haynes would like to thank NetApp, Inc. for its funding of his
   time on this project.

   For the Sharing change attribute implementation characteristics with
   NFSv4 clients, the original draft was by Trond Myklebust.

   For the NFS Server Side Copy, the original draft was by James
   Lentini, Mike Eisler, Deepak Kenchammana, Anshul Madan, and Rahul
   Iyer.  Tom Talpey co-authored an unpublished version of that
   document.  It was also reviewed by a number of individuals:
   Pranoop Erasani, Tom Haynes, Arthur Lent, Trond Myklebust, Dave
   Noveck, Theresa Lingutla-Raj, Manjunath Shankararao, Satyam Vaghani,
   and Nico Williams.  Anna Schumaker's early prototyping experience
   helped us avoid some traps.
   Also, both Olga Kornievskaia and Andy Adamson brought implementation
   experience to the use of copy stateids in inter-server copy.  Jorge
   Mora was able to optimize the handling of errors for the result of
   COPY.

   For the NFS space reservation operations, the original draft was by
   Mike Eisler, James Lentini, Manjunath Shankararao, and Rahul Iyer.

   For the sparse file support, the original draft was by Dean
   Hildebrand and Marc Eshel.  Valuable input and advice were received
   from Sorin Faibish, Bruce Fields, Benny Halevy, Trond Myklebust, and
   Richard Scheffenegger.

   For the Application IO Hints, the original draft was by Dean
   Hildebrand, Mike Eisler, Trond Myklebust, and Sam Falkner.  Some
   early reviewers included Benny Halevy and Pranoop Erasani.

   For Labeled NFS, the original draft was by David Quigley, James
   Morris, Jarret Lu, and Tom Haynes.  Peter Staubach, Trond Myklebust,
   Stephen Smalley, Sorin Faibish, Nico Williams, and David Black also
   contributed in the final push to get this accepted.

   Christoph Hellwig was very helpful in getting the WRITE_SAME
   semantics to model more of what T10 was doing for WRITE SAME (10)
   [T10-SBC2].  He also led the push to get space reservations to more
   closely model posix_fallocate().

   Andy Adamson picked up the RPCSEC_GSSv3 work, which enabled both
   Labeled NFS and Server Side Copy to provide more secure options.

   Christoph Hellwig provided the update to GETDEVICELIST.

   Jorge Mora provided a very detailed review and caught some important
   issues with the tables.

   During the review process, Talia Reyes-Ortiz helped the sessions run
   smoothly.  While many people contributed here and there, the core
   reviewers were Andy Adamson, Pranoop Erasani, Bruce Fields, Chuck
   Lever, Trond Myklebust, David Noveck, Peter Staubach, and Mike
   Kupfer.

Appendix B.  RFC Editor Notes

   [RFC Editor: please remove this section prior to publishing this
   document as an RFC]

   [RFC Editor: prior to publishing this document as an RFC, please
   replace all occurrences of I-D.ietf-nfsv4-minorversion2-dot-x with
   RFCxxxx where xxxx is the RFC number of the companion XDR document]

Author's Address

   Thomas Haynes
   Primary Data, Inc.
   4300 El Camino Real Ste 100
   Los Altos, CA  94022
   USA

   Phone: +1 408 215 1519
   Email: thomas.haynes@primarydata.com